I’m not pretending to understand how homomorphic encryption works or how it fits into this system, but here’s something from the article.
With some server optimization metadata and the help of Apple’s private nearest neighbor search (PNNS), the relevant Apple server shard receives a homomorphically-encrypted embedding from the device, and performs the aforementioned encrypted computations on that data to find a landmark match from a database and return the result to the client device without providing identifying information to Apple nor its OHTTP partner Cloudflare.
There’s a more technical write up here. It appears the final match is happening on device, not on the server.
The client decrypts the reply to its PNNS query, which may contain multiple candidate landmarks. A specialized, lightweight on-device reranking model then predicts the best candidate by using high-level multimodal feature descriptors, including visual similarity scores; locally stored geo-signals; popularity; and index coverage of landmarks (to debias candidate overweighting). When the model has identified the match, the photo’s local metadata is updated with the landmark label, and the user can easily find the photo when searching their device for the landmark’s name.
That’s really cool (not the auto opt-in thing). If I understand correctly, that system looks like it offers pretty strong theoretical privacy guarantees (assuming their closed-source client software works as they say, with sending fake queries and all that for differential privacy). If the backend doesn’t work like they say, they could infer what landmark is in an image when finding the approximate minimum distance to embeddings in their DB, but with the fake queries they can’t be sure which one is real. They can’t see the actual image either way as long as the “128-bit post-quantum” encryption algorithm doesn’t have any vulnerabilies (and the closed source software works as described).
I’m not pretending to understand how homomorphic encryption works or how it fits into this system, but here’s something from the article.
There’s a more technical write up here. It appears the final match is happening on device, not on the server.
That’s really cool (not the auto opt-in thing). If I understand correctly, that system looks like it offers pretty strong theoretical privacy guarantees (assuming their closed-source client software works as they say, with sending fake queries and all that for differential privacy). If the backend doesn’t work like they say, they could infer what landmark is in an image when finding the approximate minimum distance to embeddings in their DB, but with the fake queries they can’t be sure which one is real. They can’t see the actual image either way as long as the “128-bit post-quantum” encryption algorithm doesn’t have any vulnerabilies (and the closed source software works as described).