Specify which aggregate computations are supported #116

Closed
csharrison opened this issue Mar 8, 2021 · 2 comments

Comments

@csharrison
Collaborator

In the 03-08-2021 meeting we went over some example computations (slides) that the aggregate API could support while satisfying differential privacy (DP). These included:

  • Fixed domain vector aggregation (e.g. 1M aggregation keys)
  • Hierarchical domains (possibly with multiple queries), to prune a larger domain down to a smaller one in some flexible way. Thresholding can make the computations more efficient, though it may not be strictly required for DP.
  • "Sparse vector" techniques to handle truly massive domains (e.g. 2^64 or 2^128 entries from hashing a string), which require thresholding to preserve DP but will never report on a key that wasn't present (see this doc)
    • Example MPC: something like what we documented in private_histograms_mpc.md, although more work is needed to evaluate these techniques.
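
To make the contrast between the first and third options concrete, here is a minimal single-machine sketch in Python. It is not the aggregate API or any proposed MPC protocol: the function names, the use of Laplace noise, the `epsilon` and `threshold` parameters, and the one-contribution-per-user assumption are all illustrative choices of mine, and the threshold that a real (ε, δ) guarantee would need is not derived here.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def fixed_domain_histogram(keys, domain_size, epsilon, rng):
    """Noisy count for every bucket of a pre-declared domain [0, domain_size).

    Assumes (hypothetically) that each user contributes at most one key,
    so the L1 sensitivity of the histogram is 1.
    """
    counts = [0] * domain_size
    for k in keys:
        counts[k] += 1
    # Every bucket is reported, including empty ones, so no threshold is needed.
    return [c + laplace_noise(1.0 / epsilon, rng) for c in counts]


def sparse_thresholded_histogram(keys, epsilon, threshold, rng):
    """Noise only the keys that were actually observed, then threshold.

    Keys that never occurred are never output. Keys with small counts are
    usually suppressed, which is where the thresholding bias comes from.
    """
    out = {}
    for k, c in Counter(keys).items():
        noisy = c + laplace_noise(1.0 / epsilon, rng)
        if noisy >= threshold:
            out[k] = noisy
    return out


rng = random.Random(0)
dense = fixed_domain_histogram([0, 0, 1, 3], domain_size=4, epsilon=1.0, rng=rng)
sparse = sparse_thresholded_histogram(["a"] * 50 + ["b"], 1.0, 10.0, rng)
```

The dense variant needs an agreed mapping from aggregation keys to bucket indices, which is the developer-ergonomics cost; the sparse variant accepts arbitrary keys but drops low-count ones.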

These techniques have different pros and cons (and the list is obviously not exhaustive). I'm filing this issue to solicit more feedback. Some evaluation criteria:

  • Developer ergonomics (especially with regard to figuring out a dense encoding of aggregation keys)
  • Utility of output (e.g. bias introduced by thresholding)
  • MPC simplicity
  • MPC security guarantees (zero-knowledge, etc.)
  • MPC computation / communication costs
  • Privacy of output (e.g. smaller domain sizes can encode less information about users)
@csharrison
Collaborator Author

Update here: Google has open-sourced a C++ implementation of distributed point functions. It can be found here:
https://github.com/google/distributed_point_functions
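
For readers new to the idea, a point function is zero everywhere except at one index. The Python sketch below shows only the naive version of splitting one between two helpers, using full-length additive shares. It does not use or mirror the API of the library linked above; the names `share_point_function`, `aggregate`, `reconstruct`, and `MODULUS` are hypothetical. A real distributed point function produces much shorter keys than these full-length vectors, which is what makes very large domains feasible.

```python
import secrets

MODULUS = 2**32  # illustrative choice of share arithmetic, not from the issue


def share_point_function(index, value, domain_size):
    """Split the vector that is `value` at `index` and 0 elsewhere into
    two additive shares. Each share alone is uniformly random."""
    share_a = [secrets.randbelow(MODULUS) for _ in range(domain_size)]
    share_b = [(-a) % MODULUS for a in share_a]
    share_b[index] = (value - share_a[index]) % MODULUS
    return share_a, share_b


def aggregate(shares):
    """Run by one helper: element-wise sum of the shares it received."""
    total = [0] * len(shares[0])
    for share in shares:
        for i, x in enumerate(share):
            total[i] = (total[i] + x) % MODULUS
    return total


def reconstruct(sum_a, sum_b):
    """Combine the two helpers' sums to recover the aggregate histogram."""
    return [(a + b) % MODULUS for a, b in zip(sum_a, sum_b)]


# Three client contributions as (bucket index, value) over a domain of 4.
contributions = [(2, 5), (2, 3), (0, 1)]
pairs = [share_point_function(i, v, 4) for i, v in contributions]
histogram = reconstruct(
    aggregate([a for a, _ in pairs]),
    aggregate([b for _, b in pairs]),
)
# histogram == [1, 0, 8, 0]
```

Neither helper learns any individual contribution from its shares alone; only the combined sums reveal the aggregate, to which DP noise would still need to be added.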

@csharrison
Collaborator Author

Closing for now. The approach we have chosen for the time being is specified in https://github.com/WICG/attribution-reporting-api/blob/main/AGGREGATION_SERVICE_TEE.md#pre-declaring-aggregation-buckets

However, there is an open issue to consider other mechanisms (#583).
