[Aggregated API] Using the API for both low latency reactive monitoring and detailed client reporting #732
Comments
cc @hostirosti @ruclohani for visibility. Thanks for filing, @alois-bissuel . I agree a more flexible way of consuming privacy budget should help satisfy the use case. Between use-case (1) and (2) you mention, do you expect the keys to be similar or will e.g. (2) query finer grained slices? |
Hello, it would be very useful for us if the limit could be increased from 1 to 2, i.e. if the same aggregatable report could appear in two batches and hence contribute to two summary reports.
For our use case, we expect the keys to be similar. |
Hi, |
@alois-bissuel quick clarification, you said:
Is this true? If we split the key space into a few sections and allowed you to query those sections independently, you still need to allocate separate budget across those key spaces, just in the form of the client's L1 contribution bound rather than an epsilon. Is this because the two use-cases will actually have different data / keys, and so querying the "high latency / detailed" key space during a low latency query is wasteful? |
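To make the point about the L1 contribution bound concrete, here is a rough numeric sketch. It assumes Laplace noise with scale `L1 / epsilon`, and the function names and the sample values (`l1=65536`, `epsilon=10`) are illustrative assumptions, not values taken from this thread. It compares the effective noise for a use case that receives a fraction `f` of the epsilon against one that receives a fraction `f` of the client's L1 contribution.

```python
def noise_scale_epsilon_split(l1: float, epsilon: float, fraction: float) -> float:
    """Laplace scale when a pass is given only `fraction` of the epsilon."""
    return l1 / (fraction * epsilon)


def noise_scale_contribution_split(l1: float, epsilon: float, fraction: float) -> float:
    """Effective Laplace scale when a key space gets `fraction` of the L1 budget.

    The noise itself stays at l1 / epsilon, but contributions are scaled down
    by `fraction`, so after rescaling the result the noise grows by 1 / fraction.
    """
    return (l1 / epsilon) / fraction


# Hypothetical values, chosen only for illustration.
eps_split = noise_scale_epsilon_split(l1=65536, epsilon=10, fraction=0.25)
l1_split = noise_scale_contribution_split(l1=65536, epsilon=10, fraction=0.25)
print(eps_split, l1_split)
```

Under these assumptions both options give the same effective noise for a given fraction, so the difference between them lies in how flexibly the split can be chosen rather than in the noise level itself.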
Catching up on my issues, sorry for the delay:
As there will be less data available for use case (1) (i.e. the fast-paced aggregation case), I expect us to encode fewer things in the key so that more data is aggregated per key.
Indeed, I was not clear there. I was thinking of a separate encoding for each use case and thus finer budget tracking. Of course the L1 budget still applies. I guess that my first proposal (i.e. allocating an epsilon budget per pass) rules out a different encoding per use case, hence my remark (and your final comment). |
FYI, the Aggregation Service team is currently looking into supporting requerying, which could help with this use case. If you're interested, please take a look at privacysandbox/aggregation-service#71. |
Hi, |
Hello,
There are some cases where we want to use the aggregated API for two use cases which are quite different:
We struggle to articulate the two use cases within the API in its current form. Because the data can only be processed once, we have to sacrifice one of the use cases: either we use one detailed encoding and process the data hourly, meaning use case 2 gets drowned by noise, or we process the data daily and sacrifice use case 1.
Supporting the two use cases at the same time could be done by allowing several passes over the data in the aggregation service. To keep the differential privacy properties of the aggregation service, we could keep track of the already consumed budget (e.g. the first pass uses ε/4, the second ε/2, and the last ε/4). Another approach would be to define broad key spaces (e.g. split the 128-bit space into 4 buckets), and allow aggregation only once per key space. This way one would encode the fast-paced campaign monitoring metrics in the first key space and query the aggregation service hourly for them, and encode the client reporting metrics elsewhere and aggregate them weekly.
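As a minimal sketch of the first approach (tracking consumed budget across passes), something like the following could work. `BudgetLedger`, its fields, and the pass limit of 3 are hypothetical names and values used for illustration; they are not part of the Aggregation Service.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetLedger:
    """Hypothetical per-batch ledger of the epsilon spent by each pass."""

    total_epsilon: float
    max_passes: int = 3  # assumed low limit on the number of passes
    spent: list = field(default_factory=list)

    def remaining(self) -> float:
        return self.total_epsilon - sum(self.spent)

    def consume(self, epsilon: float) -> float:
        """Record one pass and return the budget left afterwards."""
        if len(self.spent) >= self.max_passes:
            raise RuntimeError("pass limit reached for this batch")
        if epsilon <= 0 or epsilon > self.remaining():
            raise ValueError("requested epsilon exceeds the remaining budget")
        self.spent.append(epsilon)
        return self.remaining()


# Passes using eps/4, eps/2 and eps/4, as in the example above.
ledger = BudgetLedger(total_epsilon=8.0)
for share in (2.0, 4.0, 2.0):
    print(ledger.consume(share))
```

A fourth call to `consume` would be rejected here, both because the budget is exhausted and because the assumed pass limit has been reached.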
Both methods have their pros and cons. The latter is more precise (as one doesn't burn budget on both use cases at the same time), while the former leaves less room for regret (i.e. one can always reserve some budget for a last aggregation in case of a mistake).
For both methods, the storage space needed by the aggregation service can be controlled by setting a sensible but low limit on the number of times the data can be processed.
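The key-space alternative described above could be sketched as follows, assuming the key space is selected by the top 2 bits of the 128-bit key (4 buckets). `make_key`, `key_space`, and `KeySpaceTracker` are hypothetical helpers for illustration, not existing API.

```python
KEY_BITS = 128
SPACE_BITS = 2  # 2 bits -> 4 key spaces, matching the example above
PAYLOAD_BITS = KEY_BITS - SPACE_BITS


def make_key(space: int, payload: int) -> int:
    """Build a 128-bit key whose top bits select the key space."""
    if not 0 <= space < (1 << SPACE_BITS):
        raise ValueError("unknown key space")
    if not 0 <= payload < (1 << PAYLOAD_BITS):
        raise ValueError("payload does not fit in the key space")
    return (space << PAYLOAD_BITS) | payload


def key_space(key: int) -> int:
    """Return the key space a 128-bit key belongs to."""
    if not 0 <= key < (1 << KEY_BITS):
        raise ValueError("key is not a 128-bit value")
    return key >> PAYLOAD_BITS


class KeySpaceTracker:
    """Allows each (batch, key space) pair to be aggregated only once."""

    def __init__(self) -> None:
        self._queried = set()

    def register_query(self, batch_id: str, space: int) -> None:
        if (batch_id, space) in self._queried:
            raise RuntimeError("key space already aggregated for this batch")
        self._queried.add((batch_id, space))


# Space 0: fast-paced monitoring (hourly); space 1: client reporting (weekly).
tracker = KeySpaceTracker()
tracker.register_query("batch-1", key_space(make_key(0, 42)))
tracker.register_query("batch-1", key_space(make_key(1, 42)))
```

With this layout the tracker only has to store one flag per batch and key space, which keeps the bookkeeping small as long as the number of key spaces stays low.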