forDebuggingOnly availability #632

Open
jonasz opened this issue Jun 16, 2023 · 60 comments

Comments

@jonasz
Contributor

jonasz commented Jun 16, 2023

Hi,

I was wondering, what is the plan for the forDebuggingOnly reporting functions and their availability? Will they be supported during the mode a and mode b testing phases?

Best regards,
Jonasz

@ajvelasquezgoog
Collaborator

Hi @jonasz, yes, they are supported in both Mode A and Mode B.
For forDebuggingOnly loss reporting, the plan is to retire it by 3PCD.
For forDebuggingOnly win reporting, the plan is to keep it around until at least 2026.

@ajvelasquezgoog
Collaborator

@jonasz we are actually thinking more about the privacy risks of the two parts of the forDebuggingOnly APIs. We need to think more about this; let us get back to you soon, hopefully next week.

@fhoering
Contributor

Do you think some sampled mode could be acceptable in the long term? Something small enough that it doesn't allow any user identification, like 1% of forDebuggingOnly.reportAdAuctionWin & forDebuggingOnly.reportAdAuctionLoss calls?
Those functions are extremely useful for debugging client-side code and allow e2e testing of the full pipeline at event level. I am not sure this could easily be reproduced with aggregated reporting of technical errors.

@jonasz
Contributor Author

jonasz commented Sep 19, 2023

@ajvelasquezgoog friendly ping, any updates on this issue?

@ajvelasquezgoog
Collaborator

ajvelasquezgoog commented Sep 29, 2023

We thank everyone interested for your patience on getting updates on this matter. We have been working closely with and collecting feedback from stakeholders over the last several weeks, and examined the efforts required to adapt to the full removal of these functions by the 3PCD deadline.

Since its inception, forDebuggingOnly has been intended for troubleshooting purposes as adtech builds and tests their integration with the Protected Audience API, with our current plan of record indicating that these functions will be deprecated at 3PCD.

The incremental feedback that we have been receiving in the last few months on this plan can be summarized as follows:

  • Net new onboarding by adtech to our APIs will certainly continue after 3PCD, as more and more adtech solidify their integration plans, even if they cannot get to them at exactly 3PCD.
  • Once an integration is live, there is a continued need for debugging abilities in situations where the operational monitoring and alerting that adtech has in place flags a potential production issue that has a (partial) dependency on the Protected Audience API. While the browser provides a comprehensive set of tooling to allow for bug reproduction and analysis, it remains a critical need for adtech to be able to access samples of debugging information to do root cause analysis faster and more confidently in escalation situations.
  • A variation of this last use case that we have heard of, which also depends on access to samples of this debugging information, is what we can best describe as proactive quality control audits, in which samples are taken that include inputs and outputs of the different worklets and evaluated/compared against internal benchmarks done in isolation in a laboratory environment.

Given that, we think there is a path to continue supporting these use cases with a level of fidelity that will be acceptable and that also continues to meet our privacy goals. In essence, we think it is possible to keep the forDebuggingOnly API and its signature as-is while significantly downsampling how often a call to forDebuggingOnly actually fetches the report destination URL, which significantly reduces the risk of bad actors establishing cross-site linkages or re-identifying users.

In essence the proposal entails the introduction of 3 Chrome-controlled variables that will modify the current behavior of the forDebuggingOnly.reportAdAuctionWin() and forDebuggingOnly.reportAdAuctionLoss() calls by 3PCD rollout. Please recall that these calls can be made by sellers in scoreAd() and for every buyer’s generateBid() function for each interest group that is participating in the auction.

New variable 1: Sampling Rate. Denotes how often a call to the forDebuggingOnly methods actually results in the report’s destination URL being fetched. Let us call this variable f, where how often the call happens is defined as 1 / f. Additionally, let’s define an outcome variable o as a binary [TRUE | FALSE], where o = TRUE one out of f times based on a randomizing function implemented by Chrome. We propose a value of 1000 for f.

New Variable 2: Cooldown Period. Denotes how long (in days) a single Chrome client should, for a given calling adtech, keep returning the same FALSE result for o after the randomizing function has determined that the result is FALSE. Let us call this variable c, and we propose a value of 365 days (1 year) for it.

New Variable 3: Lockout Period. Denotes how long (in days) a single Chrome client should, for any and all calling adtech, return a FALSE result for o after the randomizing function has returned TRUE for o once. Let us call this variable L, and we propose a value of 1095 days (3 years) for it.

In other words, when one ad tech calls a forDebuggingOnly method:

  • with probability 999/1000 the API will not send a report, and furthermore that browser will ignore forDebuggingOnly calls from the calling ad tech for the next year.
  • with probability 1/1000 the API will send a report, at which point the browser will ignore forDebuggingOnly calls from any ad tech, for the next 3 years.
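
The two outcomes above can be sketched as a small per-browser state machine. This is only an illustration of the proposal as described, not Chrome's implementation: the state shape, the function names `createBrowserState` and `requestDebugReport`, and the injectable `random` parameter are all assumptions made for the example.

```javascript
// Illustrative sketch of the proposed sampling rules; not Chrome's code.
const SAMPLING_F = 1000;    // f: a report is sent on 1 out of f eligible calls
const COOLDOWN_DAYS = 365;  // c: per-ad-tech cooldown after a FALSE outcome
const LOCKOUT_DAYS = 1095;  // L: global lockout after a TRUE outcome

// Hypothetical per-browser state. Times are expressed in days.
function createBrowserState() {
  return { lockoutUntil: 0, cooldownUntil: new Map() };
}

// Returns true if this call would result in a report being sent.
// `random` is injectable so the behavior can be checked deterministically.
function requestDebugReport(state, adTech, nowDays, random = Math.random) {
  // Global lockout: applies to every ad tech.
  if (nowDays < state.lockoutUntil) return false;
  // Cooldown: applies only to the calling ad tech.
  if (nowDays < (state.cooldownUntil.get(adTech) ?? 0)) return false;

  const sampled = random() < 1 / SAMPLING_F; // the outcome o
  if (sampled) {
    state.lockoutUntil = nowDays + LOCKOUT_DAYS;
    return true;
  }
  state.cooldownUntil.set(adTech, nowDays + COOLDOWN_DAYS);
  return false;
}
```

In this sketch a FALSE outcome only affects the caller, while a TRUE outcome blocks every ad tech on that browser until the lockout expires.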

Based on these variables, and based on 2 reasonable assumptions we can make:

  • The number of Chrome clients with the PA API enabled is around 2,000,000,000
  • There are around 100 adtech companies integrated with our PA APIs in steady-state

We calculate that, in legitimate scenarios like the ones detailed in the opening paragraphs of this reply, each participating adtech should get between ~4.7K and ~5.4K daily reports if they choose to call forDebuggingOnly at every generateBid and scoreAd instance. The variability is correlated with the number of participating adtech: the more participating adtech there are, the higher the likelihood that a given user is already within its lockout period L. Additionally, adtech can determine that forDebuggingOnly is only (or most) useful when certain criteria that are detectable by either generateBid or scoreAd are met. In these cases we suggest that adtech only call the forDebuggingOnly API when it detects a critical situation that is important to investigate, so as to minimize the chance of incurring a self-imposed cooldown period.
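
One way to land near the upper end of that range is a simple renewal argument. The sketch below assumes a single ad tech that calls the API on every browser as soon as it is allowed to, so each browser alternates between a die roll and either a cooldown or a lockout. That assumption and the variable names are illustrative; the lower figure, which depends on how many other ad techs push browsers into lockout, is not reproduced here.

```javascript
// Back-of-the-envelope estimate under the assumptions stated above.
const samplingF = 1000;        // f
const cooldownDays = 365;      // c
const lockoutDays = 1095;      // L
const chromeClients = 2e9;     // assumed number of Chrome clients

const pReport = 1 / samplingF;

// Expected number of days between two consecutive die rolls in one browser.
const expectedCycleDays =
  pReport * lockoutDays + (1 - pReport) * cooldownDays;

// Each cycle yields a report with probability pReport.
const reportsPerDay = (chromeClients * pReport) / expectedCycleDays;

console.log(reportsPerDay.toFixed(0)); // on the order of 5.4K to 5.5K
```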

We also want to highlight the protections that we see against malicious scenarios with this approach. A malicious actor who knows that the sample rate is 1/f can choose to call forDebuggingOnly inside a for loop that runs f times, with the intent of getting a report every single time they run scoreAd or generateBid. When the actor gets a report, their plan is to attach it to a unique ID they have created for this user/browser, and wait for the next, second instance of getting a report from the same user/browser to add to their profile. The cooldown parameter c protects the value of o, if it returns FALSE, from being re-evaluated for another c days, rendering the for loop pointless. Now, if o returns a value of TRUE, then the actor has to wait 3 years before being able to get a report from the exact same user/browser. And if we layer on top the very small probability that after year 3 has elapsed the actor will get a report from the exact same user, we can see how these protections make these profile building efforts essentially unviable.
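
To put a number on why the cooldown matters for the loop scenario, the sketch below compares the chance of obtaining at least one report from f calls when every call is an independent 1/f draw (no cooldown) against the case where only the first call is ever evaluated. The helper name `probAtLeastOneReport` is made up for this illustration.

```javascript
// Probability of at least one report from `calls` independent 1/f draws.
function probAtLeastOneReport(calls, f) {
  return 1 - Math.pow(1 - 1 / f, calls);
}

const loopF = 1000;

// Without a cooldown, looping f times succeeds roughly 63% of the time.
const withoutCooldown = probAtLeastOneReport(loopF, loopF);

// With the cooldown, the first FALSE outcome sticks for c days, so the
// loop is equivalent to a single draw.
const withCooldown = probAtLeastOneReport(1, loopF);

console.log(withoutCooldown.toFixed(3), withCooldown.toFixed(3)); // 0.632 0.001
```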

We believe that with this proposal, we can accomplish the goals we set out in our opening paragraphs. Any and all feedback is very appreciated!

@jonasz here you go

@JacobGo
Contributor

JacobGo commented Oct 4, 2023

We are thrilled to see long-term support for these debugging APIs and look forward to the improved observability as we mature our integrations. I wanted to raise two concerns with the details of the above proposal.

In practice, we have observed some overhead when enabling debug codepaths due to the additional code profiling and report building. Given the highly latency sensitive worklet execution environment, we would recommend a mechanism to detect availability of forDebuggingOnly within the worklet to avoid wasting computation on building events that will never be sent.

Additionally, we are concerned about the shared lockout period given the threshold for critical situations may differ across adtechs. If one buyer decides to frequently invoke the API or unintentionally introduces a major bug which accelerates their call rate to 100%, should this lock out another buyer who needs to debug their own rare exceptions or sudden incidents?

@michaelkleber
Collaborator

In practice, we have observed some overhead when enabling debug codepaths due to the additional code profiling and report building. Given the highly latency sensitive worklet execution environment, we would recommend a mechanism to detect availability of forDebuggingOnly within the worklet to avoid wasting computation on building events that will never be sent.

Hmm. There are two different things you might be asking here:

  1. "Give me a mechanism that tells me whether a debug report would actually get sent."
  2. "Give me a mechanism that tells me whether this browser would even consider sending a report right now, with the understanding that even if it would consider it, there is only a 1/1000 chance of sending it."

I think 1 would need to be an API that actually performed the die roll and triggered the cooling-off period 999/1000 of the time that 2 returned true.

I'm not sold on either one of these, but which are you asking for?

Additionally, we are concerned about the shared lockout period given the threshold for critical situations may differ across adtechs. If one buyer decides to frequently invoke the API or unintentionally introduces a major bug which accelerates their call rate to 100%, should this lock out another buyer who needs to debug their own rare exceptions or sudden incidents?

If one buyer spams the API for whatever reason, the worst they could do is lock out 1/1000 of people for everyone else. The cooling-off period that happens 999/1000 times isn't shared state; only the 3-year lock-out would let one ad tech affect another ad tech.

@JacobGo
Contributor

JacobGo commented Oct 9, 2023

Thank you for the responses. Could you kindly elaborate on why triggering the cooling-off period is necessary when detecting API availability? Is the concern that we would have access to the 1/1000 device-sticky decision to truly send the debug report, and that this may influence or leak out of the internal worklet execution? I believe we're asking for (1), but without tripping the cooldown period given this (a) effectively incurs the statistical cost of always invoking the API and (b) may be a surprising side effect for all developers. I'm afraid this may incentivize us to always invoke the API if it's detected rather than save it for error states; alternatively we should just accept the overhead and restrict building the event messages to truly exceptional scenarios.

Great point about the difference between global lock-out and per-adtech cooling-off periods, I agree that the interplay of these successfully mitigates the impact of a spammy adtech.

One final note: as a user of forDebuggingOnly, I find the terminology slightly inverted here. I would expect "cooldown" to reflect the no-op state after a successful invocation of the API, whereas the lockout period may reflect the API being completely inaccessible in the first place.

@michaelkleber
Collaborator

Thank you for the responses. Could you kindly elaborate on why triggering the cooling-off period is necessary when detecting API availability?

An API that is of the form "If I asked for a report right now, then would you send it?" would completely eliminate the 1-year cooling-off period, right? — after all, nobody would ever call the debugging API if they knew that it would not send a report. Your request would allow circumvention of all the "protections against malicious scenarios" that Alonso described above.

Or maybe I'm still misunderstanding what you're asking for?

On the other hand, I don't see any harm in an API of the form "Am I currently cooling down and/or locked out?" That would let you build your debugging requests much less often than without it, even though you would still only have a 1/1000 chance of sending each one that you built. @JensenPaul WDYT?

(Regarding "lockout" vs "cooldown", I personally feel like "lockout" feels more global, like "the door is locked", while "cooldown" seems more caller-specific, as in "you are over-heated, go take a walk and cool down and then you can come back and join the rest of us." But if other people have opinions on these or other more intuitive names for the two states, please share!)

@JacobGo
Contributor

JacobGo commented Oct 9, 2023

Ah, I was assuming that the FALSE die roll was cast once per worklet function execution and there was no way to coordinate a loop based attack external to these functions. Thinking outside of that box, it does become clear why the check itself requires a cooldown. Any mechanisms to minimize the overhead of the API usage would still be welcome.

Overall, the statefulness of this API makes it more difficult to conceptually model an observability framework compared to traditional random sampling. I wonder if there might be issues here with a population more prone to exceptional circumstances gradually dwindling over time due to the cooldown, as well as the true rate of an exception becoming invisible without a fully transparent sampling rate? I also worry about the long-term repercussions of an initial, overly lax threshold of exceptional events, e.g. an adtech accidentally locking themselves out of the API for a year.

@michaelkleber
Collaborator

Thanks. I think the "Am I currently cooling down and/or locked out?" API would indeed help with minimizing the overhead, we'll explore that.

I wonder if there might be issues here with a population more prone to exceptional circumstances gradually dwindling over time due to the cooldown, as well as the true rate of an exception becoming invisible without a fully transparent sampling rate?

I agree with this concern, but I haven't come up with any other way to preserve the privacy goals.

I also worry about the long-term repercussions of an initial, overly lax threshold of exceptional events, e.g. an adtech accidentally locking themselves out of the API for a year.

Yes, great point, that does seem like it's too easy to accidentally shoot yourself in the foot.

Instead of a 1-year cool-down when you don't get a report, I wonder if we could instead have a shorter timeout, like 1 week, that would trigger 90% of the time, and a 1-year timeout the other 10%. Then even if an ad tech shipped a bug that asked everyone to send a debug report all at once, they would recover the ability to debug on 90% of their traffic a week later. (All percentages and time durations subject to change, but at least it's an idea.)

@michaelkleber
Collaborator

Okay, I've done a little simulating of this anti-footgun two-cooldowns idea. (Thank you, Google Sheets, for the "Iterative calculation" capability in the File>Settings>Calculation menu.)

Suppose that when you ask for a debug report in a Chrome instance which is not in the cool-down or lock-out state,

  • A debug report gets sent 1/1000 of the time, and that browser enters global lock-out for 3 years
  • Otherwise, no report gets sent, and:
    • there is a 90% chance the calling ad-tech enters a 2-week cool-down
    • there is a 10% chance that the calling ad-tech enters a 1-year cool-down

Which is to say: if you accidentally push into production a bug that asks everyone in the world to send you a debug report, you would regain your ability to do selective debugging on 90% of browsers after two weeks, instead of after one year.

In that case, with 100 ad techs spamming the API as much as possible, each one gets around 6500 debug reports per day per billion Chrome instances.

If there were only a single ad tech using the API, they would instead get around 20K reports per day per billion, so the global lock-out mechanism cuts the number of reports to about 1/3 of what it would be otherwise. The truth will probably be somewhere between those two extremes.
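
The single-ad-tech figure can be approximated without a spreadsheet by a renewal argument: a browser that is asked as often as possible alternates between one die roll and the waiting period that follows it. The sketch below is such an approximation under the parameters listed above; it is not the simulation used in this comment, and it does not attempt the 100-ad-tech case, where lockouts caused by other callers interact.

```javascript
// Parameters from the two-cooldown variant described above.
const pSend = 1 / 1000;        // chance a request results in a report
const lockoutLenDays = 3 * 365;
const shortCooldownDays = 14;  // applies 90% of the time after no report
const longCooldownDays = 365;  // applies 10% of the time after no report

// Average cooldown after a request that did not send a report.
const expectedCooldownDays = 0.9 * shortCooldownDays + 0.1 * longCooldownDays;

// Average time between consecutive die rolls for one always-asking ad tech.
const expectedDaysPerRoll =
  pSend * lockoutLenDays + (1 - pSend) * expectedCooldownDays;

// Reports per day per billion Chrome instances for that single ad tech.
const reportsPerDayPerBillion = (1e9 * pSend) / expectedDaysPerRoll;

console.log(expectedCooldownDays.toFixed(1));     // 49.1
console.log(reportsPerDayPerBillion.toFixed(0));  // a little under 20,000
```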

@rdgordon-index
Contributor

I'm curious about more insight into the rationale for the 1- or 3-year "long-term" lock-out / cool-down intervals... and the math that goes into what the minimum privacy-safe interval would need to be.

Follow-up question -- have we considered defining that 1/1000 per runAdAuction, so that we can debug everything happening for a single on-device auction?

@michaelkleber
Collaborator

The numbers are admittedly somewhat arbitrary! But sure, here is my thinking:

  • The 3-year global lock-out is the key user-facing story. When a person using Chrome wants to know "Has any information about me been used for debugging?", the answer is either "No!" or is "Yes, your browser sent one debugging report to one ad tech, it happened 8 months ago, and it will not send another one for the next 2+ years".
  • The 1-year cool-down is the key abuse prevention parameter. The 1/1000 down-sampling is really "1/1000 times that you get to ask for a report", and if you get to ask for a report every second / minute / hour, then 1/1000 means getting a report from the same user every 15 minutes / every day / every 2 months. One year of cool-down means that ad techs are well served by being thoughtful about when they should ask for a debugging report.
  • The combination of all the parameters leads to the ballparks of "thousands of reports a day", and also "at least 1/4 of people are not in the global lock-out state" even under pessimistic assumptions. That seems sufficient for a use case like "For debugging purposes, I need an example of the inputs to my function which is causing an error for 1% of people".
  • The idea that 1/1000 is per forDebuggingOnly, not per runAdAuction, is important to how much is leaked in the 1/1000 event that we do send a report. As designed, the worst possible thing that a report could do is reveal the browser's first-party cookie on two different sites, and with the 3-year lock-out, there's no way to bootstrap that into a graph of a person's identity across lots of sites. If we allowed logging of a full runAdAuction worth of data, then even a single logging event would give away a many-site identity join... which is the key outcome Privacy Sandbox aims to prevent.
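
The "1/1000 times that you get to ask" point can be checked with quick arithmetic. The sketch below computes the expected gap between reports from one browser if an ad tech could ask once per second, minute, or hour with no cooldown; the helper name is made up for this example, and the results agree with the rounded figures above only to within an order of magnitude.

```javascript
// On average one report is sent per 1000 requests.
const REQUESTS_PER_REPORT = 1000;

// Expected seconds between reports if a request is made every
// `askIntervalSeconds` seconds and there is no cooldown.
function expectedSecondsBetweenReports(askIntervalSeconds) {
  return askIntervalSeconds * REQUESTS_PER_REPORT;
}

const gapMinutesIfAskingEverySecond = expectedSecondsBetweenReports(1) / 60;
const gapHoursIfAskingEveryMinute = expectedSecondsBetweenReports(60) / 3600;
const gapDaysIfAskingEveryHour = expectedSecondsBetweenReports(3600) / 86400;

console.log(gapMinutesIfAskingEverySecond.toFixed(1)); // 16.7 (minutes)
console.log(gapHoursIfAskingEveryMinute.toFixed(1));   // 16.7 (hours)
console.log(gapDaysIfAskingEveryHour.toFixed(1));      // 41.7 (days)
```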

Sorry that I don't have a closed-form formula for the reports-per-day figure. I had one back when there was only one kind of cool-down, but once a second cool-down rate came along, simulation seemed like the only viable way.

@rdgordon-index
Contributor

"Has any information about me been used for debugging?"

I suppose that's the most important question -- given the great lengths to which PS goes to ensure anonymity, there seems to be some wiggle room in these endpoints which could, in principle, allow some non-"me"-specific information to be used for debugging that wouldn't be about the user. For example -- am I scoring k-anon bids the way that I'm expecting as a seller?

and with the 3-year lock-out, there's no way to bootstrap that into a graph of a person's identity across lots of sites

Presumably that would be true even at 1 year, or 6 months -- hence my question about trying to better understand the limits of these windows. Clearly, reports every second/minute are needlessly aggressive, and merely a consequence of not needing to be "thoughtful" per se.

@michaelkleber
Collaborator

I completely agree that the browser can be more relaxed about information when it is either information from a single site or information shared across many users. But bidding functions necessarily have information from two sites (the IG Join site and the publisher site hosting the auction), with no k-anonymity constraint on either of them; and scoring in a whole auction implicitly involves information from many sites (the IG Join sites of every IG that bids). I don't see any way that the browser can be more relaxed about that kind of many-site, user-specific information.

@ardianp-google
Contributor

In order to assess the sampling and other parameters, it would be useful if, before this sampling mechanism rolls out, the API provided three bits that tell whether the report is sampled, whether the device is in cooldown, and whether the device is in the lockout period, respectively. These can be reported via URL params appended to the reporting URL string: droppedDueToSampling, droppedDueToCooldown, droppedDueToLockdown, as shown here:

&droppedDueToSampling=true&droppedDueToCooldown=false&droppedDueToLockdown=false

We’re aware that it is possible for each adtech company to implement all the logic to simulate sampling/cooldown/lockout themselves while the 3P cookie is still available. However, that would be additional work and somewhat inaccurate (as a 3P cookie doesn’t map to a device perfectly).
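
If the browser appended those three parameters as suggested, a report collector could read them with the standard URL API. The sketch below is a hypothetical consumer: the parameter names come from the suggestion above, while the endpoint, the `ig` parameter and the `parseDropFlags` helper are invented for illustration.

```javascript
// Parameter names as suggested above; everything else is hypothetical.
const DROP_FLAGS = [
  'droppedDueToSampling',
  'droppedDueToCooldown',
  'droppedDueToLockdown',
];

// Extracts the three flags from a received debug report URL.
function parseDropFlags(reportUrl) {
  const params = new URL(reportUrl).searchParams;
  const flags = {};
  for (const name of DROP_FLAGS) {
    flags[name] = params.get(name) === 'true';
  }
  // The report would survive downsampling only if no flag is set.
  flags.wouldBeSent = DROP_FLAGS.every((name) => !flags[name]);
  return flags;
}

const exampleFlags = parseDropFlags(
  'https://adtech.example/debug?ig=shoes' +
  '&droppedDueToSampling=true&droppedDueToCooldown=false&droppedDueToLockdown=false'
);
console.log(exampleFlags.wouldBeSent); // false
```

Counting reports by flag would let an ad tech estimate, before enforcement, what share of today's reports would remain.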

@rdgordon-index
Contributor

Which is to say: if you accidentally push into production a bug that asks everyone in the world to send you a debug report, you would regain your ability to do selective debugging on 90% of browsers after two weeks, instead of after one year.

Just to clarify -- this cooldown is per ad tech (i.e. tied to origin somehow)? I'm asking because there's also a mention of a "global" one:

A debug report gets sent 1/1000 of the time, and that browser enters global lock-out for 3 years
The 3-year global lock-out is the key user-facing story. When a person using Chrome wants to know "Has any information about me been used for debugging?", the answer is either "No!" or is "Yes, your browser sent one debugging report to one ad tech, it happened 8 months ago, and it will not send another one for the next 2+ years".

And I want to make sure I understand the distinction, and implications thereof.

@michaelkleber
Collaborator

@ardianp-google: Good point, we should make it easy for consumers of the reports to understand what impact downsampling will have. I doubt we can offer all three bits, but I think the one bit from option 2 above gets a lot of the benefit.

@rdgordon-index: Yes, the 999/1000 cooldown is per ad tech, while the 1/1000 lockout happens only after sending a report, and is global across all ad techs. The way to think about the global nature is "Once a browser sends a single report, it will wait years before sending another one."

@rdgordon-index
Contributor

and is global across all ad techs

Doesn't this provide another 'key abuse' mechanism, where ad techs can inadvertently affect each others' debug calls?

@michaelkleber
Collaborator

Doesn't this provide another 'key abuse' mechanism, where ad techs can inadvertently affect each others' debug calls?

There is a risk, but remember that if another ad tech calls the API for everyone in the world, they have no impact on your debugging call on 99.9% of browsers.

It's true that if another ad tech keeps calling the API over and over, then some fraction of the population ends up locked out in the steady state, and if lots of other ad techs do this, then the fraction of the population you have available for reporting goes down.

I've put together a little Google Sheets calculator that uses the parameters I suggested above to approximate what happens in a few scenarios. (Thank you to @alexmturner for pointing out the 4x4 matrix whose principal eigenvector makes this run.)

https://docs.google.com/spreadsheets/d/1q-uBH7F_NAEWjqcGSChXj6TFbsQ4WK-p83RJTZrty9s/edit#gid=0

For example, with the above cooldown parameters and even with 25 ad techs calling the API as often as possible, 35.9% of browsers could end up in the lockout state — so you would still get reports from the other 2/3 of the population.

@jonasz
Contributor Author

jonasz commented Dec 18, 2023

I was wondering, aside from the discussion about the target shape of the API - can we assume that in mode b (which starts on Jan 4th) forDebuggingOnly availability will not be limited / throttled in any way?

@michaelkleber
Collaborator

The downsampling idea for forDebuggingOnly has not been implemented yet, and the debugging API will remain available in Mode B. Indeed, for people only testing in Mode B, this might be an essential part of their ability to debug their initial use of the API.

@fhoering
Contributor

The proposal in its current state cannot support our needs.

First, we need info from won displays in order to compare online data with reported data e.g for the modeling signals field.
With the current proposal, if we have a 1% error rate and a 1% win rate, we would have 0.5 error reports for won displays out of the 5,000 estimated reports per day.
In order to debug such an error, we would need 1,000 error reports per day (1 every 1.5 min), which means 100,000 debug reports for won displays per day.
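
The arithmetic behind these figures can be written out explicitly. The sketch below only restates the numbers given in this comment; the variable names are chosen for the example.

```javascript
const estimatedDailyReports = 5000; // estimate from the proposal
const errorRate = 0.01;             // 1% of displays hit the error
const winRate = 0.01;               // 1% of bids win

// Error reports on won displays under the current proposal.
const dailyErrorReportsOnWins = estimatedDailyReports * errorRate * winRate;

// Target: 1,000 error reports per day on won displays.
const neededDailyErrorReports = 1000;
const minutesBetweenErrorReports = (24 * 60) / neededDailyErrorReports;
const neededDailyWinReports = neededDailyErrorReports / errorRate;

console.log(dailyErrorReportsOnWins.toFixed(1));      // 0.5
console.log(minutesBetweenErrorReports.toFixed(2));   // 1.44
console.log(Math.round(neededDailyWinReports));       // 100000
```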

Second, we would need the same number of reports (100,000) for losses to ensure there is no error leading to systematic loss. This means the sampling should apply independently on wins and losses.

We are also a bit worried about the bias introduced by the cooldown and lockout periods which means only new Chrome browsers will send debug reports. Potentially automated bots will generate more reports than real Chrome users.

With the following parameters, and using the spreadsheet above:

  • 100 ad techs
  • 1% sampling ratio
  • 90 days lockout
  • 30 days cooldown

We would get 100,000 events per day for wins and for losses.

Please note that, in parallel, we made the complementary proposal #871 for offline debugging needs.

@michaelkleber
Collaborator

Hello Fabian, Happy new year, sorry for the delay in responding.

Certainly this proposed debugging API will not serve all needs, and if your goal is "to compare online data with reported data e.g for the modeling signals field" to find cases of buggy behavior, then I think the laboratory simulation approach discussion in #871 is quite valuable.

Second, we would need the same number of reports (100,000) for losses to ensure there is no error leading to systematic loss. This means the sampling should apply independently on wins and losses.

I think this different treatment of wins and losses would already be in your power. The two functions forDebuggingOnly.reportAdAuctionWin() and forDebuggingOnly.reportAdAuctionLoss() will let you condition your request to send a report on whether you win or lose the auction. This means you could decide to call reportAdAuctionWin() every time you bid, and call reportAdAuctionLoss() only 1% of the time that you bid, or whatever numbers got you the right distribution of reports.
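
A minimal sketch of that idea follows. It assumes a helper called from generateBid(); the endpoint URL, the 1% rate, the helper name and the injectable `debugApi` and `random` parameters are illustrative choices, and in a real worklet the `forDebuggingOnly` global would be passed as `debugApi`.

```javascript
// Hypothetical endpoint and loss-call rate, for illustration only.
const DEBUG_ENDPOINT = 'https://buyer.example/debug';
const LOSS_CALL_RATE = 0.01;

// Registers debug reports with different rates for wins and losses.
// `debugApi` is expected to expose reportAdAuctionWin/reportAdAuctionLoss.
function registerDebugReports(debugApi, random = Math.random) {
  // Ask for a win report on every bid.
  debugApi.reportAdAuctionWin(`${DEBUG_ENDPOINT}?outcome=win`);

  // Ask for a loss report on only a small fraction of bids.
  if (random() < LOSS_CALL_RATE) {
    debugApi.reportAdAuctionLoss(`${DEBUG_ENDPOINT}?outcome=loss`);
  }
}
```

Inside generateBid() this could be invoked as `registerDebugReports(forDebuggingOnly)`. Whether each request then leads to a report still depends on the downsampling discussed in this thread.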

  • 100 ad techs
  • 1% sampling ratio
  • 90 days lockout
  • 30 days cooldown

I don't think these numbers are realistic. First, the value "100 ad techs" in the spreadsheet is not meant to be the total number of ad techs, it is meant to be the number of ad techs that are calling the reporting APIs constantly, and so are always in the cooldown-or-lockout period. This is a worst-case scenario, meant to illustrate that you would still be able to get a reasonable number of reports even if many ad techs were conspiring to run a denial-of-service attack to prevent all reporting. I think it is much more likely that ad techs would be selective in exactly the way you want to be: call the API only on a small fraction of "normal" traffic, and call it at a higher rate when something "interesting" happens. This would put many fewer people into lockout, and everyone doing this would get many more "interesting" reports than the spreadsheet's lower bound.

A noteworthy part of my 14d-1yr-3yr parameters is that ad techs who did decide to call the API every time would mostly hurt themselves, because they mostly would end up in the cool-down period. Your changes have a big effect: they mean that an ad tech who calls the API all the time would hurt other ad techs a lot more, and hurt themselves a lot less. That means much less incentive for people to be thoughtful about how they use the API.

I also don't feel that your parameters have a particularly good privacy story. They would lead to each browser sending a debugging report roughly every 3 months. That means that if the ad tech ecosystem decided to use this as a tracking mechanism, they could join up every person's behavior across 5 sites per year. With my proposed parameters, a browser only sends a report around once every 8 years — so in a year, around 85% of people would send no report at all, and the other 15% could at worst be linked up across only two sites (and those people would surely send no reports at all for three years thereafter).

@hAckdamDys

Hey @michaelkleber can you help me understand what this means a bit better? I asked around and don't think we actually have clarity here yet, at least not enough to base an implementation choice on, even for short-term adoption purposes.

The removal of 3PC has already started and has a planned ramp-up date starting sometime in Q3 of 2024, so "as part of the removal of 3PC" could/should be interpreted as having already happened, but it seems like this means that forDebuggingOnly is still usable 100% of the time for some further period?

I'd ask that we detail this broken down something like the following:

Let's call "Unsampled/Unconstrained Availability of forDebuggingOnly" reporting the state where it can be called and will work immediately in any auction w/o limit or lockouts, and "Sampled Availability..." the state we'll get to eventually with lockouts and whatnot.

Current cohorts

Mode B Treatment 1.* Labels

For the set of Chrome browsers currently with unpartitioned 3PC access disabled AND sandbox APIs available:

1. Is forDebuggingOnly reporting still available in "Unsampled/Unconstrained..." on 100% of auctions for this cohort? (sounds like yes)

2. What is the "no earlier than" date for the "Sampled Availability with Lockouts..." of forDebuggingOnly reporting for this group?

3. Whatever the answer to (2) is, can it be stated publicly on the feature-availability page?

Everything Else

For All \ aboveCohort, same questions.

Next Ramp Up Round, Whenever That Is

Currently planned for Q3 2024, but let's just say on date X when more browsers move into the "yes PS APIs but no unpartitioned 3PC access" group. So, similar questions as above:

1. Will those browsers have "Unsampled/Unconstrained Availability of forDebuggingOnly" when they move?

2. Assuming yes, what will the no earlier date be?

3. And then, please state it in the feature status doc.

I can understand why we'd want forDebuggingOnly not to have an official support date, but a) it seems like we're now giving one to some deprecated*URN functions b) publicly stating the implementation priorities are forcing this would be reasonable and c) I have at least one choice to be made based on the robustness of this timeline, and I suspect I'm not the only one.

Hello @ajvelasquezgoog , do you know the answer to this?

@qingxinwu
Collaborator

qingxinwu commented Apr 29, 2024

Feature rolling out status update:
Running the sampling algorithm and adding a flag for forDebuggingOnly reports has now been rolled out to 100% of Stable.

It runs the downsampling algorithm on forDebuggingOnly reports, updates the lockout and cooldowns in the database, and adds the forDebuggingOnlyInCooldownOrLockout signal to the browserSignals of generateBid() and scoreAd(). Note that it does not enable filtering forDebuggingOnly reports based on sampling result (which will be rolled out separately in the future), so all forDebuggingOnly reports are still sent after sampling.

Explainer: https://github.com/WICG/turtledove/blob/main/FLEDGE.md#712-downsampling
Spec: https://wicg.github.io/turtledove/#downsampling-header
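
A minimal sketch of how a buyer script might consult the new signal before registering debug reports. The `forDebuggingOnlyInCooldownOrLockout` field and the `forDebuggingOnly.reportAdAuctionWin()` / `reportAdAuctionLoss()` calls are the ones described in the explainer; the helper name `registerDebugReports`, the injected `debugApi` parameter, and the URLs are illustrative assumptions. Since filtering is not yet enforced, skipping the calls is a choice here, not something the browser requires.

```javascript
// Hypothetical helper. Inside a real bidding worklet, pass the global
// `forDebuggingOnly` object as `debugApi`.
function registerDebugReports(browserSignals, debugApi, urls) {
  // When the flag is true, the browser is in lockout or this ad tech is in
  // cooldown, so a report would be dropped once filtering is enforced.
  if (browserSignals.forDebuggingOnlyInCooldownOrLockout) {
    return false;
  }
  debugApi.reportAdAuctionWin(urls.win);
  debugApi.reportAdAuctionLoss(urls.loss);
  return true;
}

// Usage with a stand-in for the worklet global, so the sketch runs anywhere.
const recordedCalls = [];
const fakeDebugApi = {
  reportAdAuctionWin: (url) => recordedCalls.push(['win', url]),
  reportAdAuctionLoss: (url) => recordedCalls.push(['loss', url]),
};
const debugUrls = {
  win: 'https://buyer.example/debug/win',
  loss: 'https://buyer.example/debug/loss',
};
const registered = registerDebugReports(
  { forDebuggingOnlyInCooldownOrLockout: false }, fakeDebugApi, debugUrls);
const skipped = registerDebugReports(
  { forDebuggingOnlyInCooldownOrLockout: true }, fakeDebugApi, debugUrls);
```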

@dmdabbs
Contributor

dmdabbs commented May 2, 2024

Thank you @qingxinwu.

Note that it does not enable filtering forDebuggingOnly reports based on sampling result (which will be rolled out separately in the future), so all forDebuggingOnly reports are still sent after sampling.

I see the forDebuggingOnlyInCooldownOrLockout and win callbacks but not loss callbacks. Verifying that the loss reports are NOT being suppressed.

@naveenram00

Hi, I’m working on the post-3PCD debug reporting framework for the Protected Audience API (FLEDGE), and I had a question about how the lockout/cooldown downsampling is implemented. Currently, if we have debug reporting on and 10 interest groups running in an auction, each of them will register a debug report to send on a winning/losing ping after the auction is complete. With the new downsampling, only one of these interest groups would be able to send a report before the browser is either put in cooldown or lockout, right? In the new downsampling API, when is sampling calculated, at the time that a report is registered using forDebuggingOnly.reportAdAuctionWin/Loss() (during the auction), or at the time a report is actually sent (after the auction)? Will the interest group that gets a report just be the first one to register a report with forDebuggingOnly? Or will they all go through, and only subsequent calls to the API be blocked?

@naveenram00

I also have a question about the rollout of the new downsampling API. Right now almost all of our browsers are sending some kind of debug report. I know you have already made the forDebuggingOnlyInCooldownOrLockout boolean available to help with the transition. If right now browsers are being put into non-functioning cooldown/lockout states based on the sampling strategy, will these states persist with the launch? Or will they get reset?

Currently I would imagine a large number of browsers are accumulating in the lockout and long cooldown states, as we are calling forDebuggingOnly on almost every browser. By the time 3PCD happens, it would be unfortunate to have a large portion of browsers locked out because of the previous debug reporting strategy.

@qingxinwu
Collaborator

qingxinwu commented May 24, 2024

If right now browsers are being put into non-functioning cooldown/lockout states based on the sampling strategy, will these states persist with the launch? Or will they get reset?

The states collected now during this test phase will be reset at the time of enforcing downsampling, to avoid a large portion of browsers locked out by the time 3PCD happens.

@qingxinwu
Collaborator

With the new downsampling, only one of these interest groups would be able to send a report before the browser is either put in cooldown or lockout, right?

At most one. Note that it's possible that none of the debug reports from this auction is picked, and a future auction may have one picked by the sampling algorithm. See the spec for more details.

In the new downsampling API, when is sampling calculated, at the time that a report is registered using forDebuggingOnly.reportAdAuctionWin/Loss() (during the auction), or at the time a report is actually sent (after the auction)?

The sampling is calculated after the auction completes, because we need to know the auction result (winner) to know which debug report (win/loss) to collect from buyer/seller scripts. If a buyer loses the auction and only calls the debug win API, then it won't be sampled (and thus won't be put into lockout or cooldown) because it has no debug report to send anyway.

Will the interest group that gets a report just be the first one to register a report with forDebuggingOnly? Or will they all go through, and only subsequent calls to the API be blocked?

At most one will go through, but not necessarily the first one. It is randomly decided about whether to send a report or not (before one has been picked to send and all future reports will be dropped after that). And again, it's possible that none of them go through due to the randomness.
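
To illustrate the point about sampling happening after the auction, here is a rough sketch of which registered reports are even candidates once the winner is known. The data shape (`id`, `winUrl`, `lossUrl`) and the function name are assumptions made for illustration, not the browser's internal representation.

```javascript
// registrations: one entry per bidder, holding the debug URLs it registered
// during the auction (either may be missing).
function collectEligibleReports(registrations, winnerId) {
  const eligible = [];
  for (const reg of registrations) {
    // The winner contributes its win URL, every other bidder its loss URL.
    const url = reg.id === winnerId ? reg.winUrl : reg.lossUrl;
    if (url !== undefined) {
      eligible.push({ id: reg.id, url });
    }
  }
  return eligible;
}

const auctionRegistrations = [
  { id: 'ig-a', winUrl: 'https://a.example/win', lossUrl: 'https://a.example/loss' },
  { id: 'ig-b', winUrl: 'https://b.example/win' },   // registered win only
  { id: 'ig-c', lossUrl: 'https://c.example/loss' }, // registered loss only
];
const eligibleReports = collectEligibleReports(auctionRegistrations, 'ig-a');
```

Here `ig-b` lost and only registered a win URL, so it has nothing to sample and, per the answer above, is not put into lockout or cooldown.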

@qingxinwu
Collaborator

I see the forDebuggingOnlyInCooldownOrLockout and win callbacks but not loss callbacks. Verifying that the loss reports are NOT being suppressed.

Are you still not seeing loss callbacks? Loss reports are not expected to be suppressed.

@abrik0131
Contributor

The spec states: "sampling rate is 1/1000, which means only sending reports 1/1000 times the forDebuggingOnly API is called." This seems to indicate that the sampling rate is per forDebuggingOnly call, which means it is per IG.

This comment states: "It is randomly decided about whether to send a report or not (before one has been picked to send and all future reports will be dropped after that)" This comment, however seems to indicate that the sampling rate is per auction.

Could you please clarify whether the sampling rate (and other related rates in the spec) are per forDebuggingOnly call (i.e. per IG) or per auction?

@qingxinwu
Collaborator

This comment states: "It is randomly decided about whether to send a report or not (before one has been picked to send and all future reports will be dropped after that)" This comment, however seems to indicate that the sampling rate is per auction.

Sorry for the confusion, but I don't fully understand what "per auction sampling rate" means here.

Could you please clarify whether the sampling rate (and other related rates in the spec) are per forDebuggingOnly call (i.e. per IG) or per auction?

Each fDO (forDebuggingOnly) call has a 1/1000 (the current sampling rate) chance to be picked. After one report is picked, the browser will be locked out for a lockout period (currently set to 3 years). While the browser is locked out, all fDO calls from this browser (including the ones from the current auction that have not been sampled yet, and those from future auctions) will just be dropped, and no sampling is needed. Those rates are applied per fDO call, as the spec indicates. Let me know if that's still not clear.
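
The per-call behaviour described above can be sketched roughly as follows. This is a simplified model, not the browser's implementation: it only covers the 1/1000 roll and the browser-wide lockout, and leaves out per-ad-tech cooldown. The function and field names are illustrative assumptions, and `random` is injectable so the example is deterministic.

```javascript
const SAMPLING_RATE = 1 / 1000; // current rate quoted in the thread

// Returns the reports that would actually be sent. `state.lockedOut` models
// the browser-wide lockout (currently 3 years once a report is picked).
function sampleReports(reports, state, random = Math.random) {
  const sent = [];
  for (const report of reports) {
    if (state.lockedOut) {
      continue; // dropped without any roll
    }
    if (random() < SAMPLING_RATE) {
      sent.push(report);
      state.lockedOut = true; // every later call is dropped
    }
  }
  return sent;
}

// Scripted rolls: the first call misses, the second is picked, and the third
// is never rolled because the browser is already locked out.
const scriptedRolls = [0.5, 0.0001, 0.0002];
let rollIndex = 0;
const browserState = { lockedOut: false };
const sentReports = sampleReports(
  ['r1', 'r2', 'r3'], browserState, () => scriptedRolls[rollIndex++]);
```

So the rate applies to each call, while the "at most one per auction" effect comes from the lockout that the first picked report triggers.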

@ccharnay67

Hello,

We (Criteo) are currently investigating using the forDebuggingOnlyInCooldownOrLockout flag, and would like to confirm one point.

The flag only shows if the report would have been dropped because there is a global lockout for the user or we are in cooldown for the user, right? It does not indicate if the report would have been dropped because of the 1/1000 sampling? However, I assume you do the 1/1000 sampling internally in Chrome to calculate the flag?

We observe that more than 0.1% of the callbacks we receive have the flag set to false, and we would have expected it to be less if the 1/1000 sampling for the particular callback was included in the flag. So, if we want to know how many callbacks we will receive once the downsampling mechanism is enforced, we should assume we will get only 1/1000th of the callbacks with the flag set to false, is that correct?

@michaelkleber
Collaborator

Yes, your understanding is correct: the 1/1000 sampling only happens if you actually call the API, and it is not reflected in the value of the flag. See #632 (comment) above for discussion of that design decision.
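
As a back-of-the-envelope illustration of that estimate (with made-up numbers, and a hypothetical helper name), the expected post-enforcement volume is the count of callbacks whose flag is false divided by 1000. This ignores second-order effects, such as a sent report triggering a lockout that removes later callbacks.

```javascript
// Hypothetical helper: expected number of reports once filtering is
// enforced, given callbacks observed before enforcement.
function estimateEnforcedReports(callbacks, oneIn = 1000) {
  const eligible = callbacks.filter(
    (cb) => cb.forDebuggingOnlyInCooldownOrLockout === false
  ).length;
  return eligible / oneIn;
}

// Made-up sample: 5000 callbacks, 2000 of them with the flag set to false.
const observedCallbacks = [];
for (let i = 0; i < 5000; i++) {
  observedCallbacks.push({ forDebuggingOnlyInCooldownOrLockout: i >= 2000 });
}
const expectedReports = estimateEnforcedReports(observedCallbacks); // 2
```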

@BasileLeparmentier

Dear all,

At Criteo we had a deeper look at the forDebuggingOnly availability and we have the following feedback for the current state of the proposal, and based on them we are proposing some changes.

We believe that the current state has the following limitations:

  • The number of requests we would receive is far below what would be needed to run an ad tech service at scale. With around 4 000 requests per day, we wouldn't even have one request per day for 90% of our advertisers.
  • Not only that, but the requests we would receive are heavily skewed toward new browsers, making them not representative of production and further complicating any debugging capabilities.
  • I believe there is a misunderstanding in the 'debugging habit' of ad techs. We are not debugging once in a while. We have incidents every day and are debugging every day. A cooldown period for ad techs therefore doesn't make any sense in our view.

For these reasons, we are proposing the following changes to the forDebuggingOnly specification:

  • Users, identified by their browsers, are put by the browser in one bucket out of 1000, which would mean a data leakage approximately once per device life. The seed used to decide the buckets should be random to avoid any recency effect that would bias results. This seed is set up at the browser level and is the same for all ad tech participants.

  • We send all reports to all ad techs from all users who satisfy hash(day, user) mod 1000 = 0 (which means each of the above buckets is identified with a given day). When this condition holds for a given browser, all ad techs can ask for all their requests during a 24h period with the forDebuggingOnly endpoint. This has the following advantages:

  1. Having consistent user-level data over a small period of time is needed, as the ad tech ecosystem is one of feedback loops and we need to be able to debug across different occurrences (requests) of this feedback loop.

  2. Having consistent sampling for all ad techs has several advantages:
    a. It gives industry actors enough debugging data (whose sampling rate depends on the number of buckets chosen) to be able to run their systems.
    b. It allows cross-industry incidents to be investigated (as both ad techs have the same sampling mechanism).
    c. It means users have their privacy infringed only very infrequently (once every 3 years for 1 000 buckets, but this could be tuned).
    d. Because all users share the same buckets, ad techs cannot collude to get data from different users at different times in order to follow them over time.

  3. As explained above, ad techs have no 'cool-down' period. We are always debugging, and we need the data to be flowing to be able to investigate a very wide variety of issues, which makes this requirement unusable in practice.
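
A rough sketch of the hash(day, user) mod 1000 = 0 condition, purely to make the proposal concrete. The choice of FNV-1a, the way the day and seed are combined, and every name below are assumptions for illustration; the proposal does not specify a hash function.

```javascript
// 32-bit FNV-1a, used here only as a stand-in hash.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// True when this browser's reports would all be sent on the given day.
// No ad tech identity enters the hash, so every ad tech gets the same answer.
function isDebugDay(browserSeed, day, buckets = 1000) {
  return fnv1a(`${day}|${browserSeed}`) % buckets === 0;
}

// Count how many of 3000 consecutive days would be debug days for one browser.
const exampleSeed = 'browser-seed-123'; // hypothetical per-browser random seed
let debugDays = 0;
for (let day = 0; day < 3000; day++) {
  if (isDebugDay(exampleSeed, day)) {
    debugDays++;
  }
}
```

With 1000 buckets, a browser would be expected to hit the condition about once every 1000 days on average, which is where the "once every 3 years" figure comes from.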

We would like to hear feedback on these proposed modifications.

@michaelkleber
Collaborator

@BasileLeparmentier I appreciate your attempt to offer an alternative that "would mean a data leakage approximately once per device life." But if I understand your proposal correctly, the size of this once-every-three-years data leakage is vastly larger.

With heavily-downsampled forDebuggingOnly, the most that a malicious ad tech could possibly leak, if they get lucky, is the ability to recognize the same browser behavior on two different websites: the Interest Group Join site and the site where the auction is taking place. It seems to me that with your proposal, the malicious ad tech could leak all of your activity and identity across all sites that you had visited in the past 30 days (or longer if #855 happens) — and moreover that every malicious ad tech could learn that, since you've removed the lockout-based need to get lucky. That level of privacy risk does not seem viable.

While I understand that heavily-downsampled forDebuggingOnly cannot meet all of your visibility needs, remember that the Privacy Sandbox goal is to offer multiple tools, each with their own privacy protections, that let you get insight into different kinds of questions you might ask about what happens inside auctions. Combining fDO with Private Aggregation (central DP) and with Real Time Monitoring (local DP) should give a richer picture, and in particular should let you use the relatively small number of fDO reports at just the right time to do the debugging you need every day.

@BasileLeparmentier

Dear Michael,

With our suggestions, there will indeed be some leakage, albeit at a low rate of once every three years, which means that exploiting it wouldn't be economical even for malicious ad techs.

To reduce the risks, we could also add an IG sampling mechanism depending on the browser (via a hash(browser_id, IG)), where only 10 consistent IGs are returned by the forDebuggingOnly API for a given day.
This would strongly reduce the privacy leakage you are concerned about, as malicious ad techs wouldn't be able to drop an interest group on every website a user visits with the aim of reconstructing the user's browsing history using the forDebuggingOnly API. This should block this attack, whilst keeping the API's utility for debugging.

Overall, the point we are making is not that the current specification of this API cannot meet all our visibility needs; it is that it meets none.

  • To illustrate this, troubleshooting with Google's SSP always starts with 'please send us the request id'. By design this won't be possible in the current setup.
  • Even today, with full granular debugging on both sides, we still do not understand why we are losing more than 50% of the bid requests between the auction being initiated and the reporting in the PAAPI worklet. This is despite the issue being open since the end of February and much back-and-forth between Chrome, GAM and us. One thing blocking us is the lack of full debugging capabilities and sampling inside some steps of the in-browser work to troubleshoot.
  • It cannot be used to debug any issue with even the biggest partners, as we do not get data for 90% of them.
  • We cannot investigate any issue caused by a feedback-loop interaction, as by design we only get one request per IG.
  • We cannot use this API to understand the relative scale of any issue, as it is very heavily skewed toward new browsers that have no interest groups, or at most one.
  • Should we manage to find the root cause, we cannot check that the mitigation works, as after any use of this API we are locked out for a period.

Debugging usually requires reproducing a bug in order to spot and fix it. By definition, bugs are very hard to spot and fix with aggregated noisy data, whether central or local DP. Even if the root cause is usually shared, the produced data can be of very high dimension.

These aggregated data will point in a direction, which is really useful, but with the amount of data you are proposing, the mitigation stage, where we actually fix the issue, will be done completely blind, which means that very often it will not be done.

We believe that debugging will be integral to the success of the PA API. It will ease the adoption of the PSB, which is very hard today as there are so many ways to get a step wrong. It will also help secure performance in the long run and avoid a 'death by a thousand cuts' that may jeopardize the overall success of the PSB.

We need this debugging API to work for the PA API, so we would really appreciate any suggestion on how to improve the current specification of this API.

@michaelkleber
Collaborator

I would like to understand how much of your debugging goal would be possible to achieve using any mechanism that retained the worst-case behavior "only join a user's behavior across two sites."

For example, you mentioned the difficulty of a DSP and SSP collaboratively debugging, because those two parties don't get fDO logging on the same request. What would you think of a modification to the fDO downsampling so that if a buyer IG gets to send a debugging report from inside generateBid(), then the component seller and top-level seller also get to send debugging info from inside scoreAd() for that same bid? Since the buyer and seller would be getting information coming from the same pair of sites, this keeps the only-two-sites protection intact.

Your point about observing a deployed fix is very interesting, because it seems like the goal there could be to observe the same circumstances again later — that is, the same IG bidding on the same publisher site. That seems like it could be viable because a follow-up debug report in this case would still only give information about the same two sites, though I'll need to think more about how it could be achieved technically.

@rdgordon-index
Contributor

What would you think of a modification to the fDO downsampling so that if a buyer IG gets to send a debugging report from inside generateBid(), then the component seller and top-level seller also get to send debugging info from inside scoreAd() for that same bid?

This is somewhat similar to a previous callout -- #632 (comment) -- about all auction participants being able to 'see' the same auction and/or bid.

@michaelkleber
Collaborator

This is somewhat similar to a previous callout -- #632 (comment) -- about all auction participants being able to 'see' the same auction and/or bid.

Yes indeed, but restricting this to just a single buyer/seller/bid has far better privacy properties than a full auction with data from all the sites the user has visited in the past 30 days.

@BasileLeparmentier

Dear Michael,

Thanks a lot for your answer. I understand your ask, but I am unsure why there is a difference in nature between two sites' data and slightly more being available.
There is definitely a difference in degree, but since this can also happen with a click, and the whole industry is trying to do it at scale via PII identifiers, it is worth focusing on the degree of harm and assessing whether the likelihood of the attack and the harm generated outweigh the need for this API.

If we add the 10 IGs limitation, we remove the risk of the full browsing history being leaked, and the sampling rate makes this approach uneconomical for fingerprinting purposes, which I believe makes the trade-off acceptable.
To answer your question on a 'mechanism that retained the worst-case behavior "only join a user's behavior across two sites."': I believe it will not work for the following use cases:

  • Issues that are linked to interest groups and that can only be spotted by subsequent calls, as the impact on one website will likely affect the bid on the next website for the same user (either via a bug, via the display counter mechanism, etc.). These are very hard to spot without consistent sampling per user × IG. It may look unlikely, but we have had significant bugs impacting user data through repeated interactions.

  • Issues that are linked to the full auction.
    -- To go back to the significant share of missing bid requests that we still have not managed to troubleshoot: we understood something was happening at auction level because the full auction was missing between the Key Value call and the in-browser worklet. With data about only one interest group, we wouldn't have any way to tell whether the issue was at the full auction level or at the IG level, making it even harder to troubleshoot.
    -- With regard to your example of sharing with the seller, what if the error is caused by another IG, or by too many interest groups being evaluated for the same request? In this case, being able to share doesn't allow for debugging.
    With negative interest group tagging, if there is a failure and it kills the full auction, we have the same issue: with only one IG it is impossible to spot.

  • The last point is on the sampling rate. With one request per 1 000 (the actual sampling we are proposing) we are already flirting with the limit of what we can actually work with. Having consistent data per user allows us to get significant request scale whilst limiting the number of users whose data is leaked, reducing the actual risk of malicious usage of the API (as data only leaks for a very small portion of users, making malicious usage uneconomical). To get the numbers we need with the current specification, I guess we would need a lockout period significantly shorter than the three years we are proposing (likely a matter of days).

Overall, I want to stress that the interconnection of online advertising makes all possible bugs happen, even the most surprising ones. Being able to debug is therefore really a need for the PA API to have any chance of being deployed at scale and over the long run.

@rahulkooverjee-google
Contributor

Without getting into the specific challenges around privacy (which admittedly are tricky), I want to share that GAM also considers debugging a critical use case for us and our partners. We've definitely run into challenges when working with DSPs who buy through us, so the more debugging tools the better. To Basile's point: when something does go wrong, it's often hard to pinpoint where in the E2E flow the problem is (e.g. is it the DSP or the SSP), so having some way for different entities to coordinate their debugging seems particularly useful.

@naveenram00

Hi, I'm working on how to balance the reporting budget between sellside/buyside and wanted to double check something. At the end of the auction, only one of seller or buyer reports registered with ForDebuggingOnly will be sent, right? There's no way a seller and buyer report will be packaged so that both can be sent? If a browser can only send one report (buyer or seller) at most per day, will there be some kind of selection to make sure a report from each seller/buyer is equally likely to get sent, or will one of the reports just be picked randomly regardless of what type they are?

@abrik0131
Contributor

Could you please clarify whether cooldown is tied to origin or to eTLD+1 ?

@qingxinwu
Collaborator

Could you please clarify whether cooldown is tied to origin or to eTLD+1 ?

It's tied to origin.

@ajvelasquez-privacy-sandbox
Collaborator

Hi, I'm working on how to balance the reporting budget between sellside/buyside and wanted to double check something. At the end of the auction, only one of seller or buyer reports registered with ForDebuggingOnly will be sent, right? There's no way a seller and buyer report will be packaged so that both can be sent? If a browser can only send one report (buyer or seller) at most per day, will there be some kind of selection to make sure a report from each seller/buyer is equally likely to get sent, or will one of the reports just be picked randomly regardless of what type they are?

It is the latter: reports are randomly picked.

@ajvelasquez-privacy-sandbox
Collaborator

Hello, we wanted to give an update to the ecosystem on how we plan to prepare the forDebuggingOnly (fDO) API for an environment in which some users choose to allow 3PCs and other users do not, following our July 2024 announcement on a new path for Privacy Sandbox. For traffic in which 3PCs are allowed, there is no additional privacy risk from the browser sending unsampled fDO reports compared with 3PC-based debugging. This means that for users who have allowed 3PCs, the fDO API can remain unsampled and so provide additional precision without compromising privacy.

Therefore we propose the following changes, using what is already published in the downsampling section of our Protected Audience explainer as the starting baseline:

When the user chooses to allow 3PCs in the current impression site, we will not proceed with the downsampling algorithm as per our explainer, and the browser won’t change the state to either cooldown or lockout after either generateBid() or scoreAd() sends the fDO report.

We want to allow use of the fDO API only when 3PCs were allowed both at the time the user joined the Interest Group (IG) and at the time that the IG participated in an auction. The simplest implementation we've come up with is for the browser to enter the fDO lockdown state when there is a cookie choice state change from not allowed to allowed, which lasts until all the IGs created prior to cookie choice state change have expired.

Additionally, we’ve heard requests from you to better handle the "bias" in fDO reporting because people who turned 3PC off recently (and hence are not in the cooldown or lockout state) are more likely to send fDO reports than people who turned 3PC off earlier. For this reason, we also propose that when the cookie choice state changes from allowed to not allowed, that the browser be placed on a lockout period that can randomly total between 1 and 90 days with equal probability.
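
One way to read the two state-change rules above, as a hedged sketch and not the browser's implementation. The function name, the representation of IG lifetimes as remaining days, and the use of whole days for the random lockout are all assumptions; `random` is injectable so the example is deterministic.

```javascript
// Returns the lockout length in days triggered by a change in the user's
// 3PC choice, or 0 if the change triggers none.
function lockoutDaysOnCookieChange(wasAllowed, isAllowed, igRemainingDays,
                                   random = Math.random) {
  if (!wasAllowed && isAllowed) {
    // Not allowed -> allowed: locked out until every IG joined before the
    // change has expired.
    return igRemainingDays.length === 0 ? 0 : Math.max(...igRemainingDays);
  }
  if (wasAllowed && !isAllowed) {
    // Allowed -> not allowed: uniformly random whole number of days in [1, 90].
    return 1 + Math.floor(random() * 90);
  }
  return 0; // no change in cookie choice
}

const untilIgsExpire = lockoutDaysOnCookieChange(false, true, [3, 30, 12]); // 30
const shortestRandom = lockoutDaysOnCookieChange(true, false, [], () => 0); // 1
const longestRandom = lockoutDaysOnCookieChange(true, false, [], () => 0.9999); // 90
```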

We welcome the ecosystem comments on this proposal!

@rdgordon-index
Contributor

When the user chooses to allow 3PCs in the current impression site

Can you elaborate on what this means?

The simplest implementation we've come up with is for the browser to enter the fDO lockdown state when there is a cookie choice state change from not allowed to allowed, which lasts until all the IGs created prior to cookie choice state change have expired.

Given #855 (comment), "have expired" can be up to 90 days, correct? Does this have anything to do with joining origin?

@dmdabbs
Contributor

dmdabbs commented Nov 6, 2024

fDO is the only egress for critical reporting data today, so clear communication about intent to broadly implement this and presence of workable replacements is critical.

@rdgordon-index
Contributor

rdgordon-index commented Nov 6, 2024

Thanks -- I missed this all-important line - https://github.com/WICG/turtledove/pull/1020/files#diff-d65ba9778fe3af46de3edfce2266b5b035192f8869280ec07179963b81f4e624R1232

Further to @dmdabbs' comment above, and as flagged on today's WICG call, there's an expectation that fDO isn't being downsampled -- so further clarity about timelines, rollout, and implications is very important to existing PAAPI integrations.

@BasileLeparmentier

Dear Alejandro,
Thanks a lot for your message. I cannot stress enough how important proper debugging will be for the success of the Privacy Sandbox. Even with full visibility today, we keep finding issues and anti-patterns that are hard to spot despite forDebuggingOnly being available, so we are quite worried about future restrictions.

We welcome the fact that forDebuggingOnly will not be sampled for uid-enabled traffic. This is very good news, but it will still cause many coordination issues:

  • Market players will not use the Privacy Sandbox for third-party-cookie traffic, but will use OpenRTB, as it currently produces much better performance.

  • We will therefore have to coordinate between buyers and sellers to reserve a small proportion of the traffic we intended to use in OpenRTB to go through the Privacy Sandbox instead.

    • For this to be sustainable and efficient and, more importantly, representative of the prod environment, we would need all market players to identify a subset of users on which it is 'standard' to test the Privacy Sandbox.
    • It could be done via a flag randomly set per browser which tells us the user is in the 1/1000 for testing, with no impact other than that market actors could use it for coordination.

This population will be extremely useful but will still not be ideal because of the following limitations:

  • The competition will be very different than under the true Privacy Sandbox population, meaning that issues such as browsers overloaded by interest groups won't be spottable.
  • Investigating a higher-than-expected loss rate is going to be much harder and more cumbersome because of the cookie competition.
  • The likelihood that we will be able to avoid using the available cookies on all systems, so as to perfectly mock the Privacy Sandbox, is actually quite low given the complexity of our systems, and so we might be observing a biased population.

Another solution to reduce bias would be to have a small cookie opt-in population where the cookies are actually deactivated, which would therefore avoid the pitfalls described above.

We still want to stress that this is a very welcome development.

For this proposal 'For this reason, we also propose that when the cookie choice state changes from allowed to not allowed, that the browser be placed on a lockout period that can randomly total between 1 and 90 days with equal probability.'
We believe that this is better, but it will still lead to bias. Given the currently extremely low likelihood of not being in a blackout period, we would still prefer a longer period (such as the average browser lifetime) to reduce this bias further.

Moreover, in general, for pure Privacy Sandbox users, the ones we are actually interested in, we would still much rather have a way to share data on at least a single IG across all auction participants, and with a much higher sampling rate than what is proposed, which is far too low for actual usability, as described in this answer: #632 (comment)

We thank you for this development!

@naveenram00

Hi,

I had a question about the handling of 3PC and non-3PC traffic. Now that you no longer plan on downsampling the fDO reporting coming from the 3PC-consented users, we can proceed with our old high-volume debug reporting strategies on that slice. However, to get a useful signal from the non-3PC traffic, we will still need to implement some different strategy where we call the fDO API less often or in specific cases so as not to waste the limited reporting "budget" for the downsampled slice. Since sampled vs unsampled is decided on more than just cookie presence in the current call, it will be difficult for us to determine which group a browser is in on our own.

Would it be possible for you to provide us with another bit, forDebuggingOnlySampled, which indicates whether a given browser is in the unsampled 3PC pool or the sampled non-3PC pool at the time of the auction? If we misidentify even a small pool of browsers as unsampled when they are actually sampled, it would lead to a mismanagement of our browser pool. For example, if our strategy is to report every auction for unsampled traffic and to report only certain cases for sampled traffic, we could lock out all of our browsers quickly if we start trying to report every auction for some of the sampled browsers.
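
To make the request concrete, here is a sketch of the kind of decision logic we have in mind. Note that `forDebuggingOnlySampled` is the bit being asked for here and does not exist today; the helper name and the "high-value case" notion are likewise illustrative assumptions.

```javascript
// Decide whether to spend a debug report on this auction.
function shouldRegisterDebugReport(browserSignals, isHighValueCase) {
  if (browserSignals.forDebuggingOnlyInCooldownOrLockout) {
    return false; // the report would be dropped anyway
  }
  // Hypothetical bit: treat a missing value as "sampled", so the limited
  // budget is never spent by mistake on a misidentified browser.
  const sampled = browserSignals.forDebuggingOnlySampled !== false;
  // Unsampled (3PC) pool: report every auction. Sampled pool: only report
  // the cases we care most about.
  return sampled ? isHighValueCase : true;
}

const unsampledRoutine = shouldRegisterDebugReport(
  { forDebuggingOnlyInCooldownOrLockout: false, forDebuggingOnlySampled: false }, false);
const sampledRoutine = shouldRegisterDebugReport(
  { forDebuggingOnlyInCooldownOrLockout: false, forDebuggingOnlySampled: true }, false);
const unknownPoolRoutine = shouldRegisterDebugReport(
  { forDebuggingOnlyInCooldownOrLockout: false }, false);
```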

Thanks!
