Ad Topic Hints #26

Open
benjaminsavage opened this issue Jun 21, 2021 · 15 comments

@benjaminsavage

benjaminsavage commented Jun 21, 2021

What if we flipped digital advertising around?

Today, when you visit a website, each ad network roughly follows three steps:

  1. Figure out who you are,
  2. …then look up a profile of behavioral data about you. Stuff like what other websites you’ve visited.
  3. …then, based on this, try to infer what ad topics you might be interested in seeing.

What if we flipped the script entirely, to make the web more private, but also to put people in control:

  1. Visit website
  2. Multiple ad networks all ask your web browser: “What topics of ads should I show to this person for this website visit? Please select a topic they’ll find interesting / relevant.”

This not only skips over the resolution of the user-identity step (which is poised to break in light of browser tracking prevention efforts), it also means the ad networks no longer need to keep a profile of behavioral data about individuals.

But perhaps most interesting of all, it moves that decision of “what ad topics would you be interested in seeing” into a place where people can exert control over the process.

Through a combination of sensible automatic defaults and the opportunity for people to manually override the system (if they so desire), perhaps we can have both relevant ads about interesting topics, and also preserve user privacy and autonomy.

Addressing the risk of fingerprinting

People have multiple interests, and these interests change over time. What’s more, people don’t necessarily know what they like. An important function of ads is to help people discover new products and services that they’d love, but didn’t know about before.

As such, the “Ad Topic Hints” returned by the browser should change constantly. Some topics of interest may show up more frequently than others, and the user might express the desire to see other topics less. And finally, there ought to be some randomness thrown in - to mix things up and explore topics they haven’t seen before.

This is great news from a privacy perspective, because it means these “Ad Topic Hints” couldn’t be used as some kind of tracking vector, or fingerprinting surface. If the “Ad Topic Hints” returned by the browser include a lot of random variation and change over time, not only across sites, but even across multiple sessions on the same site, we should be able to ensure they can’t be used for fingerprinting. This is one of the major points of criticism about FLoC that this “Ad Topics Hints” proposal seeks to address.

Addressing the risk of sensitive information disclosure

These ad interests aren’t communicating data about what websites a person has visited, their attributes or characteristics. FLoC indirectly does this (to some extent), and this is another piece of criticism this proposal seeks to address. Since we’ve flipped the script, this proposed API would instead be sending out information about characteristics of ads, not people.

But perhaps more importantly, this API would, by design, provide the user with the ability to inspect (and if they so desire, override) the set of “Ad Topic Hints” their browser is telling sites to show to them. Any inferences being made about what ad topics their browser thinks they may find interesting would be clearly displayed. Rather than have the browser vendor determine what is “sensitive” or not, if the person felt that a given “Ad Topic” revealed something they didn’t want revealed, they could ask their browser to stop requesting ads of that topic.

Ad topics as vectors of numbers

Rather than describe an “Ad Topic” with a categorical label, we propose using a neural network to convert ads into embedding vectors (introductory explanation here if you're not familiar with the concept). This has numerous benefits. It’s precise, can be universally applied without the need for human annotation, smoothly captures concepts that don’t have simple names, works across all languages, and allows us to compute the numerical “similarity” of any two ads.

Imagine an open-source ML system into which you feed an ad. It analyses the image / video as well as text, and emits a list of 64 numbers. Like this:

1.56, -3.82, 3.91, -2.27, -7.16, …, 1.81

Anyone can run this code on any ad to instantly get the list of numbers that are the “embedding” for that ad. We can design a system which can deal with all kinds of inputs, so that it works for image ads, video ads, text ads, anything.

This ML system would be designed so that ads about similar topics result in nearby points. In this way, we can easily compare any two ads to see how “similar” they are. We just compute the cosine of the angle between these two vectors: the dot-product of the two embedding vectors divided by the product of their magnitudes. It’s computationally fast and cheap.
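As a minimal sketch of that comparison, the cosine similarity of two embeddings can be computed as below. The function names and the 3-dimensional example vectors are made up for illustration; real embeddings would have more dimensions (the proposal suggests 64), and returning 0 for a zero vector is an assumption of this sketch.

```javascript
// Dot product of two equal-length numeric arrays.
function dot(a, b) {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

// Euclidean length of a vector.
function magnitude(v) {
  return Math.sqrt(dot(v, v));
}

// Cosine of the angle between two embedding vectors, in [-1, 1].
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Embedding vectors must have the same length");
  }
  const denom = magnitude(a) * magnitude(b);
  // A zero vector has no direction; this sketch treats its similarity as 0.
  return denom === 0 ? 0 : dot(a, b) / denom;
}

// Made-up embeddings, for illustration only.
const adA = [1, 2, 3];
const adB = [2, 4, 6];    // same direction as adA
const adC = [-1, -2, -3]; // opposite direction to adA

console.log(cosineSimilarity(adA, adB)); // very close to 1
console.log(cosineSimilarity(adA, adC)); // very close to -1
```

Values near 1 mean the two ads point in nearly the same direction (similar topics), values near 0 mean they are unrelated, and values near -1 mean they point in opposite directions.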

Now that we have a simple, standard way to understand the “topic” of an ad, and a way to compare the similarity of two ads, let’s describe how it would be used.

The browser can select a vector that’s “similar” to other ads the person finds interesting / relevant. It can avoid selecting vectors similar to ads for which the person has expressed dislike. And every now and again, the browser should just pick a random area it hasn’t tried before - to explore, and learn if the person is interested in that topic or not.

Sensible defaults

Most people will not want to take the time to interact with an interface that asks them about their ad interests. That’s fine, so long as we have a reasonable default for people who don’t bother to configure this themselves.

The browser can collect information about ads the person is shown across the web, ads they click on, and ad conversion events.

Based on this information, the browser can infer what ad topics the person seems to engage with (and what they do not).

Autonomy through centralized transparency and control

In much behavioural advertising today, inferences derived from behavioural data are often invisible and unknowable. Here, the browser can make all of this available to the user. It can show them not only the inferred interests it has generated, but also the raw data used to generate that prediction.

This leads to the second big difference with most forms of behavioural advertising. The user may choose to modify or override these inferred interests.

The fact that these inferences are all centralised within the browser is what makes this a tractable user experience. It’s not realistic for people to identify all the ad networks which may be making inferences about them based on behavioural data. It’s even less realistic to imagine that people will modify / override these inferences across all those networks. Centralisation gives the user a realistic form of control.

This should also address concerns about “autonomy”. When it’s possible to see all the data, and all the inferences, and to override / modify them in one place, we can say that this puts people in control over the ads they want to see and what information their browser transmits about those interests.

What’s more, the browser should allow people to configure how much “exploration” they’d like. Some people might desire more variety, while others might prefer their browser to stick to a narrower range of ad topics.

This proposal isn’t prescriptive about the exact algorithm the browser should use to select the ad interest vector to be sent to a given page, as this should be a great opportunity for browser vendors to compete with one another, in terms of ease of use and relevance of ads, as well as ease of user understanding and control.

Ideas about ways to incorporate user-stated/controlled interests

Several important proposals about ads and privacy involve labeling ads in a way that the browser can understand. While these proposals are primarily about attribution / measurement use-cases, we could utilize this here as well.

Once a browser understands what pieces of content are ads, it could potentially introduce a universal control that allows people to tell the browser how they feel about the “Ad Topic” of that ad. Perhaps a “right click” or a long-press on mobile could reveal options like “More ads of this topic” or “Fewer ads of this topic”.

Another idea would be for the browser to have a special UI somewhere with an infinite feed of ads. These could either be a hard-coded list, or could be fetched through ad requests to networks that wanted to participate in such a UI. People could go through this “ad feed” selecting “More ads of this topic” or “Fewer ads of this topic” on each. This would help the browser quickly understand more about what this person did / didn’t want to see ads about.

There are no doubt many other ideas out there which merit experimentation. This is just the beginning of this conversation.

Concern about centralized browser control

But there are also downsides to this level of centralization within the browser. Browser vendors who operate “Search Ads” that rely on first-party data would be able to personalize ads with or without this “Ad Topic Hints” API. They wouldn’t have much incentive to make this system work particularly well (from the perspective of ad monetization). As such, they might under-invest in this “Ad Topic Hints” API.

How can we stimulate more competition in this space? One possible approach would be to make this API “pluggable”. Such browser plugins would need to be reviewed / vetted to ensure user privacy and stop abuse. Plugins would have access to the ad-interaction data described in the “sensible defaults” section as well as user feedback on ads, and could design their own user-interfaces as well as algorithms to generate the “Ad Topic Hints” returned.

Making “Ad Topic Hints” pluggable is just one idea. There may be even better solutions available.

Understanding Ad Topic Hints

Advertisers will naturally want to develop some understanding of these “Ad Topic Hints” and map them to concepts they already understand, like the IAB taxonomy of ad topics.

The easiest way to understand these “Ad Topic Hints” would be to take a sample of ads that represent all the various categories in the IAB taxonomy of ad topics, and run them through the ML system. Ideally one would produce mappings for multiple examples of each category.

Then, for any “Ad Topic Hint” vector, one could compare it to these reference points. A simple approach would be to just consider the topic of the ad with the “closest” vector. A more sophisticated approach might consider the actual “distance”. If the closest reference point is sufficiently far away, this may be an unlabelled part of the ad topic spectrum. We may discover that additional categories need to be added to existing taxonomy systems.

To help illustrate this mapping process, imagine these embedding vectors were just two dimensional. By coloring all of the space that is closest to a given reference point the same color, you’d wind up with a Voronoi diagram like this:

Image of a Voronoi diagram from Wikipedia

Imagine that each of those black dots represents a “reference ad” deemed to belong to a particular “Ad Topic” in the IAB’s taxonomy. Any “Ad Topic” vector would fall into one of these colored regions. A simple approach would be to deem that topic the same as the reference point within that region.
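The mapping described in this section could be sketched roughly as follows. Everything here is hypothetical: the reference vectors and topic labels are invented, `labelHint` is not part of any proposed API, and the `minSimilarity` cut-off of 0.5 is an arbitrary stand-in for "sufficiently far away".

```javascript
// Hypothetical reference ads: an illustrative topic label plus a made-up embedding.
const referenceAds = [
  { topic: "Automotive", vector: [1, 0, 0] },
  { topic: "Travel", vector: [0, 1, 0] },
  { topic: "Personal Finance", vector: [0, 0, 1] },
];

function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, ai) => sum + ai * ai, 0));
  const magB = Math.sqrt(b.reduce((sum, bi) => sum + bi * bi, 0));
  return dot / (magA * magB);
}

// Returns the topic of the most similar reference ad, or null when even the
// best match falls below minSimilarity (an assumed, tunable threshold).
function labelHint(hint, references, minSimilarity = 0.5) {
  let best = null;
  let bestSimilarity = -Infinity;
  for (const ref of references) {
    const s = cosineSimilarity(hint, ref.vector);
    if (s > bestSimilarity) {
      bestSimilarity = s;
      best = ref;
    }
  }
  if (best === null || bestSimilarity < minSimilarity) {
    return null; // possibly an unlabelled part of the ad topic spectrum
  }
  return best.topic;
}

console.log(labelHint([0.9, 0.1, 0.2], referenceAds)); // "Automotive"
console.log(labelHint([-1, -1, -1], referenceAds));    // null
```

With several reference ads per category, the same loop still works; the `null` result marks hints that sit far from every reference and may point to a category the taxonomy lacks.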

@benjaminsavage benjaminsavage added the agenda+ Request to add this issue to the agenda of our next telcon or F2F label Jun 21, 2021
@dialtone

Have been spending some time on this topic lately and reached a similar conclusion. We put together our early thoughts in this new bird spec: https://github.com/AdRoll/privacy/blob/main/PAURAQUE.md

@asoltani

asoltani commented Jun 24, 2021

Also some historical context for a company/browser extension called 'Bynamite' that attempted something similar:

The extension would

  1. infer user interests (based on browsing activity)
  2. expose those inferences to the user and permit them to edit
  3. then inject only the 'allowed interests' into ad calls (without the ad networks needing to make any changes)
  4. block ad calls outside of those interests...

There are some images here: https://bynamite.en.softonic.com/mac

Last I checked, one of the authors went to work for Facebook and the other for Google I believe. https://www.nytimes.com/2010/07/18/business/18unboxed.html (1/2 way down the article)

Last week, Bynamite introduced an early, or beta, version of its software, a downloadable plug-in for browsers. That software and its Web service monitor what ad networks and e-commerce sites collect and assume to know about a user. A user’s interests are then assembled on a Web page, grouped by categories like “news and current events,” “general health,” “travel,” “technology” and “shopping.” The categories are weighted by how often you visit different categories of sites or make purchases at some online merchants.

The information tracked by Bynamite is steadily updated, and, at least for me last week, a small pop-up alert at the bottom of my computer screen appeared every day, informing me of new information about me from ad networks. Mr. Yoon calls the product’s early version mainly a “mirror,” showing users how the commercial Internet sees them.

Users can change that mirror to represent their interests more accurately. For example, I don’t own a car, but my “automotive” folder soon had several entries, saying I was interested in Mercedes-Benz and other brands, presumably because middle-age men who visit the Web sites I do are typically attractive targets for car ads. I deleted the auto interests, suggesting to advertisers that I’m not necessarily a good prospect. Still, I saw a few car ads on sites I later visited.

@TanviHacks TanviHacks removed the agenda+ Request to add this issue to the agenda of our next telcon or F2F label Jun 24, 2021
@npdoty

npdoty commented Jun 28, 2021

The AdNostic paper (also from 2010) lays out different architectures which would reveal different amounts of information in order to provide behaviorally-targeted advertising. Proposals like this one would fall in the "Reveal the behavioral profile" category, and the authors speculated that a system could include ad interests as HTTP request headers. That's less revealing than disclosing the clickstream/browsing history (which many would consider the status quo), but more revealing than client-operated auctions.

That's clearly not the only dimension of privacy that we should consider for these proposals: there's how disclosure of interests can be used for identification/fingerprinting; transparency and control for the end user; etc. Browser-provided UI to manage interests has the potential for more meaningful transparency and direct control. Rotating and fuzzing interests in order to limit fingerprinting is worth considering, although how effective that would be would require deeper analysis.

Paper reference for those who haven't already seen it: Toubiana, Vincent, Arvind Narayanan, Dan Boneh, Helen Nissenbaum, and Solon Barocas. “Adnostic: Privacy Preserving Targeted Advertising.” In Proceedings Network and Distributed System Symposium, 2010. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2567076.

@benjaminsavage
Author

Recap of feedback from June 24 Privacy CG

Thank you everyone for all of the valuable feedback, and the great discussion. Overall, the proposal was met with positivity. Many expressed a desire to put people more in control of the type of ads they see, and an interest in exploring user-stated / controlled interests. You can see all the feedback in the minutes: (link).

In this comment I’d like to respond specifically to the feedback from Johann Hofmann and John Wilander. I am proposing a significant update to the proposal based on their comments.

Concern about default state

Johann expressed a concern about the proposal as currently written. When people take no action to configure their preferences, if automatic inferences are returned based on behavioural data, there is a risk of sensitive information disclosure.

I think this is a fair point. The alternative is for the user agent to just return a fully random “Ad Topic Hint” when no explicit preferences have been configured.

First, let me say that I think it best to incorporate this feedback from Johann. As such, I’m amending this proposal to take the alternative route. So, when no ad preferences are configured, let’s assume the user agent returns fully random “Ad Topic Hint” vectors.

I previously suggested behaviorally derived default behaviour in the original proposal because I’m concerned that if this API is returning random vectors for 95% of people, it will not be useful for the purpose of ad selection. As such, ad-tech vendors would just ignore this API, leading to low adoption and user frustration when their stated preferences then appear to have no control over the ads they see.

However, we can try to devise alternative solutions to address this risk. Here are two ideas:

  1. The first idea is for browsers which adopt this API to automatically add some clear, obvious UX which allows people to provide both positive and negative feedback on the ads they see. If there is a clear and simple UX, automatically shown on all ads rendered in the browser, over time more people will interact with it and we won’t be in a situation where 95% of people have no preferences.

  2. The other idea I had was to embrace data-portability. I recommend we develop a mechanism by which websites can request to append additional “Ad Preferences” to the browser.

    1. I imagine an API by which a website can tell the browser: “Here are two lists of ads; a list the user provided positive feedback for, and a list they provided negative feedback for.”
    2. This mechanism should require user consent, and should be facilitated through a dialogue like the “Storage Access API”, which ensures such data sharing is an intentional decision of the user. I’m imagining a prompt that tells them: “ORIGIN would like to add to your ad preferences. Do you approve?”. I imagine options like “Approve”, “Disapprove” and “Review this update”. That final option should just show the user the list of ads to be added, specifying which are “positive” and which are “negative” examples.
    3. People should have the option to accept this addition either in whole or in part.
    4. If they already have ad preferences stored in their browser, such an update should just append to the end of the list, not over-write anything.
    5. This way, websites which already have some knowledge of user ad preferences can help bootstrap the ecosystem.
    6. This also addresses my earlier concern about incentives. If browsers support such a data-portability API, others aside from the browser vendor can develop their own ad preference configuration screens. This way we don’t have to just rely on the browser vendors to make nice, usable user-tools.
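To make the append-only, partial-acceptance behaviour in the list above concrete, here is a rough sketch. The storage shape (two lists of ad identifiers), the function name `applyApprovedUpdate`, and the ad ids are all assumptions for illustration; no such browser API exists today, and the consent prompt itself is not modelled.

```javascript
// Appends only the entries the user approved. Existing preferences are never
// modified or removed; new entries go to the end of each list.
//   prefs       - current stored preferences: { positive: [...], negative: [...] }
//   update      - what the website proposes, in the same shape
//   approvedIds - the subset of proposed ad ids the user accepted in the
//                 (hypothetical) review dialog
function applyApprovedUpdate(prefs, update, approvedIds) {
  const approved = new Set(approvedIds);
  const keep = (ids) => ids.filter((id) => approved.has(id));
  return {
    positive: [...prefs.positive, ...keep(update.positive)],
    negative: [...prefs.negative, ...keep(update.negative)],
  };
}

const existing = { positive: ["ad-1"], negative: ["ad-2"] };
const proposed = { positive: ["ad-3", "ad-4"], negative: ["ad-5"] };

// The user accepts the update in part: ad-3 and ad-5, but not ad-4.
const merged = applyApprovedUpdate(existing, proposed, ["ad-3", "ad-5"]);
console.log(merged.positive); // ad-1 followed by ad-3
console.log(merged.negative); // ad-2 followed by ad-5
```

Choosing "Disapprove" corresponds to an empty `approvedIds` list, which leaves the stored preferences unchanged.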

Null-state behavior

John Wilander asked that it be impossible to distinguish between browsers where people have curated a set of ad interests and browsers where they have not.

I agree with this. The API should always return a vector, never null. If no better alternative is available, the API should just return a random vector.

This should make sure websites cannot block people from using a website until they enter preferences.

You should be able to lie

John stated that people should be able to lie about themselves, specifying interests that are NOT in fact their own interests.

I agree, and this is also aligned with my original intent in this proposal.

This API does not aim to capture any information about a person’s attributes, characteristics, or behaviour. The only feedback people would be providing would be “More like this” and “Less like this” clicks on ads. The internal storage would just be two lists, a list of “Ads they provided positive feedback on” and “Ads they provided negative feedback on”.

This is 100% within the user’s control. There is nothing to stop people from providing feedback which is NOT actually aligned with their own interests. If I dislike seeing ads for credit cards, I could still say “More like this” on every ad for a credit card I see in order to see what kind of credit card ads are out there on the internet.

On-device auction

John Wilander also asked if we could move to a fully on-device auction so that these ad topic hints never leave the device at all. I don’t think this is realistic. Ad networks are choosing between millions of ads. These cannot all be sent down to the device.

My opinion is that it is OK for “Ad Topic Hints” to be saved by websites and associated with PII, so long as we can reach the design goal that these do not enable cross site tracking. This will require thoughtful design, and is critical to the proposal.

First of all, there is so much random noise inherent in the API that they do not even reveal the specific ads the person provided positive / negative feedback upon.

Secondly, to the extent that these stated preferences are being shared, this is OK (in my opinion) because expectations should be clearly set with people that their preferences will be shared with websites for the purpose of selecting ads more to their liking. While the API is designed to share these only in approximate form to prevent fingerprinting attacks, the concept that the data is shared is aligned with user expectations.

Thank you!

Once again, thank you everyone for all the valuable feedback. I really appreciate all of the positive shows of support and interest in this concept.

I’d like to discuss these changes to the proposal next Privacy CG to see if they address the last set of concerns, and to see what additional concerns people might raise.

@benjaminsavage benjaminsavage added agenda+ Request to add this issue to the agenda of our next telcon or F2F and removed agenda+ Request to add this issue to the agenda of our next telcon or F2F labels Jun 28, 2021
@johannhof
Member

Thanks for capturing my concern Ben!

I agree that disallowing automatic inference by default is a good step, but as you said this drastically reduces the usefulness of the API, depending on how much the majority of end users actually choose to interact with the UI provided by browsers. Most "regular" folks I know aren't going to interact with nonessential UI unless they're being coerced through dark patterns. I'm genuinely curious whether there's a "sweet spot" for such a UI where we can ensure both informed user choice and massive usage.

Regarding your point 2), is there data/research that shows that a significant number of users are interested in carrying over personalized advertisement choices? How can we verify that the ad data is truly from the user and not built by the website to improve their ad campaigns?

Finally I think it would be helpful if the proposal laid out the noising/DP aspect of the API more clearly. Is noising achieved through adding random ads to the list or is it a side-effect of using embedding vectors somehow? Will the specification enforce a level of noising? As far as I can see the same incentives mentioned above also apply to noising, where a lower level of noise yields better results for advertisers.

@michaelsgorman

Ben,
Thanks for sharing this concept with WICG. I have a follow up question.

@michaelsgorman

michaelsgorman commented Jul 13, 2021

To what degree were you proposing that inferred or explicit user preferences should determine what advertising a user sees? Or, were you proposing that, potentially, a system of behavioral targeting that used embeddings and user input to incorporate user preference while limiting privacy risk could effectively compete against other methods of behavioral targeting, such as those relying on centralized records of user behavior, perhaps making the old methods unnecessary/obsolete?

Also, wanted to endorse your suggestion of enabling plug-ins, so competing alternative solutions could solve problems without needing to rely on browsers to build all of the features and anticipate every need.

Michael, ShareThis

@sandsmark

I assume it is implicit that this processing (behavior data into a feature vector) will not be enabled without explicit consent from the subject? And how is consent to be provably (in court) acquired if several users share the same device?

@bokelley

I'd like to suggest an extension to the proposal, building on John Wilander's idea of an on-device auction and having a feature vector on each ad.

Currently we are in a world of nested auctions (ad server, ad exchanges, SSPs, DSPs) which in concept return the ad that will pay the highest price to the publisher - in essence, a one-dimensional objective function that is something of a proxy to advertiser expected value, but seen through the lens of layers of ad tech and attribution games and general confusion and disinterest.

Consider a different objective function: "choose the ad that pays a high-enough price to the publisher and maximizes user engagement." User engagement might be defined as ("more like this" + clicks* + lingers - 10 x "fewer like this"). * I am not sure we want "clickier" ads, but I'm thinking that an annoying or inappropriate ad will get downvotes that will outweigh the clicks.

This enables a new model where the publisher returns the browser a list of say 100 ads. The browser takes this list of ads plus the ad topic hints data, runs a local ML model, and selects the ad that maximizes user engagement. If the model returns a low likelihood of engagement, the browser will not serve an ad at all, which prevents gaming and encourages publishers to send back a broad range of ads.
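A rough sketch of that selection step, under several assumptions: the field names (`price`, `moreLikeThis`, and so on), the `priceFloor` and `minEngagement` parameters, and the `selectAd` function are all hypothetical, and the engagement formula from the previous paragraph stands in for the local ML model, which is not specified here.

```javascript
// Engagement score as defined above:
// "more like this" + clicks + lingers - 10 x "fewer like this".
function engagementScore({ moreLikeThis, clicks, lingers, fewerLikeThis }) {
  return moreLikeThis + clicks + lingers - 10 * fewerLikeThis;
}

// Picks the candidate with the highest predicted engagement among ads that
// pay at least priceFloor. Returns null (serve no ad) when no candidate
// reaches minEngagement.
function selectAd(candidates, predictEngagement, priceFloor, minEngagement) {
  let best = null;
  let bestScore = -Infinity;
  for (const ad of candidates) {
    if (ad.price < priceFloor) continue; // does not pay a high-enough price
    const score = predictEngagement(ad);
    if (score > bestScore) {
      bestScore = score;
      best = ad;
    }
  }
  return best !== null && bestScore >= minEngagement ? best : null;
}

// Made-up candidates with made-up feedback tallies.
const ads = [
  { id: "a", price: 2.0, moreLikeThis: 3, clicks: 1, lingers: 2, fewerLikeThis: 0 },
  { id: "b", price: 5.0, moreLikeThis: 1, clicks: 4, lingers: 1, fewerLikeThis: 1 },
  { id: "c", price: 0.5, moreLikeThis: 9, clicks: 9, lingers: 9, fewerLikeThis: 0 },
];

const chosen = selectAd(ads, engagementScore, 1.0, 1);
console.log(chosen ? chosen.id : null); // "a"
```

Ad "c" scores highest but falls below the price floor, and ad "b" pays the most but is dragged down by its single "fewer like this" vote, so "a" is chosen.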

The browser is now able to infer the topics that the user is interested in from the features in the ads, and this data is kept locally. The user can edit and manage this information in the browser as per the original proposal, but we don't need to send it to the ad tech ecosystem (where I worry that it gets fingerprinted and persisted).

Here's where it gets kind of fun. The publisher now starts to build a model of what ads their users are actually choosing - effectively building an aggregated descriptor of their audience. When they want to get ads from ad networks, they can share this model (its coefficients or whatever) and use this model to select the ads to return to the browser.

I am sure I am missing some things here, but I think this is a way to effectively change the objective function for advertising on the internet. To quote Seth Godin in his blog post today (which inspired this idea), "It’s beyond dispute that industry is an efficient way to produce more. The question is: More of what?"

@benjaminsavage
Author

In reply to @michaelsgorman

To what degree were you proposing that inferred or explicit user preferences should determine what advertising a user sees? Or, were you proposing that, potentially, a system of behavioral targeting that used embeddings and user input to incorporate user preference while limiting privacy risk could effectively compete against other methods of behavioral targeting, such as those relying on centralized records of user behavior, perhaps making the old methods unnecessary/obsolete?

I am just proposing a new web-API, that I suggest we add to browsers. There are a lot of factors which might have an impact on the extent to which it is used in practice. I think it's hard to predict. One major factor might be the existence of alternatives. As browsers like Safari and Firefox take actions to curtail tracking, there might not be a lot of alternatives left.

In reply to @sandsmark

It's hard to follow this proposal in the current form (threaded conversation on a single issue). Perhaps soon I will move this proposal into an actual folder within this repo. But in a follow up comment I noted that in response to @johannhof's expressed concerns about the use of behavioral data, I was amending this proposal to NOT use any behavioral data, including in the initial default stage.

In response to @bokelley

This is an interesting idea, but out of scope for this proposal. I'd like to keep this proposal very simple: just a signal provided from the browser that is 100% driven by explicit user choices about what type of ads they would like to see. As for what ad-tech vendors do with that signal, and what ML algorithms they run, and what they optimize for, that's out of scope for now.

In response to @johannhof

Most "regular" folks I know aren't going to interact with nonessential UI unless they're being coerced through dark patterns. I'm genuinely curious whether there's a "sweet spot" for such a UI where we can ensure both informed user choice and massive usage.

I too hope we can find this "sweet spot"! The good news is, people will not need to provide feedback on all ads, or even a majority of ads for this to be useful. It's early days and hard to say, but if, over time, people provide feedback on a total of ~20 ads I suspect that will provide a pretty useful signal. Do you think that, over time, "regular" people might click a "relevant topic" or "not relevant topic" button on ~20 ads? There would be diminishing returns with more and more feedback, still useful, just not as incrementally so as those first few pieces of feedback. Unless people's interests change and they'd like to start seeing ads for different topics, they should be able to stop providing feedback after a certain point.

Regarding your point 2), is there data/research that shows that a significant number of users are interested in carrying over personalized advertisement choices? How can we verify that the ad data is truly from the user and not built by the website to improve their ad campaigns?

Today, I don't think any website collects this type of data, so there isn't the possibility of carrying it over. If such an API were available though, websites could design new experiences to collect feedback on ads (relevant topic / irrelevant topic). To ensure that this is truly indicative of user choices, and not just made up by the website, I'm proposing an API that shows the person the ads and tells them:

"This website claims you told them this ad is a "relevant topic" for you. (Show picture of the ad inline) Is this correct?"

I'm hoping that by involving the user in the flow, showing them the exact ad in a visual form, and asking for them to positively affirm that it is a representation of their preferences that we can prevent folks from gaming this.

Finally I think it would be helpful if the proposal laid out the noising/DP aspect of the API more clearly. Is noising achieved through adding random ads to the list or is it a side-effect of using embedding vectors somehow? Will the specification enforce a level of noising? As far as I can see the same incentives mentioned above also apply to noising, where a lower level of noise yields better results for advertisers.

I wrote some code =). That seems like an easier way to explain my idea than with words =).

  • I recommend clicking the little menu button then selecting "View in fullscreen" to see the chart.
  • Click and drag to rotate.
  • I can't figure out how to keep it from stretching / skewing what should be a perfectly spherical shape! Super annoying... just pretend the axes aren't skewed and it's a nice sphere =).
  • In reality, we would probably be using a higher-dimensional embedding for ads (16? 32? 64?). It's hard to comprehend that, so I'm just using 3 dimensions to communicate the concept.
    https://jsfiddle.net/3rxo0ent/9/

This plot imagines that the person has provided positive feedback on 3 ads. Those 3 ads have the following embedding vectors:

  • [-9, 4, -2],
  • [6, -5, -2],
  • [0, 1, 2],

It also imagines that the person has provided negative feedback on 2 ads. Those 2 ads have the following embedding vectors:

  • [0, 1, 0],
  • [1, 0, 0]

It also imagines that we will try to select a random "Ad Topic Hint" 20% of the time. (The level of "exploration" built into the system)

This plot shows 500 "Ad Topic Hint" vectors generated from just these 5 pieces of feedback.

  • There is a base-level of fully randomly selected vectors.
  • There is a higher concentration of vectors pointing in similar directions to the ads upon which the person provided positive feedback
  • There is a lower concentration of vectors pointing in similar directions to the ads upon which the person provided negative feedback

That's all that really matters I think. The specific algorithm isn't as important. Here is the super simple algorithm I implemented:

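// Generate one "Ad Topic Hint" vector from the stored feedback.
function genAdTopicHint() {
  let x, reject;
  if (Math.random() < random_ad_probability) {
    // Exploration: pick a fully random direction, re-drawing until it is
    // not rejected for being near an ad with negative feedback.
    do {
      x = genRandomPointOnNSphere(3);
      reject = shouldReject(x, ads_with_negative_feedback);
    } while(reject);
    return x;
  } else {
    // Otherwise: start from a random positively-rated ad and add
    // Laplacian noise, again re-drawing while the result is rejected.
    do {
      let ad = getRandomAdWithPositiveFeedback();
      x = addLaplacianNoise(ad, 0.05);
      reject = shouldReject(x, ads_with_negative_feedback);
    } while(reject);
    return x;
  }
}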
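
The function above leans on sampling and rejection helpers that aren't shown in this comment. Here is a minimal sketch of what they might look like, so the idea can be tried outside the fiddle. Everything in it is an assumption for illustration and may differ from the fiddle: the cosine-similarity cutoff `REJECT_THRESHOLD`, the hard (non-probabilistic) rejection rule, and the choice to normalise vectors before and after adding noise.

```javascript
// Hypothetical helper implementations; illustrative only.
const REJECT_THRESHOLD = 0.9; // assumed cosine-similarity cutoff

// Standard normal sample via the Box-Muller transform.
function randNormal() {
  const u = 1 - Math.random(); // in (0, 1], avoids log(0)
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Scale a vector to unit length.
function normalize(v) {
  const len = Math.hypot(...v);
  return v.map((c) => c / len);
}

// Uniform random direction: normalise a vector of independent Gaussians.
function genRandomPointOnNSphere(n) {
  return normalize(Array.from({ length: n }, randNormal));
}

// Laplace(0, scale) sample: the difference of two Exp(1) samples is Laplace(0, 1).
function randLaplace(scale) {
  return scale * (Math.log(1 - Math.random()) - Math.log(1 - Math.random()));
}

// Add independent Laplace noise to each coordinate of the (normalised) ad
// vector, then project back onto the unit sphere.
function addLaplacianNoise(ad, scale) {
  const noisy = normalize(ad).map((c) => c + randLaplace(scale));
  return normalize(noisy);
}

// Reject a candidate that points too close to any negatively-rated ad.
function shouldReject(x, adsWithNegativeFeedback) {
  const unitX = normalize(x);
  return adsWithNegativeFeedback.some((ad) => {
    const unitAd = normalize(ad);
    const cosine = unitX.reduce((sum, c, i) => sum + c * unitAd[i], 0);
    return cosine > REJECT_THRESHOLD;
  });
}
```

With a hard cutoff like this, the rejection loops terminate as long as the negatively-rated ads do not cover the whole sphere; a softer, probabilistic rejection would be an equally reasonable choice.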

It's up to the browser to decide how often to give the website an updated "Ad Topic Hint" vector. Every page load? Every session? Perhaps time-limited? If you change the number in the fiddle's for-loop to, say, 10, and keep refreshing the graph by saving, you'll see there is an awful lot of randomness. It would be very hard to uniquely identify someone from even 10 "Ad Topic Hints". My goal is for this signal to NOT be usable for fingerprinting. The hard-coded "0.05" factor I used as the "scale parameter" in the Laplacian noise generation is just there for illustrative purposes. I imagine the selection of that specific factor would be up to the browser vendor.
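
As one illustration of the "time-limited" option, here is a small sketch in which the browser hands each site the same hint until it expires. The names `createHintCache`, `generateHint` and `maxAgeMs` are made up for this example, and keying the cache by site is an assumption rather than something the proposal specifies. The intuition is that the fewer fresh hints a site receives, the fewer noisy samples it has to average over.

```javascript
// Hypothetical sketch: one cached hint per site, refreshed after maxAgeMs.
function createHintCache(generateHint, maxAgeMs) {
  const cache = new Map(); // site -> { hint, issuedAt }
  return function getHint(site, now = Date.now()) {
    const entry = cache.get(site);
    if (entry && now - entry.issuedAt < maxAgeMs) {
      return entry.hint; // same answer until the entry expires
    }
    const hint = generateHint();
    cache.set(site, { hint, issuedAt: now });
    return hint;
  };
}

// Example: at most one fresh hint per site per day.
// const getHint = createHintCache(genAdTopicHint, 24 * 60 * 60 * 1000);
```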

The 20% random vector selection as well is just a hard-coded constant for the purposes of this illustration. The default would be up to the browser vendor, but I think this is a good one to let people customize should they want. People might want more or less randomness in their ad selection. Again, the browser vendor could select limits on this that they felt were appropriate.

@benjaminsavage benjaminsavage added the agenda+ Request to add this issue to the agenda of our next telcon or F2F label Jul 19, 2021
@bedfordsean

The 3D scatter framework in that JSFiddle behaves fairly inconsistently in different browsers, so uploading a couple of static screenshots to illustrate what it looks like as well.

image

image

@dialtone

I feel like there could be a place where PAURAQUE and Ad Topic Hints merge some of their better ideas to create an
improved spec.
PAURAQUE provides for targeting attributes and interests without necessarily ever having this data go to an untrusted
server or simply just a reporting mechanism to allow for defining the right balance between targeting criteria and
budget.
Ad Topic Hints has a better mechanism to potentially represent such interests and to generate the feedback loop that
keeps information up to date without creating the need for dark patterns.
There is probably a middle ground between the two where their current issues can be ameliorated.

@kdenhartog

kdenhartog commented Jul 22, 2021

It also imagines that we will try to select a random "Ad Topic Hint" 20% of the time. (The level of "exploration" built into the system)

The 20% random vector selection as well is just a hard-coded constant for the purposes of this illustration. The default would be up to the browser vendor, but I think this is a good one to let people customize should they want. People might want more or less randomness in their ad selection. Again, the browser vendor could select limits on this that they felt were appropriate.

I'd think this being user configurable but also defaulted by the browser would create a useful tension. The reason I see this as useful is that it lets the user weigh one useful feature, discovery, against another useful feature, privacy. Then, since this is user configurable rather than always fixed by the browser, users can choose to weight it in alignment with their own values. So for example, if they set the random "Ad Topic Hint" rate to 100%, a random "Ad Topic Hint" is generated all of the time, and if it's set to 0% they get perfectly targeted ads based only on their preferences.
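
A tiny sketch of how a browser might combine the user's setting with vendor defaults and limits is below. The function name and the `vendor` object (`defaultValue`, `min`, `max`) are hypothetical, and the 20% default is just the illustrative figure from this thread.

```javascript
// Hypothetical sketch: resolve the probability of returning a fully random hint.
// Falls back to the vendor default when the user has not chosen a value,
// then clamps the result to the vendor's allowed range.
function resolveRandomHintProbability(userSetting, vendor) {
  const value = Number.isFinite(userSetting) ? userSetting : vendor.defaultValue;
  return Math.min(vendor.max, Math.max(vendor.min, value));
}

// Example vendor policy: default 20%, any value from 0% to 100% allowed.
// resolveRandomHintProbability(undefined, { defaultValue: 0.2, min: 0, max: 1 });
```

A vendor that wanted to guarantee some minimum randomness could set `min` above zero, so that a user choice of 0% is raised to that floor.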

The advantage here is that because this is user configured (but defaulted by browsers in a sane way - say 20% as you suggest) it also creates an interesting competition for advertisers that's no longer zero-sum (support the feature or don't). Instead, advertisers have to compete on their ability to deliver a high enough value that the user opts to not want to keep their data private (although they still have the option to if they want to).

@AramZS

AramZS commented Jul 22, 2021

I like this proposal and am looking forward to us digging into it! I think I have a few things I'd like to note.

  1. Someone has already mentioned that there might be multiple elements that indicate engagement with an ad. I agree, I think adding viewability to the list is useful as well.

  2. I understand your inclination to push the logic for sensitive topics to the user, but then we are depending on the user to realize the implications of an ad as indicative of a potential sensitive topic being targeted as well as catching that ad in this overall process of mapping preferred ads to a user. I think that's a lot of work to put on a user on a very important and potentially harmful issue. I would propose that the browser be able to designate specific sensitive topics that they can block on behalf of the user along with a capacity for the browser/interface to highlight and call out potentially harmful / discriminatory / sensitive topics. I think there's a high risk here that we should try to more significantly address. I understand that this means essentially defining specific ads as sensitive and marking their similarity to define them as vectors. An additional way to make this work easier for the user is to allow a single user to define an ad as explicitly falling into a sensitive category and have that apply in a wider way, either by informing the site, the browser, or something else. I have another idea that may help with this, that is further down.

  3. I appreciate the progress on sensible defaults. I have an additional concern here: I think expressing any value as a default might create a fingerprinting risk, by making it possible to distinguish one browser version from earlier versions. Perhaps we define that as an acceptable risk and move on, but I think it is worth noting here.

  4. I very much like and support the idea of making the API pluggable.

  5. Understanding Ad Topic Hints is an interesting section of this proposal, I think it needs to be extended in the other direction as well. I think it needs to be easy for users to understand the Ad Topic Hints that are applied to them, not just as numbers or lists of ads but as human-readable terms. This is important because ML can easily batch together things in ways that are not understood. I understand that different ML may define the same vectors in different ways, but I worry that without some sort of way to make it clear to the user they may end up misled or taking themselves down an algo-hole that they don't understand or wish to be in, and without understanding which ads led them there, they may have no way to get out. This could be helped somewhat with an interface like "you saw this ad because you approved/engaged with these other ads" but even that may be a lot of labor for a less-informed user.

  6. There is a fraud/clarity issue here that remains unaddressed. One method of ad fraud works by showing one ad initially, or in some contexts, and then switching it out after initial load or on specific user conditions, all within the context of the ad frame. In order to avoid misleading users with ads that transform, ads that are stored by the browser or another entity on behalf of the user should be frozen in some way at the moment they are stored, as a measure for safety and clarity. This could mean a screenshot at that moment, or blocking further JS on that ad, or some other method, but I do think this would be a valuable feature.

  7. I think there is a particular issue here that remains unaddressed but is a serious one if we want to solve the algorithmic discrimination that can be present in ML-driven user-targeting systems. Specifically: how does a user get to see ads that are targeted in such a way that the user is unlikely ever to see them through normal browsing, and that are therefore exclusionary through their absence and the opportunity the user never gets? The big examples here are job and real estate listings. If these ads are targeted to specific vectors and the user is never given the opportunity to acquire those vectors, we are replicating the current bad state of algorithmic red-lining. However, I think we can take the lead from some previous work on this in the ad space.

There's an opportunity here to not just lie to the algorithm when given the opportunity to do so on specific ads (as noted above) but to look at a way to emulate behaviors and approvals of a very different ad targeting profile, and this is a clear opportunity derived from the suggestion above that sites be able to store and later push user choices of ads into the browser. I propose that we have explicit methods to suspend, switch, export, and import a user's ad topic hints from within their own browser and to exchange with others or with sites that might choose to request, share and create such "profiles" and make them available to others. This would allow a UX like Firefox's Track This project to work or allow users to lie about their interests more effectively and with less friction.
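
To give a feel for what "suspend, switch, export, and import" could mean in practice, here is a rough sketch of an in-browser store of named feedback profiles. The class, its method names, and the JSON shape (`positive` / `negative` lists of ad embedding vectors) are all invented for illustration and are not part of the proposal.

```javascript
// Hypothetical in-browser store of named ad-feedback profiles.
class AdTopicProfiles {
  constructor() {
    this.profiles = new Map([["default", { positive: [], negative: [] }]]);
    this.active = "default";
    this.suspended = false;
  }

  // Serialise a profile so it can be shared with other people or sites.
  exportProfile(name) {
    return JSON.stringify(this.profiles.get(name));
  }

  // Store a shared profile under a local name; missing lists become empty.
  importProfile(name, json) {
    const { positive = [], negative = [] } = JSON.parse(json);
    this.profiles.set(name, { positive, negative });
  }

  switchTo(name) {
    if (!this.profiles.has(name)) throw new Error(`unknown profile: ${name}`);
    this.active = name;
  }

  suspend() { this.suspended = true; }
  resume() { this.suspended = false; }

  // Feedback used for hint generation; empty while suspended.
  activeFeedback() {
    return this.suspended
      ? { positive: [], negative: [] }
      : this.profiles.get(this.active);
  }
}
```

In this sketch, suspending a profile leaves hint generation with no feedback to draw on, and switching profiles swaps which feedback it draws on.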

To be clear, I think that allowing users to lie intentionally and easily about their interests to drive different ad outcomes is and should continue to be a goal of this proposal. It would also allow users to easily switch between profiles depending on what they want presented to them at that moment and potentially acquire a profile they might not ever have presented to them due to other targeting factors. Further, to add an additional solution on point 2, it would allow a user or site or browser to define specific categories of disliked ads and ask the user if they would like to accept a list of ads that would allow them to exclude themselves from targeting based on sensitive categories. This might be especially helpful for sites that are focused on sensitive categories, as they can work with their administrators and users in order to define a list of ads that would normally be targeted to them contextually and allow their users to opt out of being targeted on the basis of those ads. I think this makes sense as a further extension of the data-portability discussion above.

@npdoty

npdoty commented Jul 22, 2021

My opinion is that it is OK for “Ad Topic Hints” to be saved by websites and associated with PII, so long as we can reach the design goal that these do not enable cross site tracking. This will require thoughtful design, and is critical to the proposal.

Secondly, to the extent that these stated preferences are being shared, this is OK (in my opinion) because expectations should be clearly set with people that their preferences will be shared with websites for the purpose of selecting ads more to their liking. While the API is designed to share these only in approximate form to prevent fingerprinting attacks, the concept that the data is shared is aligned with user expectations.

Sharing data for a particular purpose (to select more relevant ads) is much narrower than accepting that data on ad preferences will be stored by every website, combined with other personally-identifying information and used for other purposes.

We should absolutely consider the threat model of using this data for fingerprinting or re-identification and mitigate, but I also think that if we are designing an API for a very specific use and choosing to expose new and potentially sensitive information for that use, that we should make that use limitation explicit. There may be both technical and non-technical ways to enforce those limitations.

@erik-anderson erik-anderson removed the agenda+ Request to add this issue to the agenda of our next telcon or F2F label Jul 26, 2021