Incorrect FLoC ID returned #2

Closed
Yash-Vekaria opened this issue Aug 6, 2021 · 11 comments

@Yash-Vekaria

While experimenting with floc_simulator, I observed that its output depends on input details such as a "www." prefix or a trailing "/", whereas Google's actual FLoC implementation is independent of these variations and always considers the eTLD+1.

For example, for "https://www.yahoo.com", "https://www.yahoo.com/", "https://yahoo.com/" and "https://yahoo.com", Google's FLoC algorithm returns the same FLoC ID whichever variation is visited. This is shown in the output screenshot below:

image

However, when the same set of websites is checked with these variations in Shigeki's floc_simulator, it returns a different FLoC ID for each variation, as shown in the following floc_simulator output screenshot.

image

The FLoC ID does not match even for the sample inputs shown on the simulator's homepage.

Note: Since Shigeki's floc_simulator works only for FLoC version 2.1, this version was explicitly set using the browser flag snippet below (i.e., by setting finch_config_version) and used for all of the experiments above. Also, all the websites used in the experiments have opted in to the FLoC OT.

FederatedLearningOfCohorts:update_interval/10s/minimum_history_domain_size_required/1/finch_config_version/2,FlocIdSortingLshBasedComputation,InterestCohortFeaturePolicy

Yash-Vekaria changed the title from "Wrong FLoC ID returned" to "Incorrect FLoC ID returned" Aug 6, 2021

@shigeki
Owner

shigeki commented Aug 8, 2021

This is because the domain_list input file does not support a URL format (https://foo.bar.com/) but only an FQDN format (foo.bar.com). Removing the scheme and path from the entries in the domain list should give the correct cohort ID.
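
For anyone preparing a domain list from browser URLs, here is a minimal Go sketch of the scheme and path stripping described above. The function name `toHost` is hypothetical and is not part of floc_simulator. It only reduces a URL to its host name; it does not compute the eTLD+1, so "www.yahoo.com" and "yahoo.com" stay distinct.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// toHost (hypothetical helper) reduces a URL or bare host to a lowercase
// host name without scheme, port, path, or trailing dot.
// It does NOT compute eTLD+1.
func toHost(input string) (string, error) {
	s := strings.TrimSpace(input)
	if !strings.Contains(s, "://") {
		// A bare host has no scheme; add one so url.Parse fills in Host.
		s = "https://" + s
	}
	u, err := url.Parse(s)
	if err != nil {
		return "", err
	}
	host := strings.TrimSuffix(strings.ToLower(u.Hostname()), ".")
	if host == "" {
		return "", fmt.Errorf("no host in %q", input)
	}
	return host, nil
}

func main() {
	for _, in := range []string{"https://yahoo.com/", "https://yahoo.com", "yahoo.com"} {
		host, err := toHost(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(host) // each of the three inputs yields yahoo.com
	}
}
```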

@Yash-Vekaria
Author

@shigeki If you look at the two screenshots in my original comment, I have done exactly that.

When running your simulator, I entered the websites in FQDN format in the domain_list input file, as follows:
["yahoo.com", "bbc.com", "cnn.com", "youtube.com", "foxnews.com", "techcrunch.com", "msn.com"]
For this input, the simulator returned the cohort ID "24904".

When the same websites are visited in URL format and checked with Google's implementation, it returns "16619" as the cohort ID. So there is some issue with the simulator.

Note: To verify the FLoC ID with Google's implementation, the websites have to be visited by their URLs; the FQDN format doesn't work there.

@shigeki
Owner

shigeki commented Aug 10, 2021

I checked it, found a bug in CityHashv103, and fixed it in 292fe1b. Thanks.
But the cohort ID is 14933, not 16619.

$ ./floc_sample test_domain.json
domain_list: [yahoo.com bbc.com cnn.com techcrunch.com youtube.com foxnews.com msn.com]
sim_hash: 568239095611142
cohortId: 14933

It is the same value as in my Chromium below, where the annotation flags in its history were removed and the update interval was changed to 5 min for this test. Please check that your history is the same.
image
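
To compare runs like the one above against a value read from the browser without copying numbers by hand, the simulator output could be parsed. This is a minimal sketch that assumes the output keeps the `sim_hash:` and `cohortId:` line shape shown above; `simResult` and `parseSimOutput` are hypothetical names, not part of floc_simulator.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// simResult holds the two numeric fields printed by the run above.
type simResult struct {
	SimHash  uint64
	CohortID int
}

// parseSimOutput extracts sim_hash and cohortId from "key: value" lines.
// Other lines (the "$ ..." prompt line, domain_list) are ignored.
func parseSimOutput(out string) (simResult, error) {
	var r simResult
	seen := map[string]bool{}
	for _, line := range strings.Split(out, "\n") {
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		key, val := strings.TrimSpace(parts[0]), strings.TrimSpace(parts[1])
		switch key {
		case "sim_hash":
			n, err := strconv.ParseUint(val, 10, 64)
			if err != nil {
				return r, err
			}
			r.SimHash = n
			seen[key] = true
		case "cohortId":
			n, err := strconv.Atoi(val)
			if err != nil {
				return r, err
			}
			r.CohortID = n
			seen[key] = true
		}
	}
	if !seen["sim_hash"] || !seen["cohortId"] {
		return r, fmt.Errorf("missing sim_hash or cohortId")
	}
	return r, nil
}

func main() {
	out := "$ ./floc_sample test_domain.json\n" +
		"domain_list: [yahoo.com bbc.com cnn.com techcrunch.com youtube.com foxnews.com msn.com]\n" +
		"sim_hash: 568239095611142\n" +
		"cohortId: 14933\n"
	r, err := parseSimOutput(out)
	if err != nil {
		panic(err)
	}
	chromeID := 16619 // value reported from Chrome earlier in this thread
	fmt.Printf("simulator=%d chrome=%d match=%v\n", r.CohortID, chromeID, r.CohortID == chromeID)
}
```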

@Yash-Vekaria
Author

@shigeki I set update_interval to 300s (i.e., 5 min), minimum_history_domain_size_required to 1, and finch_config_version (i.e., the FLoC version) to 1 (i.e., it will use the Chrome 1.1 FLoC). If you look at the history and the FLoC ID that Google returns, it is "16619".

image

The exact flags set are as follows:
--enable-blink-features=InterestCohortAPI
--enable-features=FederatedLearningOfCohorts:update_interval/300s/minimum_history_domain_size_required/1/finch_config_version/1,FlocIdSortingLshBasedComputation,InterestCohortFeaturePolicy

Additionally, as described on the simulator's main page, the simulator is said to be tested and working for FLoC version Chrome 2.1. The screenshot that you shared is for Chrome 1.1. For the above history, the Chrome 2.1 FLoC version also returns "16619". It would be very helpful if you could fix the simulator to return "16619".

@shigeki
Owner

shigeki commented Aug 11, 2021

Try adding the feature option FlocPagesWithAdResourcesDefaultIncludedInFlocComputation. Clear your history and check that an exception error is output, which confirms that the cohort ID is not being fetched from the previous preference log.
Wait for more than 300 sec, then you can get a new cohort ID, which is 14933 in my case. I believe that it is the right answer.

image

The whole option is
--enable-blink-features=InterestCohortAPI --enable-features=FederatedLearningOfCohorts:update_interval/300s/minimum_history_domain_size_required/1/finch_config_version/1,FlocIdSortingLshBasedComputation,InterestCohortFeaturePolicy,FlocPagesWithAdResourcesDefaultIncludedInFlocComputation

@Yash-Vekaria
Author

@shigeki I visited the websites under discussion and then fetched the cohort ID using the API; it returned "17745". Then, as you directed, I waited for 300+ seconds and called the API again, and this time got the cohort ID "14925" and not "14933" (as you got). Also, the simulator should return "17745", right? Why does it return the FLoC ID that is refreshed after 300+ seconds?

Note: This time I tried a slightly different combination of visited websites, listed here:
["https://www.yahoo.com/", "https://www.bbc.com/", "https://www.edition.cnn.com/", "https://www.youtube.com/", "https://www.foxnews.com/", "https://www.techcrunch.com/", "https://www.msn.com/"]

image

The following is the simulator output, if that helps:
image

@shigeki
Owner

shigeki commented Aug 12, 2021

17745 is the wrong cohort ID for this list of domains; it was calculated and saved in your preferences in the past.
You should clear all your history and settings in order to clear it; after that, document.interestCohort() should throw an exception.

I'm not sure why your Chrome returns 14925, not 14933.
Looking at the screenshot, the differences are a data scheme entry at the start of your history and a language setting. Neither affects FLoC.

I think that it is a new issue in Chrome, not in floc_simulator, and I cannot solve it because I cannot reproduce it in my Chrome.
I believe that 14933 is the right cohort ID because both my Chrome and floc_simulator return the same number.

@geeeoff

geeeoff commented Aug 23, 2021

@shigeki - I am using Chrome version: Version 92.0.4515.159 (Official Build) (x86_64) and get the same result as @Yash-Vekaria

Any further ideas on what might be causing this?

@shigeki
Owner

shigeki commented Aug 24, 2021

We get a different cohort ID from the 6 URLs. Comparing cohort IDs with only one URL at a time would give us some hints.
The cohort IDs for each single URL from floc_simulator are as follows.

diff --git a/packages/floc/setup.go b/packages/floc/setup.go
index b60470d..59eab31 100644
--- a/packages/floc/setup.go
+++ b/packages/floc/setup.go
@@ -8,7 +8,7 @@ import (
        "os"
 )

-var kFlocIdMinimumHistoryDomainSizeRequired int = 7
+var kFlocIdMinimumHistoryDomainSizeRequired int = 1

 // cluster data comes from ~/Library/Application\ Support/Google/Chrome\ Canary/Floc/1.0.6/ in MacOS
 var cluster_file = "../../Floc/1.0.6/SortingLshClusters"

$ ./floc_sample yahoo_com.json
domain_list: [yahoo.com]
sim_hash: 340880272222230
cohortId: 8802
$ ./floc_sample bbc_com.json
domain_list: [bbc.com]
sim_hash: 525203699512276
cohortId: 13856
$ ./floc_sample cnn_com.json
domain_list: [cnn.com]
sim_hash: 826482894952494
cohortId: 23217
$ ./floc_sample youtube_com.json
domain_list: [youtube.com]
sim_hash: 986137333389667
cohortId: 28033
$ ./floc_sample techcrunch_com.json
domain_list: [techcrunch.com]
sim_hash: 760961668495438
cohortId: 20842
$ ./floc_sample msn_com.json
domain_list: [msn.com]
sim_hash: 6139457615798
cohortId: 158
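
The single-domain input files used in the runs above could be generated rather than written by hand. This is a minimal sketch, assuming the input file is a JSON array of domain strings (the layout is an assumption here) and using the `yahoo_com.json` naming seen in the commands above. `singleDomainInputs` is a hypothetical helper, not part of floc_simulator, and it only builds the file contents in memory.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// singleDomainInputs (hypothetical helper) returns one JSON document per
// domain, keyed by a file name such as "yahoo_com.json".
// The JSON-array layout of the input file is an assumption.
func singleDomainInputs(domains []string) (map[string][]byte, error) {
	out := make(map[string][]byte, len(domains))
	for _, d := range domains {
		body, err := json.Marshal([]string{d})
		if err != nil {
			return nil, err
		}
		name := strings.ReplaceAll(d, ".", "_") + ".json"
		out[name] = body
	}
	return out, nil
}

func main() {
	domains := []string{"yahoo.com", "bbc.com", "cnn.com", "youtube.com", "techcrunch.com", "msn.com"}
	files, err := singleDomainInputs(domains)
	if err != nil {
		panic(err)
	}
	// Print in input order; a real run would write each entry to disk
	// and pass the file name to floc_sample.
	for _, d := range domains {
		name := strings.ReplaceAll(d, ".", "_") + ".json"
		fmt.Printf("%s: %s\n", name, files[name])
	}
}
```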

Try running Chrome with the following options, with update_interval set to 10 sec and minimum_history_domain_size_required set to 1.
I'm using Chromium 95.0.4620.0 (Developer Build) (x86_64). I think Chrome Canary would show the same results.

--enable-blink-features=InterestCohortAPI --enable-features="FederatedLearningOfCohorts:update_interval/10s/minimum_history_domain_size_required/1,InterestCohortFeaturePolicy,FlocPagesWithAdResourcesDefaultIncludedInFlocComputation"

My Chrome shows the same cohort IDs as floc_simulator, as shown below.

Check your output for each URL. Do not forget to clear the browser history in your Chrome before accessing the URL, and wait for more than 10 seconds before showing the cohort ID.

yahoo_com_floc

bbc_com_floc

cnn_com_floc

youtube_com_floc

techcrunch_com_floc

msn_com_floc

@geeeoff

geeeoff commented Aug 24, 2021

@shigeki - I was able to reproduce the exact same IDs as you shared. I am also able to get cohort ID: 14933. The difference seems to be with the flags used to start up Chrome.

Using: --enable-blink-features=InterestCohortAPI --enable-features="FederatedLearningOfCohorts:update_interval/10s/minimum_history_domain_size_required/1,InterestCohortFeaturePolicy,FlocPagesWithAdResourcesDefaultIncludedInFlocComputation"

Results in: 14933

Using (suggested by Google): --enable-blink-features=InterestCohortAPI --enable-features="FederatedLearningOfCohorts:update_interval/10s/minimum_history_domain_size_required/1,FlocIdSortingLshBasedComputation,InterestCohortFeaturePolicy"

Results in: 16619
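
The two flag strings above are long enough that the difference is easy to miss by eye. Here is a minimal Go sketch that diffs the feature names in the two `--enable-features` values quoted above. It only does string processing on those values; `featureNames` and `onlyIn` are hypothetical helper names, and the ":param/value" suffix is simply dropped.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// featureNames splits the value of --enable-features into feature names,
// dropping any ":param/value" suffix.
func featureNames(value string) map[string]bool {
	names := make(map[string]bool)
	for _, item := range strings.Split(value, ",") {
		name := strings.SplitN(strings.TrimSpace(item), ":", 2)[0]
		if name != "" {
			names[name] = true
		}
	}
	return names
}

// onlyIn returns the sorted names present in a but not in b.
func onlyIn(a, b map[string]bool) []string {
	var diff []string
	for name := range a {
		if !b[name] {
			diff = append(diff, name)
		}
	}
	sort.Strings(diff)
	return diff
}

func main() {
	first := featureNames("FederatedLearningOfCohorts:update_interval/10s/minimum_history_domain_size_required/1,InterestCohortFeaturePolicy,FlocPagesWithAdResourcesDefaultIncludedInFlocComputation")
	second := featureNames("FederatedLearningOfCohorts:update_interval/10s/minimum_history_domain_size_required/1,FlocIdSortingLshBasedComputation,InterestCohortFeaturePolicy")
	fmt.Println("only in first: ", onlyIn(first, second))
	fmt.Println("only in second:", onlyIn(second, first))
}
```

With the two values from this comment, the first set additionally contains FlocPagesWithAdResourcesDefaultIncludedInFlocComputation and the second additionally contains FlocIdSortingLshBasedComputation.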

Thank you for diving in, I am happy that your simulator now matches my Chrome!

@shigeki
Owner

shigeki commented Aug 25, 2021

That is great.
The FlocPagesWithAdResourcesDefaultIncludedInFlocComputation flag is needed after the OT (ver: chrome/2.1) has finished, so that visited pages with ad resources in your history are included in the FLoC computation.
Closing this.
