Privacy Preserving Learning
This feature has been removed in VW version 9.2.0. The pull request introducing this feature can be found here
Build VW with the cmake flag BUILD_PRIVACY_ACTIVATION set to ON (it defaults to OFF).
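For reference, a typical out-of-source configure and build with this flag enabled might look like the following (the build directory name is arbitrary):

```
cmake -S . -B build -DBUILD_PRIVACY_ACTIVATION=ON
cmake --build build
```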
The command line argument --privacy_activation
implements aggregated learning by saving only those features that have seen a minimum threshold of users.
- In many real-world scenarios, the recommender cannot use the feature preferences of a user directly for learning due to privacy constraints.
- However, the recommender can learn from aggregated data, which upholds the privacy of the user.
- For each feature, a 32-bit vector is defined.
- We calculate a 5-bit hash of the tag of the example.
- For each feature weight updated by a non-zero value, we use the 5-bit hash to look up a bit in the 32-bit vector and set it to 1.
- When saving the weights into a file, we calculate the number of bits set to 1 for a feature. If it is greater than the threshold, the weights for that feature are saved.
- The default value of the threshold is 10.
- The number of trials until a new bit is flipped, given that n out of the k bits are already flipped, follows a geometric distribution with success probability (k-n)/k.
- The expectation of a geometric distribution is 1/p, hence the expected waiting time until m bits are flipped is the sum of k/(k-n) over n = 0, ..., m-1.
- On calculation, the expected waiting time for flipping 10 bits out of 32 was 11.76, while it was 22.21 for flipping 10 bits out of 11.
- This implied that at least 12 unique users are needed in expectation to flip 10 bits out of 32 bits (a sketch of this bookkeeping and calculation appears after this list).
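The following is a minimal, self-contained C++ sketch of the scheme described above, not the actual VW implementation: it illustrates a per-feature 32-bit activation vector, a stand-in 5-bit hash of the example tag (the real hash used by VW may differ), the set-bit threshold checked when deciding whether to persist a feature, and the expected-waiting-time sum from the derivation. All names and data layouts here are illustrative assumptions.

```cpp
#include <bitset>
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>

// Per-feature bookkeeping: one 32-bit activation vector per feature.
// Each bit corresponds to one of 32 buckets that example tags hash into.
std::unordered_map<uint64_t, uint32_t> activation;  // feature hash -> 32-bit vector

// Illustrative 5-bit hash of an example's tag (FNV-1a as a stand-in).
uint32_t five_bit_hash(const std::string& tag)
{
  uint32_t h = 2166136261u;
  for (char c : tag) { h = (h ^ static_cast<uint8_t>(c)) * 16777619u; }
  return h & 31u;  // keep 5 bits -> bucket index in [0, 32)
}

// Called whenever a feature's weight receives a non-zero update
// from an example carrying the given tag.
void record_update(uint64_t feature, const std::string& tag)
{
  activation[feature] |= (1u << five_bit_hash(tag));
}

// When saving the model, a feature's weights are written only if the number
// of set bits (distinct user buckets touched) exceeds the threshold.
bool should_save(uint64_t feature, int threshold = 10)
{
  return std::bitset<32>(activation[feature]).count() > static_cast<size_t>(threshold);
}

// Expected number of unique users needed to flip m out of k bits:
// sum over n = 0..m-1 of k / (k - n), from the geometric-distribution argument.
double expected_waiting_time(int m, int k)
{
  double total = 0.0;
  for (int n = 0; n < m; ++n) { total += static_cast<double>(k) / (k - n); }
  return total;
}

int main()
{
  record_update(/*feature=*/42, "user_a");
  record_update(42, "user_b");
  std::cout << "save feature 42? " << should_save(42) << "\n";  // 0: only 2 bits set
  std::cout << expected_waiting_time(10, 32) << "\n";           // ~11.76
  std::cout << expected_waiting_time(10, 11) << "\n";           // ~22.21
}
```

Running the two `expected_waiting_time` calls reproduces the 11.76 and 22.21 figures quoted above.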
The feature is controlled by two command line options:
- --privacy_activation: activates the feature
- --privacy_activation_threshold arg (=10): sets the threshold
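As an illustration, a hypothetical invocation on a standard VW-format data file (the file and model names are placeholders) could look like:

```
vw -d train.dat -f model.vw --privacy_activation --privacy_activation_threshold 10
```

Since the bit vector is indexed by a hash of the example tag, distinct users should carry distinct tags in the input data.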
Planned future work:
- Implement the feature for save_resume.
- Work on aggregations in the online setting.
- Support for VW Slim.
- Thanks to Olga Vrousgou and Pavithra Srinath for their mentorship.
- For the results of the empirical analysis of this feature, refer to: Empirical Analysis of Privacy Preserving Learning - RLOSF'21