Privacy Preserving Learning
This feature has been removed in VW version 9.2.0. The pull request introducing this feature can be found here
Build VW with the cmake flag BUILD_PRIVACY_ACTIVATION set to ON (it defaults to OFF).
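For reference, a typical out-of-source configure and build with this flag enabled might look like the following (the build directory name is arbitrary):

```
cmake -S . -B build -DBUILD_PRIVACY_ACTIVATION=ON
cmake --build build
```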
The command line argument --privacy_activation
implements aggregated learning by saving only those features that have seen a minimum threshold of users.
- In many real-world scenarios, the recommender cannot use the feature preferences of a user directly for learning due to privacy constraints.
- However, the recommender can learn from aggregated data, which upholds the privacy of the user.
- For each feature, a 32-bit vector is defined.
- We calculate a 5-bit hash of the tag of the example.
- For each feature weight updated by a non-zero value, we use the 5-bit hash to look up a bit in the 32-bit vector and set it to 1.
- When saving the weights into a file, we calculate the number of bits set to 1 for a feature. If it is greater than the threshold, the weights for that feature are saved.
- The default value of the threshold is 10.
- The number of trials until a new bit is flipped, given that n out of the k bits are already flipped, follows a geometric distribution with success probability (k-n)/k.
- The expectation of a geometric distribution is 1/p, hence the expected waiting time until m bits are flipped is the sum of k/(k-n) over n = 0, ..., m-1.
- On calculation, the expected waiting time for flipping 10 bits out of 32 was 11.76, while it was 22.21 for flipping 10 bits out of 11.
- This implied that at least 12 unique users are needed in expectation to flip 10 bits out of 32 bits (a sketch of this bookkeeping and calculation appears after this list).
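The following is a minimal, self-contained C++ sketch of the scheme described above, not the actual VW implementation: it illustrates a per-feature 32-bit activation vector, a stand-in 5-bit hash of the example tag (the real hash used by VW may differ), the set-bit threshold checked when deciding whether to persist a feature, and the expected-waiting-time sum from the derivation. All names and data layouts here are illustrative assumptions.

```cpp
#include <bitset>
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>

// Per-feature bookkeeping: one 32-bit activation vector per feature.
// Each bit corresponds to one of 32 buckets that example tags hash into.
std::unordered_map<uint64_t, uint32_t> activation;  // feature hash -> 32-bit vector

// Illustrative 5-bit hash of an example's tag (FNV-1a as a stand-in).
uint32_t five_bit_hash(const std::string& tag)
{
  uint32_t h = 2166136261u;
  for (char c : tag) { h = (h ^ static_cast<uint8_t>(c)) * 16777619u; }
  return h & 31u;  // keep 5 bits -> bucket index in [0, 32)
}

// Called whenever a feature's weight receives a non-zero update
// from an example carrying the given tag.
void record_update(uint64_t feature, const std::string& tag)
{
  activation[feature] |= (1u << five_bit_hash(tag));
}

// When saving the model, a feature's weights are written only if the number
// of set bits (distinct user buckets touched) exceeds the threshold.
bool should_save(uint64_t feature, int threshold = 10)
{
  return std::bitset<32>(activation[feature]).count() > static_cast<size_t>(threshold);
}

// Expected number of unique users needed to flip m out of k bits:
// sum over n = 0..m-1 of k / (k - n), from the geometric-distribution argument.
double expected_waiting_time(int m, int k)
{
  double total = 0.0;
  for (int n = 0; n < m; ++n) { total += static_cast<double>(k) / (k - n); }
  return total;
}

int main()
{
  record_update(/*feature=*/42, "user_a");
  record_update(42, "user_b");
  std::cout << "save feature 42? " << should_save(42) << "\n";  // 0: only 2 bits set
  std::cout << expected_waiting_time(10, 32) << "\n";           // ~11.76
  std::cout << expected_waiting_time(10, 11) << "\n";           // ~22.21
}
```

Running the two `expected_waiting_time` calls reproduces the 11.76 and 22.21 figures quoted above.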
The feature is controlled by two command line options:
- --privacy_activation: activates the feature
- --privacy_activation_threshold arg (=10): sets the threshold
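As an illustration, a hypothetical invocation on a standard VW-format data file (the file and model names are placeholders) could look like:

```
vw -d train.dat -f model.vw --privacy_activation --privacy_activation_threshold 10
```

Since the bit vector is indexed by a hash of the example tag, distinct users should carry distinct tags in the input data.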
Planned future work:
- Implement the feature for save_resume.
- Work on aggregations in the online setting.
- Support for VW Slim.
- Thanks to Olga Vrousgou and Pavithra Srinath for their mentorship.
- For the results of the empirical analysis of this feature, refer to: Empirical Analysis of Privacy Preserving Learning - RLOSF'21