-
Notifications
You must be signed in to change notification settings - Fork 505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: Speed up random poly generation #742
base: main
Are you sure you want to change the base?
Conversation
As noted in privacy-scaling-explorations#151 the generation of a random poly for degrees bigger than 20 starts to get quite slow. This PR tries to include some minimal changes in the `commit` fn so that we upstream the improvements achieved in PSE/halo2
95055a4
to
a4cc9f6
Compare
let mut random_poly = domain.empty_coeff(); | ||
for coeff in random_poly.iter_mut() { | ||
*coeff = C::Scalar::random(&mut rng); | ||
} | ||
#[cfg(feature = "multicore")] | ||
let random_poly = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than expanding this inline here, we should just add an EvaluationDomain::random_coeff<R: RngCore>(&self, rng: R)
method. This would also avoid the need to make EvaluationDomain.k
pub(crate)
.
#[cfg(feature = "multicore")] | ||
let random_poly = { | ||
let n_threads = current_num_threads(); | ||
let n = 1usize << domain.k as usize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving this to EvaluationDomain::random_coeff
would also mean we can use self.n
instead of recomputing it here:
let n = 1usize << domain.k as usize; |
let random_poly = { | ||
let n_threads = current_num_threads(); | ||
let n = 1usize << domain.k as usize; | ||
let n_chunks = n_threads + if n % n_threads != 0 { 1 } else { 0 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let n_chunks = n_threads + if n % n_threads != 0 { 1 } else { 0 }; | |
let chunk_size = (self.n + n_threads - 1) / n_threads; | |
let num_chunks = (self.n + chunk_size - 1) / chunk_size; |
let n_chunks = n_threads + if n % n_threads != 0 { 1 } else { 0 }; | ||
let mut rand_vec = vec![C::Scalar::ZERO; n]; | ||
|
||
let mut thread_seeds: Vec<ChaCha20Rng> = (0..n_chunks) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let mut thread_seeds: Vec<ChaCha20Rng> = (0..n_chunks) | |
let mut thread_seeds: Vec<ChaCha20Rng> = (0..num_chunks) |
|
||
thread_seeds | ||
.par_iter_mut() | ||
.zip_eq(rand_vec.par_chunks_mut(n / n_threads)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.zip_eq(rand_vec.par_chunks_mut(n / n_threads)) | |
.zip_eq(rand_vec.par_chunks_mut(chunk_size)) |
The benchmarks in the description don't look like they are an overall improvement except for Blinder_poly/parallel/22 ? |
.map(|_| { | ||
let mut seed = [0u8; 32]; | ||
rng.fill_bytes(&mut seed); | ||
ChaCha20Rng::from_seed(seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this could safely use ChaCha12Rng
while conservatively meeting our security requirements. The paper Too Much Crypto argues in favour of ChaCha8; I do not entirely endorse all of the conclusions of that paper, but there is no significant doubt in my mind that 12 rounds is sufficient. There is a more recent (Nov 2021) paper on cryptanalysis of ChaCha, PNB-focused Differential Cryptanalysis of ChaCha Stream Cipher, published since the ones considered in Too Much Crypto (which is originally from Dec 2019), but the newer paper does not improve on brute force and in any case would only extend to 7.25 rounds if it did.
Given that there are still unaddressed review comments, and that this is a performance improvement that doesn't affect the public crate API, I propose bumping this out of 0.3.0 (we can cut a point release like 0.3.1 when it is ready). |
As noted in
privacy-scaling-explorations#151 the generation of a random poly for degrees bigger than 20 starts to get quite slow.
This PR tries to include some minimal changes in the
commit
fn so that we upstream the improvements achieved in PSE/halo2Here I forward the benches that I implemented in the PSE PR:
Also leaving CPU info in case is useful: