-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible degradation to 0 in random math #19
Comments
I like the proposed solution but |
It was just an example of how to generate two different random numbers, I didn't mean to use it as it is. Since this code is only used for generation, straight-forward approach "generate src1, src2 randomly, repeat if they are the same" will also work. |
I like this solution because it is nicely optimized by a compiler and does not have a branch. Another proposed one was: unsigned r;
do {
r = rand();
}
while ((r & 31) == ((r>>16) & 31));
return {r & 31, (r>>16) & 31}; I don't like it though because of the loop used. We will have to find test cases where the loop is executed more than once and include them in the test vectors. |
If kiss99 passes TestU01 as said in documentation, non-uniformness is not really a problem. The bias will be very small, something like P.S. First 128 combinations will be chosen with probability 4329605/2^32, the rest 864 combinations will be chosen with probability 4329604/2^32. Not a big deal. |
This is a good catch, thought not a significant issue. First the OpenCL/CUDA compilers are extremely good and definitely do the transforms described above. They definitely do things like min(a,a)=a or popc(a)+popc(a)=popc(a)<<1. The only way an ASIC would have an advantage would be if it had a dedicated squaring unit. However it would only be used for 1/32*1/11=0.3% of random math ops, so doesn't seem worth the area. Second the case of a^a=0 (or a|b=0xffffffff) is not a problem. The result of math() directly feeds a merge(), which was specifically designed to handle the B input being low entropy by XOR or ADDing in the B input. The mix state will not catastrophically collapse to 0 as in the above example. That said there's no reason not to fix this. You suggestion for generating src1/src2 from a single random call seems reasonable. I agree that a 1/2^32 bias doesn't make any meaningful difference. At the same time we should also update the rotate amount in merge() from ((r >> 16) % 32) to (((r >> 16) % 31)+1) to prevent rotating by 0 which is a NOP. |
V0.9.2 fixes this both src1==src2 and rotate by 0. Thanks for this suggestion. |
If
src1 == src2
and we do XOR, result will be 0. This 0 will most likely spread becauseThe fix is to never do math operations that cancel out both arguments to 0. As far as I can see, it's only XOR currently. ASIC can add optimizations for the case when one of the numbers is 0.
Moreover, the case when
src1 == src2
allows many other optimizations for ASIC that OpenCL/GPU won't do. It can use squarer instead of full multiplier for multiplication (more energy efficient), MIN/AND/OR simply become NOP, ADD becomes SHL by 1, CLZ/POPCOUNT become 2 times simpler/energy efficient. OpenCL compiler, on the other hand, is not guaranteed to take advantage of this. Compiler will be able to remove MIN/AND/OR from the generated code if src1 == src2, but it's unlikely to do more.Again, the fix for all this is simple: never do math on the same register, always use two different registers:
The text was updated successfully, but these errors were encountered: