New iterator released: shuffled #28
base: master
Typo in the README code example: sorted was not replaced with shuffled.
I like the idea, but the implementation seems pretty intense and not the most obvious, so I'm having a bit of trouble figuring out why. What are the advantages of all this over creating a sequence of iterators into the …
It has zero memory cost: every step is just a few simple machine operations (mask, shift). If you use std::shuffle, you must have enough memory to store the whole shuffled set. With the presented approach we can shuffle, for example, very large files line by line, or rows in big SQL tables (if we don't care about the speed of random seeking through them; if we do, we can use Mixed Product, which is sequential). Of course, if all the data being shuffled is already in memory, we have random-access iterators (through dumb_advance), so shuffling will be as fast as std::shuffle.
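To make the memory argument concrete, here is a minimal sketch (not the PR's implementation) of visiting the indices 0..size-1 in LFSR order. Assumptions: a Galois LFSR whose toggle masks come from a standard table of maximal-length (primitive) polynomials, sizes below 65536 only, and the hypothetical names LfsrShuffle and shuffled_indices. The whole state is a few integers; register values outside the range are skipped, and because the register holds fewer than twice as many values as the range, fewer than half of the steps are skipped.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Maximal-length Galois LFSR toggle masks, indexed by register width (2..16).
constexpr std::uint32_t kTaps[17] = {
    0,     0,     0x3,   0x6,   0xC,    0x14,   0x30,   0x60,  0xB8,
    0x110, 0x240, 0x500, 0xE08, 0x1C80, 0x3802, 0x6000, 0xB400};

// Hypothetical sketch: yields every index in [0, size) exactly once, in a
// scrambled order, using O(1) memory (no index buffer is ever allocated).
class LfsrShuffle {
public:
    LfsrShuffle(std::uint32_t size, std::uint32_t seed) : size_(size) {
        // Register states run over 1..2^width-1 (0 is a fixed point) and state
        // s maps to index s-1, so we need 2^width - 1 >= size.
        while (width_ <= 16 && (std::uint32_t{1} << width_) <= size_) ++width_;
        if (width_ > 16) throw std::out_of_range("sketch supports size < 65536");
        const std::uint32_t mask = (std::uint32_t{1} << width_) - 1;
        start_ = (seed & mask) != 0 ? (seed & mask) : 1;  // any nonzero start
        state_ = start_;
    }

    // Stores the next index in `out`; returns false once all were produced.
    bool next(std::uint32_t& out) {
        while (!done_) {
            const std::uint32_t index = state_ - 1;
            step();
            if (state_ == start_) done_ = true;  // full period completed
            if (index < size_) {                 // skip out-of-range states
                out = index;
                return true;
            }
        }
        return false;
    }

private:
    // One Galois LFSR step: shift right, XOR the taps if a 1 fell out.
    void step() {
        const std::uint32_t lsb = state_ & 1u;
        state_ >>= 1;
        if (lsb != 0) state_ ^= kTaps[width_];
    }

    std::uint32_t size_;
    std::uint32_t width_ = 2;
    std::uint32_t start_ = 1;
    std::uint32_t state_ = 1;
    bool done_ = false;
};

// Helper for testing only: drains the generator into a vector. Real use
// (e.g. reading line i of a large file) would consume indices one by one.
inline std::vector<std::uint32_t> shuffled_indices(std::uint32_t size,
                                                   std::uint32_t seed) {
    std::vector<std::uint32_t> out;
    LfsrShuffle gen(size, seed);
    for (std::uint32_t i = 0; gen.next(i);) out.push_back(i);
    assert(out.size() == size);  // one output per index in [0, size)
    return out;
}
```

For example, shuffled_indices(10, 1) yields 0 5 2 9 4 6 8 7 3 1. A real implementation would need masks for widths up to 64 bits; the table here stops at 16 to keep the sketch short.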
Well, that does sound pretty cool. Can I see the IPv4 shuffling code?
In this test I use an IPv4 pseudo-container and shuffle through it: … rators project will be rewritten to use cppitertools::shuffled and cppitertools::mixed_product soon.
1. There was a bug. When we approximate, for example, 10 with a power of two, we get 16 (2^4), so the register size must be 4. Instead, a register of size 5 was used, which is not efficient.
2. Even when efficient std::distance and operator+(int n) functions are available, dumb_advance and dumb_size did not use them, so they were replaced with std::advance and std::distance respectively.
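The register-width rule from the first fix can be written as a small helper. This is a hypothetical sketch (register_bits is not a name from the PR): it returns the smallest n with 2^n >= size, so 10 rounds up to 16 and needs 4 bits, not 5. If the all-zero register state is excluded, as in a standard LFSR, only 2^n - 1 values are available, so exact powers of two would need one more bit; the text here does not say how the PR handles that case.

```cpp
#include <cstdint>

// Hypothetical helper: number of register bits when `size` is rounded up to
// the next power of two, i.e. the smallest n with 2^n >= size.
constexpr unsigned register_bits(std::uint64_t size) {
    unsigned bits = 0;
    // Stop at 64 so the shift never reaches the width of std::uint64_t.
    while (bits < 64 && (std::uint64_t{1} << bits) < size) ++bits;
    return bits;
}

// 10 rounds up to 16 = 2^4, so 4 bits suffice (the buggy version used 5).
static_assert(register_bits(10) == 4, "10 -> 16 -> 4 bits");
```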
IPv4 shuffling with the help of cppitertools and https://github.com/hoxnox/iptools, with zero memory cost:

    for (auto i : iter::shuffled(cidr_v4("0.0.0.0/0")))
        void(0);
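The idea of a pseudo-container can be sketched in a few lines for readers without iptools at hand. The struct below is hypothetical and is not the iptools cidr_v4 API: it stores only a network address and a prefix length and computes each address from its position, so a block like 0.0.0.0/0 costs a few bytes regardless of its 2^32 elements. It also shows why a 64-bit size type is convenient here: 2^32 does not fit in std::uint32_t.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical IPv4 "pseudo-container" (not the iptools API): only the
// network address and prefix length are stored; every element is computed.
struct Ipv4Block {
    std::uint32_t network;  // first address in host byte order, assumed aligned
    unsigned prefix_len;    // 0..32

    // 64-bit because 0.0.0.0/0 holds 2^32 addresses, one more than
    // std::uint32_t can represent.
    std::uint64_t size() const { return std::uint64_t{1} << (32 - prefix_len); }

    // Address at position i, computed on demand (random access, no storage).
    std::uint32_t at(std::uint64_t i) const {
        assert(i < size());
        return network + static_cast<std::uint32_t>(i);
    }
};

// Formats an address as a dotted quad, e.g. 0x0A000001 -> "10.0.0.1".
inline std::string to_dotted(std::uint32_t a) {
    return std::to_string(a >> 24) + '.' + std::to_string((a >> 16) & 0xFFu) + '.' +
           std::to_string((a >> 8) & 0xFFu) + '.' + std::to_string(a & 0xFFu);
}
```

Pairing an index-to-element mapping like this with an index permutation is what lets a loop such as the one above walk the whole address space without materializing it.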
I'm sorry this is taking me so long to get through. For the most part I'm working only on language stuff at this time. Your test file has a check making sure "shuffled not store container inside", but shuffled needs to store the container in the case of an rvalue.
Why did you make a point of this? I've made the change in my checkout of the branch, but I want to make sure I'm not missing something. Why do we need the check at the top of … What is being gained by having …
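The rvalue point can be illustrated with a common C++ idiom, shown here as a hypothetical sketch rather than the PR's code (View and make_view are illustrative names): a wrapper templated on the deduced forwarding-reference type keeps only a reference when handed an lvalue, but moves the container in and owns it when handed an rvalue. Without the owning case, a range-for over a temporary container would iterate over a dangling reference.

```cpp
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical wrapper: C is deduced from a forwarding reference, so it is
// T& for lvalues (only a reference is stored) and T for rvalues (the
// container is moved in and owned by the wrapper).
template <typename C>
class View {
public:
    explicit View(C&& c) : container_(std::forward<C>(c)) {}
    auto begin() { return std::begin(container_); }
    auto end() { return std::end(container_); }

private:
    C container_;  // T& or T, depending on how make_view was called
};

template <typename C>
View<C> make_view(C&& c) {
    return View<C>(std::forward<C>(c));
}

// Usage: iterating over a temporary is safe because the View owns the vector.
inline void temporary_is_owned() {
    int sum = 0;
    for (int x : make_view(std::vector<int>{1, 2, 3})) sum += x;
    assert(sum == 6);
}

// Usage: with an lvalue the View only refers to the vector, so elements
// added afterwards are visible through the View.
inline void lvalue_is_referenced() {
    std::vector<int> v{1, 2};
    auto view = make_view(v);
    v.push_back(3);
    int count = 0;
    for (int x : view) {
        (void)x;
        ++count;
    }
    assert(count == 3);
}
```

The lvalue case avoids copying a container the caller already keeps alive; the rvalue case pays one move to keep the temporary's data valid for the whole loop.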
1. Store the shuffled container internally.
2. Removed unnecessary checking in operator++.
3. Distance is always uint64_t.
You convinced me. =) |
I tested the code in 2 projects. It runs successfully in production on Gentoo-amd64, Debian-jessie-x64 and Debian-jessie-i386.
Allows iteration over a sequence in shuffled order. Randomization is implemented with a linear-feedback shift register (LFSR).
An additional convenient feature is the ability to restore iterator state at zero cost (not documented in the README; see the tests).
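Zero-cost state restoration follows from the fact that an LFSR's position is entirely described by its register value. A minimal, hypothetical sketch (Lfsr8, save and restore are illustrative names, not the PR's API) using an 8-bit Galois register with toggle mask 0xB8:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit Galois LFSR with toggle mask 0xB8
// (x^8 + x^6 + x^5 + x^4 + 1), which cycles through all 255 nonzero states.
class Lfsr8 {
public:
    explicit Lfsr8(std::uint8_t seed) : state_(seed != 0 ? seed : std::uint8_t{1}) {}

    // Advances one step and returns the new register value.
    std::uint8_t next() {
        const bool lsb = (state_ & 1u) != 0;
        state_ = static_cast<std::uint8_t>(state_ >> 1);
        if (lsb) state_ = static_cast<std::uint8_t>(state_ ^ 0xB8u);
        return state_;
    }

    // The whole position is one byte: saving copies it, restoring assigns it.
    std::uint8_t save() const { return state_; }
    void restore(std::uint8_t saved) { state_ = saved; }  // saved must come from save()

private:
    std::uint8_t state_;
};

// Usage: values produced after restore() repeat those produced after save().
inline void save_restore_roundtrip() {
    Lfsr8 gen(1);
    for (int i = 0; i < 10; ++i) gen.next();
    const std::uint8_t saved = gen.save();
    std::vector<std::uint8_t> first, second;
    for (int i = 0; i < 5; ++i) first.push_back(gen.next());
    gen.restore(saved);
    for (int i = 0; i < 5; ++i) second.push_back(gen.next());
    assert(first == second);
}
```

An iterator built on this idea would also carry constants such as the range size and tap mask, but those never change during iteration, so the saved position would still be a single integer.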