-
Notifications
You must be signed in to change notification settings - Fork 269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the performance of roaring bitmap under sparse cases #319
Comments
Yes. But before we add complexity, we would need some motivating benchmarks showing the the extra work is worth it. |
@lemire This is my test data generator: // the same key will add to the same bitmap
static std::default_random_engine e;
static std::uniform_int_distribution<int64_t> u(start_range, end_range);
// start_range = 0
// end_range = 5000000000
// keys == 1000
for(int k = 0;k < keys;k++) {
// lines == 50000
for(int i = 0;i < lines; ++i) {
std::cout << k << "\t" << u(e) << '\n';
}
} This is the change I made to I test It In my Application
|
Yes. I saw this and it makes sense to me. I'd still want someone to review it independently. |
I proposed a similar concept a while ago for rle but never got around writing the benchmarks. I'd advice trying to use a union for the |
@stdpain would you mind making a PR from your changes? Otherwise, at some unspecified point in the future would you like me to create one (with proper credit)? |
@Oppen OK |
I have a very ugly suggestion but you may be interested in trying it. Since our arrays are always sorted and never* empty we could simply zero-terminate them for the low count case, gaining a few more elements or even eliminating the indirection altogether, not storing the sizes at all. If the first element is zero, then there's a zero, but every other zero signals end of container. |
sounds great. |
More than a month later, but I put that |
I would love the "small bitmap optimization" described in #319 (comment) We do something like this but with a wrapper around the type which is annoying since we have to translate all the calls. The wrapper has a union of a roaring::Roaring (this is C++) and an array of 32bit ints plus a "small count" which we overlay on the empty padding at the end of the roaring_array_t. Since sizeof(roaring_array_t) is 40 bytes and we need at least one byte for the count field, that gives us (40 - 1) / 4 = 9 values we can use to hold 32bit values, before we have to convert from the "small" bitmap to a full bitmap. We've tested this and in our environment, at least, where we can have a wide variety of sizes of bitmap and it really helps performance in our use-cases, even with the one-time overhead of the conversion. I'm sure that if this was supported directly the overhead could be reduced further. What we do currently is only support a simple subset of operations on this special type of bitmap. If we need to do any of the more complex operations, for example if we're not sure the result will fit into the constant size, we convert it to a normal bitmap first. What we're keeping is basically an array bitmap, but without the overhead of the size (compile-time constant) or capacity (same as the size). Maybe an embedded implementation could get some synergy there. |
Would love to see this incorporated. Our use case is : |
@josh08287 Can you contribute a benchmark? See https://github.com/RoaringBitmap/CRoaring/tree/master/microbenchmarks The difficulty with this issue is that people refer to internal use cases, but there is no corresponding benchmark we can refer to. It means that this issue is unlikely to make progress. I would never merge an optimization without a corresponding credible benchmark. It is too easy to think we are optimizing when we are not. |
The PR by @stdpain did not match the current code, and lacked a benchmark to demonstrate the benefits. We need contributions, if only benchmarks, for this issue to make progress. |
I'll start by providing a benchmark for this case |
In a sparse roaringbitmap, the size of array_container may often be 1, 2, 3. But we can actually reuse the container pointer (sizeof(void*)) as an array of uint16_t to avoid extra memory allocation.
eg:
roaring_array_t ra;
// some init code
ra.typecodes[i] = SINGLE_CONTAIN_TYPE + cardinality << 6;
((uint16_t*)ra.containers[i])[1] = value1;
((uint16_t*)ra.containers[i])[2] = value2;
((uint16_t*)ra.containers[i])[3] = value3;
The text was updated successfully, but these errors were encountered: