Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update flip-type operations of Roaring64Map #402

Merged
merged 7 commits into from
Nov 15, 2022

Conversation

kosak
Copy link
Contributor

@kosak kosak commented Nov 10, 2022

Executive summary:

  1. Provided extra entry points for flip on both the 32-bit and 64-bit side to make it uniform with addRange and removeRange
  2. Inspired by an idea from @Dr-Emann in Improve add-type operations #397 , used emplace_hint to minimize map lookups. The resultant code does only one lookup for the whole operation; everything else is just linear iteration through the specified map range, which is cool.
  3. Also flip() does the same "remove bitmaps if they become empty" logic that the other operations do now.
  4. Added unit tests

I like how this one turned out. Kindly let me know what you think.

Some rationale on adding the entry points:

First, provides API uniformity for the end-user, which is always nice.

Second, provides uniformity for the code reader. The style of all of these is to have a couple of "easy" methods and then a longer one that does all the work. But here, the workhorse methods are addRangeClosed(uint64_t, uint64_t) and removeRangeClosed(uint64_t uint64_t) which work with closed intervals, but then we have flip(uint64_t, uint64_t) working with half-open intervals. When the convention changes like that, it's a little harder to analyze.

Also, flip() had a couple of logic errors:

  1. Missed a 'setCopyOnWrite' case when start_high == end_high
  2. When given a half-open interval right at the end of a bitmap, e.g. [0xFFFFFFFF, 0x100000000), the code should see this as falling into a single inner Roaring but it doesn't. This is a (micro) performance problem, not a correctness problem.

I feel rewriting flip() in a style coherent with all the others hopefully makes it easier to understand and use. Kindly let me know what you think.

@lemire
Copy link
Member

lemire commented Nov 11, 2022

@SLieve Would you review ?

@SLieve
Copy link
Contributor

SLieve commented Nov 11, 2022

Sure!

Copy link
Contributor

@SLieve SLieve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with adding flipClosed to the public api, @lemire what do you think?

// Since min and max are uint32_t, highbytes(min or max) == 0. The inner
// bitmap we are looking for, if it exists, will be at the first slot of
// 'roarings'.
if (iter == roarings.end() || iter->first != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct. If I have an empty Roaring64Map and I call flipClosed(0, 10), I would expect the range from 0 to 10 to be set. Let's add a test for this case too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arrrgh. Great catch!

cpp/roaring64map.hh Outdated Show resolved Hide resolved
cpp/roaring64map.hh Outdated Show resolved Hide resolved
roarings[start_high].setCopyOnWrite(copyOnWrite);
// 2. Flip intermediate bitmaps completely...
for (uint32_t i = 0; i != num_intermediate_bitmaps; ++i) {
auto &bitmap = start_iter->second;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: Could we add a small comment here which mentions why we can assume all bitmaps in the range exist? Something like:

// We can directly use the iterator for the entire range, because we made sure it was populated above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment above the call to ensureRangePopulated(), making it consistent with the corresponding comment in addRangeClosed() pending in #397. Let me know if you think this is good enough.

tests/cpp_unit.cpp Outdated Show resolved Hide resolved
Comment on lines 1402 to 1410
// For example (assuming num_slots_to_test = 5), we:
// create a Roaring64Map, (do nothing), flip 5 slots, and check
// Then we:
// create a Roaring64Map, set a bit in slot 0, flip 5 slots, and check
// Then we:
// create a Roaring64Map, set a bit in slot 1, flip 5 slots, and check
// Then we:
// create a Roaring64Map, set a bit in slots 0 and 1, flip 5 slots, and check
// etc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: Same comment as in #397, this may read better as a numbered list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redid the comment to look like the (redone) comment in #397. Please take another look.

cpp/roaring64map.hh Show resolved Hide resolved
@SLieve
Copy link
Contributor

SLieve commented Nov 13, 2022

@kosak I think you forgot to push.

@kosak
Copy link
Contributor Author

kosak commented Nov 13, 2022

Sorry about that. I didn't actually "forget" per se but I clicked through the comments and then I thought I could finish up the code before you got here. I will make sure to do it in the other order next time 😃 Please take another look.

…alls

completely into slot 0, delegate to 32-bit flipClosed() rather than
64-bit flipClosed().
Copy link
Contributor

@SLieve SLieve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! @lemire for feedback on adding flipClosed to the API (see above).

@kosak
Copy link
Contributor Author

kosak commented Nov 13, 2022

Sorry, I added one more thing (see 7662b3a )

The rationale here is to put half-open flip() on equal footing with closed-interval flipClosed(uint32, uint32). Meaning they both delegate to a slightly faster version if they end up operating on slot 0.

Let me know if you think this is going too far. Another consistent view is that neither should be a special case, and both should just delegate to the "workhorse" method void flipClosed(uint64_t min, uint64_t max). In other words:

    void flip(uint64_t min, uint64_t max) {
        if (min >= max) {
            return;
        }
        flipClosed(min, max - 1);
    }

    void flipClosed(uint32_t min, uint32_t max) {
      flipClosed(uint64_t(min), uint64_t(max));
    }

Put another way, I'm arguing that for consistency's sake they should either both have this special-case optimization, or neither should. My opinion is they both should.

@SLieve
Copy link
Contributor

SLieve commented Nov 14, 2022

I can see the symmetry argument, but on the other hand it's also a bit more code, and with flipClosed we have a promise about the input, which means that the two methods are not quite equivalent IMO. I don't feel strongly about this, I think it's readable and understandable with or without this additional change.

@kosak
Copy link
Contributor Author

kosak commented Nov 14, 2022

Yeah, I see your point. Let's revert that change. (done)

@lemire
Copy link
Member

lemire commented Nov 15, 2022

flipClosed is fine as far as I am concerned.

@kosak
Copy link
Contributor Author

kosak commented Nov 15, 2022

Added another commit to fix a logic typo.

@lemire
Copy link
Member

lemire commented Nov 15, 2022

@SLieve Do you recommend merging this?

@lemire
Copy link
Member

lemire commented Nov 15, 2022

Merging.

@lemire lemire merged commit ff8aca1 into RoaringBitmap:master Nov 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants