Use fast_union_uint16 instead of union_uint16. #452

CharlesChen888 · 2023-03-24T09:56:48Z

I wonder why fast_union_uint16 was not used in these 2 functions:

array_array_container_inplace_union
array_array_container_lazy_inplace_union

I think union_vector16 is still safe here.

lemire · 2023-03-24T17:24:56Z

It is possible. The reason I did not use them is that I was concerned with the safety. It is a good idea to revisit the issue.

CharlesChen888 · 2023-03-27T07:23:12Z

I read the code of union_vector16, and I think it is safe since it will not overwrite unread data in src array in the use case of those 2 functions.

lemire · 2023-03-27T12:14:20Z

I think it is safe since it will not overwrite unread data in src array

The layout is more complex. We don't allocate a new array for the output, so we have one memory region where we have both the input and the output.

We have the following layout...

[ input 2-----------]

[output--- |input 1----]

That is, input 1 and output are in the same memory region, but output is offsetted. At the end of the computation, it is definitively the case that output will overwrite input 1.

If you are doing a textbook computation, then there is no problem because you cannot write value x before you have read it. Fine.

But it gets more complicated once you have a vectorized function.

Note that we can't just say that it looks ok... we have to be sure.

If you think it is safe, can you work out the analysis from what I just wrote?

lemire · 2023-03-27T12:14:41Z

I think it is safe since it will not overwrite unread data in src array

The layout is more complex. We don't allocate a new array for the output, so we have one memory region where we have both the input and the output.

We have the following layout...

[ input 2-----------]

[output--- |input 1----]

That is, input 1 and output are in the same memory region, but output is offsetted. At the end of the computation, it is definitively the case that output will overwrite input 1.

If you are doing a textbook computation, then there is no problem because you cannot write value x before you have read it. Fine.

But it gets more complicated once you have a vectorized function.

Note that we can't just say that it looks ok... we have to be sure.

If you think it is safe, can you work out the analysis from what I just wrote?

CharlesChen888 · 2023-03-28T03:18:50Z

@lemire
Yes, it is the case that output will overwrite input 1, but the overwritten part has already been read and stored somewhere else.

We just need to focus on the reading and writing performed on array1. In union_vector16, both vectorized and scalar code still obey the basic rule: read from two inputs, do the union, and then write the output.

Let's say the length(cardinality) of input2 is L2:

        |<-  L2  ->|
array1: [output--- |input 1---|---]
array2: [input 2---]

Let's define 3 __m128i pointers, pos1 starts from input1, pos2 starts from input2, these 2 point at the next byte to read, out starts from output, pointing at the next byte to overwrite.

array1: [output--- |input 1---|---]
             ^          ^
            out        pos1
array2: [input 2---]
                ^
               pos2

The union output always contains less or equal number of elements than all inputs added, so we have:

out <= pos1 + pos2

therefore:

out <= pos1 + L2

which means you will not overwrtite data beyond pos1, so the data haven't read is safe, and we don't care the data already read.

This is the basic idea, I don't know how to illustrate this with code details... hope you can consider this and go through the code again.

lemire · 2023-03-28T16:33:25Z

Merging.

lemire · 2023-09-25T18:36:25Z

I am reverting this PR. It causes failures with Intel compilers.

Use fast union instead of ordinary union.

3fc74c2

CharlesChen888 force-pushed the improve-union branch from af171fd to 3fc74c2 Compare March 27, 2023 03:04

lemire mentioned this pull request Mar 27, 2023

enabling fast inplace routines (tentative) #455

Closed

lemire merged commit ff08b57 into RoaringBitmap:master Mar 28, 2023

feng-y mentioned this pull request May 12, 2023

[bugs] commit #452 #476

Closed

lemire mentioned this pull request Sep 25, 2023

Reverting PR 452 #509

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use fast_union_uint16 instead of union_uint16. #452

Use fast_union_uint16 instead of union_uint16. #452

CharlesChen888 commented Mar 24, 2023

lemire commented Mar 24, 2023

CharlesChen888 commented Mar 27, 2023 •

edited

Loading

lemire commented Mar 27, 2023

lemire commented Mar 27, 2023

CharlesChen888 commented Mar 28, 2023 •

edited

Loading

lemire commented Mar 28, 2023

lemire commented Sep 25, 2023

Use fast_union_uint16 instead of union_uint16. #452

Use fast_union_uint16 instead of union_uint16. #452

Conversation

CharlesChen888 commented Mar 24, 2023

lemire commented Mar 24, 2023

CharlesChen888 commented Mar 27, 2023 • edited Loading

lemire commented Mar 27, 2023

lemire commented Mar 27, 2023

CharlesChen888 commented Mar 28, 2023 • edited Loading

lemire commented Mar 28, 2023

lemire commented Sep 25, 2023

CharlesChen888 commented Mar 27, 2023 •

edited

Loading

CharlesChen888 commented Mar 28, 2023 •

edited

Loading