Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use fast_union_uint16 instead of union_uint16. #452

Merged
merged 1 commit into from
Mar 28, 2023

Conversation

CharlesChen888
Copy link
Contributor

I wonder why fast_union_uint16 was not used in these 2 functions:

array_array_container_inplace_union
array_array_container_lazy_inplace_union

I think union_vector16 is still safe here.

@lemire
Copy link
Member

lemire commented Mar 24, 2023

It is possible. The reason I did not use them is that I was concerned with the safety. It is a good idea to revisit the issue.

@CharlesChen888
Copy link
Contributor Author

CharlesChen888 commented Mar 27, 2023

I read the code of union_vector16, and I think it is safe since it will not overwrite unread data in src array in the use case of those 2 functions.

@lemire
Copy link
Member

lemire commented Mar 27, 2023

I think it is safe since it will not overwrite unread data in src array

The layout is more complex. We don't allocate a new array for the output, so we have one memory region where we have both the input and the output.

We have the following layout...

[ input 2-----------]

[output--- |input 1----]

That is, input 1 and output are in the same memory region, but output is offsetted. At the end of the computation, it is definitively the case that output will overwrite input 1.

If you are doing a textbook computation, then there is no problem because you cannot write value x before you have read it. Fine.

But it gets more complicated once you have a vectorized function.

Note that we can't just say that it looks ok... we have to be sure.

If you think it is safe, can you work out the analysis from what I just wrote?

1 similar comment
@lemire
Copy link
Member

lemire commented Mar 27, 2023

I think it is safe since it will not overwrite unread data in src array

The layout is more complex. We don't allocate a new array for the output, so we have one memory region where we have both the input and the output.

We have the following layout...

[ input 2-----------]

[output--- |input 1----]

That is, input 1 and output are in the same memory region, but output is offsetted. At the end of the computation, it is definitively the case that output will overwrite input 1.

If you are doing a textbook computation, then there is no problem because you cannot write value x before you have read it. Fine.

But it gets more complicated once you have a vectorized function.

Note that we can't just say that it looks ok... we have to be sure.

If you think it is safe, can you work out the analysis from what I just wrote?

@CharlesChen888
Copy link
Contributor Author

CharlesChen888 commented Mar 28, 2023

@lemire
Yes, it is the case that output will overwrite input 1, but the overwritten part has already been read and stored somewhere else.

We just need to focus on the reading and writing performed on array1. In union_vector16, both vectorized and scalar code still obey the basic rule: read from two inputs, do the union, and then write the output.

Let's say the length(cardinality) of input2 is L2:

        |<-  L2  ->|
array1: [output--- |input 1---|---]
array2: [input 2---]

Let's define 3 __m128i pointers, pos1 starts from input1, pos2 starts from input2, these 2 point at the next byte to read, out starts from output, pointing at the next byte to overwrite.

array1: [output--- |input 1---|---]
             ^          ^
            out        pos1
array2: [input 2---]
                ^
               pos2

The union output always contains less or equal number of elements than all inputs added, so we have:

out <= pos1 + pos2

therefore:

out <= pos1 + L2

which means you will not overwrtite data beyond pos1, so the data haven't read is safe, and we don't care the data already read.

This is the basic idea, I don't know how to illustrate this with code details... hope you can consider this and go through the code again.

@lemire
Copy link
Member

lemire commented Mar 28, 2023

Merging.

@lemire lemire merged commit ff08b57 into RoaringBitmap:master Mar 28, 2023
@feng-y feng-y mentioned this pull request May 12, 2023
@lemire
Copy link
Member

lemire commented Sep 25, 2023

I am reverting this PR. It causes failures with Intel compilers.

@lemire lemire mentioned this pull request Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants