Count method in static_multimap returning 0 #421
Hello, I was trying to utilize a static multimap and get the count for various keys using device references to the static multimap. I was consistently getting that the
template <uint32_t cg_size, typename InsertView>
__global__ void map_filler(InsertView insert_view, int num_items) {
  auto tile = cooperative_groups::tiled_partition<cg_size>(cooperative_groups::this_thread_block());
  int tid = threadIdx.x + blockIdx.x * blockDim.x;
  while (tid < num_items) {
    insert_view.insert(tile, {tid, tid});
    tid += gridDim.x * blockDim.x;
  }
}

template <uint32_t cg_size, typename CountView>
__global__ void count_checker(CountView count_view, int num_items) {
  auto tile = cooperative_groups::tiled_partition<cg_size>(cooperative_groups::this_thread_block());
  int tid = threadIdx.x + blockIdx.x * blockDim.x;
  while (tid < num_items) {
    if (count_view.contains(tile, tid) && count_view.count(tile, tid) == 0) {
      printf("Tid %d exists in map but has count 0\n", tid);
    }
    tid += gridDim.x * blockDim.x;
  }
}

int main(int argc, char** argv) {
  high_resolution_clock::time_point start;
  high_resolution_clock::time_point end;
  duration<double, std::milli> duration_ms;
  int num_items = 6001215;
  int threads_per_block = 1024;
  std::size_t capacity = 2 * num_items;
  cuco::static_multimap<int, int> test_map{capacity, cuco::empty_key{-1}, cuco::empty_value{-1}};
  auto constexpr cg_size = test_map.cg_size();

  // Insert into the map
  start = high_resolution_clock::now();
  int num_blocks = (num_items + threads_per_block - 1)/(10 * threads_per_block);
  map_filler<cg_size><<<num_blocks, threads_per_block>>>(test_map.get_device_mutable_view(), num_items);
  cudaDeviceSynchronize();
  end = high_resolution_clock::now();
  duration_ms = std::chrono::duration_cast<duration<double, std::milli>>(end - start);
  std::cout << "Insert time of " << duration_ms.count() << " ms" << std::endl;

  // Query the map
  start = high_resolution_clock::now();
  count_checker<cg_size><<<num_blocks, threads_per_block>>>(test_map.get_device_view(), num_items);
  cudaDeviceSynchronize();
  end = high_resolution_clock::now();
  duration_ms = std::chrono::duration_cast<duration<double, std::milli>>(end - start);
  std::cout << "Count time of " << duration_ms.count() << " ms" << std::endl;
}
Running this program outputs:
This indicates that when using the reference returned by
@sarda-devesh Thanks for reporting the issue. I didn't find anything suspicious at a glance in your sample code. Let me test it locally and get back to you later.
I think the behavior you're seeing here arises from mixing CG and per-thread computation. The docs of the CG
First of all, thank you so much for your quick and helpful response. @PointKernel The example that you linked was really helpful. I just want to confirm my understanding of something: If I define my static_multimap as such:
then the
Once again, thank you so much for your help.
OK, so there are two issues in the code:
1. The count function returns a per-thread count, so it's the user's responsibility to sum up the counts of all threads within a CG to get the final per-CG count.
2. With cg_size == 8 (the default value for multimap), we are supposed to use the first tile (threads 0, 1, 2, ..., 7) to insert the first pair {0, 0} and the second tile (threads 8, 9, 10, ..., 15) to insert {1, 1}, as opposed to thread 0 of tile 0 inserting {0, 0} and thread 1 of tile 0 inserting {1, 1}. That's undefined behavior.
As e…
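For reference, the two points above could be sketched roughly as follows. This is only an illustration, not the code from the linked example: mock_count_view is a hypothetical stand-in that mimics a view whose count(tile, key) returns a per-thread partial count, so that the sketch compiles without cuco, and the num_missing counter is likewise an assumption added for the demo. The idea is that every thread of a tile works on the same key, and the per-thread counts are summed across the tile with cooperative_groups::reduce.

```cuda
#include <cooperative_groups.h>
#include <cooperative_groups/reduce.h>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

// Hypothetical stand-in for the multimap's device view (NOT the cuco type).
// It pretends every key is stored once and is seen by exactly one lane of the tile,
// so each thread only returns a partial count.
struct mock_count_view {
  template <typename Tile>
  __device__ std::size_t count(Tile const& tile, int key) const {
    return tile.thread_rank() == static_cast<unsigned>(key) % tile.size() ? 1 : 0;
  }
};

template <uint32_t cg_size, typename CountView>
__global__ void count_checker(CountView count_view, int num_items, int* num_missing) {
  auto tile = cg::tiled_partition<cg_size>(cg::this_thread_block());
  // One key per tile: all cg_size threads of a tile share the same key.
  int key = (threadIdx.x + blockIdx.x * blockDim.x) / cg_size;
  // Number of tiles in the grid, i.e. how many keys are handled per loop iteration.
  int const stride = (gridDim.x * blockDim.x) / cg_size;
  while (key < num_items) {
    // Each thread gets only its own partial count...
    std::size_t thread_count = count_view.count(tile, key);
    // ...so sum the partial counts over the tile to get the per-CG count.
    std::size_t tile_count = cg::reduce(tile, thread_count, cg::plus<std::size_t>());
    if (tile.thread_rank() == 0 && tile_count == 0) { atomicAdd(num_missing, 1); }
    key += stride;
  }
}

int main() {
  constexpr uint32_t cg_size = 8;  // default CG size for the multimap, per the reply above
  int const num_items = 1000;
  int* num_missing = nullptr;
  cudaMallocManaged(&num_missing, sizeof(int));
  *num_missing = 0;
  // The block size must be a multiple of cg_size so that tiles are complete.
  count_checker<cg_size><<<4, 128>>>(mock_count_view{}, num_items, num_missing);
  cudaDeviceSynchronize();
  std::printf("keys with a per-tile count of 0: %d\n", *num_missing);
  cudaFree(num_missing);
  return 0;
}
```

With a real map, the view returned by get_device_view() would take the place of the mock, and the same key-per-tile indexing would apply to the insert kernel, so that all threads of a tile pass the same pair to insert.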