Memory Usage: Add memory transferred between Kokkos Mem Spaces #272
Here is output from the stream.cuda in the Kokkos-core benchmarks directory run on Perlmutter on 1 GPU. The number of bytes in the HighWater at the last point in time shown in the first table is equal to the data transferred between host and device.
Just put it into a separate file; otherwise this is good.
I have updated the PR with the requested changes. I did a quick check on my Mac laptop, and it works fine with the change to use a separate file. For reference, here is the output:
(As can be seen, there are now two files rather than one.) Right now, there seems to be a blocker on the CI: I don't know why the CI checks for OpenMP, CUDA and HIP (non-simple builds) are failing now. The problem arises in Kokkos_Profiling.hpp and comes from int_for_synchronization_reason(), as can be seen from the logs. I don't think I changed anything in this PR that would affect that. This is a CI check from when it was working previously:
This is fixed.
Have you rebased on top of |
Thanks for checking this. As mentioned in the last sentence of my previous comment (#272 (comment)), I did rebase on top of Actually, I think this has to do with Kokkos Tools Issue #275, where there is ultimately an incompatibility of Kokkos Tools with the current Kokkos version (#275 (comment)). This issue came up in early October, which was after the previous successful run I linked.
This PR adds memory transferred between Kokkos Mem Spaces to the Kokkos Tools memory-usage tool library. This is done based on a request in Kokkos Tools GitHub Issue #50.
This PR extends the tool so that the size of the data transferred by each deep_copy is accumulated over the execution of a Kokkos application program. The accumulation is kept per Kokkos Memory Space dst->src pair (e.g., OpenMP on host and CUDA on device). Note that when the "dst" and "src" Kokkos memory spaces are the same, the deep_copy took place within a single memory space.
When Kokkos is finalized in a Kokkos application program, the finalize callback writes the accumulated deep_copy sizes per memory space pair to the file.
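For readers who want a picture of the bookkeeping described above, here is a minimal standalone sketch, not the code in this PR. It assumes the tool receives a destination handle, a source handle and a byte count for every deep_copy, and that it dumps the totals at finalize. `SpaceHandle` is a local stand-in for the Kokkos profiling space handle, and `record_deep_copy`, `write_deep_copy_report`, `g_deep_copy_bytes` and the report format are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Local stand-in for the Kokkos profiling space handle (assumption:
// a fixed-size, null-terminated buffer holding the memory space name).
struct SpaceHandle {
  char name[64];
};

// Accumulated bytes keyed by (dst space name, src space name).
using PairKey = std::pair<std::string, std::string>;
static std::map<PairKey, std::uint64_t> g_deep_copy_bytes;

// Hypothetical helper: would be called from the tool's begin-deep-copy
// callback with the handles and the number of bytes being copied.
void record_deep_copy(const SpaceHandle& dst, const SpaceHandle& src,
                      std::uint64_t size) {
  PairKey key(dst.name, src.name);
  g_deep_copy_bytes[key] += size;  // a missing entry starts at zero
}

// Hypothetical helper: would be called from the tool's finalize callback
// to write one line per (dst, src) pair to the output stream.
void write_deep_copy_report(std::ostream& out) {
  for (const auto& entry : g_deep_copy_bytes) {
    out << entry.first.first << " <- " << entry.first.second << " : "
        << entry.second << " bytes\n";
  }
}
```

A pair whose two names are equal simply gets its own entry, which matches the note above about copies within a single memory space. In a real tool the stream passed to `write_deep_copy_report` would be the separate output file mentioned in the review.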