Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory Usage: Add memory transferred between Kokkos Mem Spaces #272

Open
wants to merge 17 commits into
base: develop
Choose a base branch
from

Conversation

vlkale
Copy link
Contributor

@vlkale vlkale commented Sep 19, 2024

This PR adds memory transferred between Kokkos Mem Spaces to the Kokkos Tools memory-usage tool library. This is done based on a request in Kokkos Tools Github Issue #50.

This PR adds functionality to the tool so that the size of data transferred in the deep_copy in a Kokkos application program is accumulated during the execution of a Kokkos program. The deep_copy accumulation is done per Kokkos Memory Space dst->src pair (e.g., OpenMP on host CUDA on device). Note that the "dst" and "src" Kokkos memory space being the same means that there was a deep_copy in the same memory space.

When Kokkos is finalized in a Kokkos application program, the finalize callback prints out the deep_copy accumulations per memory space to the file.

@vlkale vlkale self-assigned this Sep 19, 2024
@vlkale
Copy link
Contributor Author

vlkale commented Sep 19, 2024

Here is output from the stream.cuda in the Kokkos-core benchmarks directory run on Perlmutter on 1 GPU. The number of bytes in the HighWater at the last point in time shown in the first table is equal to the data transferred between host and device.

-----@nid----:~/kks-clean/benchmarks/stream> cat nid001213-1948341-Cuda.memspace_usage
# Space Cuda
# Time(s)  Size(MB)   HighWater(MB)   HighWater-Process(MB)
0.018076 7629.4 7629.4 142.7
0.047229 15258.8 15258.8 143.5
0.071899 22888.2 22888.2 144.0
12.252089 15258.8 22888.2 23032.6
12.313410 7629.4 22888.2 23032.6
12.331277 0.0 22888.2 23032.6
# Data transferred between Kokkos Memory Spaces --- 
# DstSpace     SrcSpace    Data-Transferred(MB)
Cuda Cuda 0.0 
# DstSpace     SrcSpace    Data-Transferred(MB)
Cuda Host 22888.2 
# DstSpace     SrcSpace    Data-Transferred(MB)
Host Cuda 22888.2 
# DstSpace     SrcSpace    Data-Transferred(MB)
Host Host 0.0 

------@nid----:~/kks-clean/benchmarks/stream> cat nid001213-1948341-
nid001213-1948341-Cuda.memspace_usage  nid001213-1948341-Host.memspace_usage  
vkale3@nid001213:~/kks-clean/benchmarks/stream> cat nid001213-1948341-Host.memspace_usage 
# Space Host
# Time(s)  Size(MB)   HighWater(MB)   HighWater-Process(MB)
0.077138 7629.4 7629.4 144.0
0.768111 15258.8 15258.8 7773.6
1.455819 22888.2 22888.2 15403.0
12.218183 15258.8 22888.2 23032.6
12.229001 7629.4 22888.2 23032.6
12.240329 0.0 22888.2 23032.6
# Data transferred between Kokkos Memory Spaces --- 
# DstSpace     SrcSpace    Data-Transferred(MB)
Cuda Cuda 0.0 
# DstSpace     SrcSpace    Data-Transferred(MB)
Cuda Host 22888.2 
# DstSpace     SrcSpace    Data-Transferred(MB)
Host Cuda 22888.2 
# DstSpace     SrcSpace    Data-Transferred(MB)
Host Host 0.0 

@vlkale vlkale marked this pull request as ready for review September 19, 2024 23:17
Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just put it into a separate file otherwise this is good.

profiling/memory-usage/kp_memory_usage.cpp Outdated Show resolved Hide resolved
profiling/memory-usage/kp_memory_usage.cpp Outdated Show resolved Hide resolved
@vlkale
Copy link
Contributor Author

vlkale commented Dec 5, 2024

I have updated with the requested changes. I did a quick check on my mac laptop and it works on my laptop fine with the change to have a separate file, and for reference, here is the output:

vklap % more /kks/benchmarks/stream/s1ca-13343-Host.memspace_usage 
# Space Host
# Time(s)  Size(MB)   HighWater(MB)   HighWater-Process(MB)
0.000067 762.9 762.9 1856.0
0.084233 1525.9 1525.9 783152.0
0.170814 2288.8 2288.8 1564448.0
3.166797 1525.9 2288.8 2345824.0
3.180026 762.9 2288.8 2345824.0
3.194930 0.0 2288.8 2345824.0

vklap % more /kks/benchmarks/stream/s1ca-13343-Host.memspace_transfers
# Dst Mem Space     Src Mem Space    Total-Data-Transferred(MB)
Host Host 4577.6 

(As can be seen, there are now two files rather than one.)

Right now, there seems to be blocker on the CI - I don't know why the CI checks for OpenMP, CUDA and HIP (non-simple builds) are failing now. The problem is arising from Kokkos_Profiling.hpp and is coming from int_for_synchronization_reason(), as can be seen from the logs. I don't think I changed anything in this PR that would impact that? This is a CI check from when it was working previously:
https://github.com/kokkos/kokkos-tools/actions/runs/10948763633/job/30400520719. The PR is built on top of the latest develop.

@vlkale
Copy link
Contributor Author

vlkale commented Dec 5, 2024

Just put it into a separate file otherwise this is good.

This is fixed.

@masterleinad
Copy link
Contributor

I don't know why the CI checks for OpenMP, CUDA and HIP (non-simple builds) are failing now

Have you rebased on top of develop? That might fix it.

@vlkale
Copy link
Contributor Author

vlkale commented Dec 5, 2024

I don't know why the CI checks for OpenMP, CUDA and HIP (non-simple builds) are failing now

Have you rebased on top of develop? That might fix it.

Thanks for checking this. As mentioned in the last sentence of my previous comment (#272 (comment)), I did rebase on top of develop of Kokkos Tools.

Actually, I think this has to do with Kokkos Tools Issue #275, where there is ultimately an incompatibility with of Kokkos Tools with the current Kokkos version (#275 (comment)). This issue came up in early October, which was after the previous successful run I linked.
I will see if there are quick fixes from that, but given that #275 impacts other PRs, I will advocate for that github issue to be a priority to fix for Kokkos Tools.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants