Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix wrong device_id computation in identifier_from_devid #4694

Merged
merged 5 commits into from
Jan 29, 2022

Conversation

janciesko
Copy link
Contributor

Fixes: #4455

@crtrott crtrott added the Blocks Promotion Overview issue for release-blocking bugs label Jan 19, 2022
@janciesko janciesko requested a review from rgayatri23 January 19, 2022 18:50
Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know this is less wrong than the code that was there before?

@janciesko
Copy link
Contributor Author

I'll add a code comment that explains the bit range assignments.

@janciesko janciesko force-pushed the fixDeviceIDCompute3 branch 2 times, most recently from 2576dc7 to 266e51e Compare January 19, 2022 19:25
@janciesko
Copy link
Contributor Author

#include <stdio.h>

struct ExecutionSpaceIdentifier {
  uint32_t type;
  uint32_t device_id;
  uint32_t instance_id;
};
// Type your code here, or load an example.
ExecutionSpaceIdentifier one(const uint32_t in) {
      return {(in >> 24),  // first 8 bits
          ((in & 0x00FFFFFF) >> 17),  // next 7 bits
           (in & 0x0001FFFF)}; // last 17 bits
}

// Type your code here, or load an example.
ExecutionSpaceIdentifier two(const uint32_t in) {
return {(in >> 24),
          (~((uint32_t(-1)) << 7)) & (in >> 17),
          (~((uint32_t(-1)) << 17)) & in};
}

int main(int c, char ** args)
{
ExecutionSpaceIdentifier a = one(0x12342334);
ExecutionSpaceIdentifier b = two(0x12342334);
printf("%i, %i, %i, %i, %i, %i\n", a.type, a.device_id, a.instance_id, b.type, b.device_id, b.instance_id);
return 1;
}

@dalg24 This shows equivalence.

@janciesko janciesko requested a review from dalg24 January 19, 2022 19:30
@janciesko
Copy link
Contributor Author

retest this please

@janciesko janciesko force-pushed the fixDeviceIDCompute3 branch from 266e51e to 71ac133 Compare January 25, 2022 17:04
@janciesko janciesko force-pushed the fixDeviceIDCompute3 branch from 1860280 to 54c07ed Compare January 26, 2022 00:26
@janciesko
Copy link
Contributor Author

Added unit test coverage

Copy link
Contributor

@cz4rs cz4rs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me (only minor nitpicking in the comments).

core/src/impl/Kokkos_Profiling_Interface.hpp Outdated Show resolved Hide resolved
core/src/impl/Kokkos_Profiling_Interface.hpp Outdated Show resolved Hide resolved
@janciesko janciesko force-pushed the fixDeviceIDCompute3 branch from d3efcd0 to d9170c0 Compare January 29, 2022 03:07
@crtrott crtrott merged commit c702475 into kokkos:develop Jan 29, 2022
@khuck
Copy link
Contributor

khuck commented Mar 10, 2022

@janciesko I am working with the current development branch of Kokkos and the ROCm 5.0.2 compiler... and for my HIP kernels, I am now getting the wrong device type and id. Instead of getting HIP and device 0, I am getting Serial and device 3. I have updated my code to match what is in the PR test example, but I'm still getting the wrong info. Did something else change?

@khuck
Copy link
Contributor

khuck commented Mar 10, 2022

btw, I know that Kokkos git hash e5a8acc gives what I am expecting...

@janciesko
Copy link
Contributor Author

@Rombur - do you maybe know what might have changed in the HIP backend in respect to obtaining device Id and device type?

@Rombur
Copy link
Member

Rombur commented Mar 11, 2022

No I don't know anything that could have change that

@khuck
Copy link
Contributor

khuck commented Apr 13, 2022

@janciesko @Rombur With the current master, I am running HIP kernels on device 0. The "device ID" passed to our profiler has the value 393217. That is being translated (by the above code) as a Serial kernel (0), on device 3, instance 1. That is incorrect...is the device getting constructed differently by Kokkos, or being interpreted incorrectly?

393217 is:

0000 0000 0000 0110 0000 0000 0000 0001
| type  | |device||      instance     |

The value should be:
50331649:

0000 0011 0000 0000 0000 0000 0000 0001
| type  | |device||      instance     |

...right? This has changed since the last Kokkos release (3.5).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocks Promotion Overview issue for release-blocking bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants