rocBLAS calls do not produce correct results #419
Comments
@dmcdougall : the issue seems to be resolved by setting env var |
I can't reproduce the failure locally:
$ cat repro.cpp
#include <cassert>
#include <cstdio>
#include <hip/hip_runtime.h>
#include <rocblas/rocblas.h>
// Return -1 from the enclosing function if a HIP call fails.
#define hipCheck(s) \
do {\
hipError_t err = s;\
if (err != hipSuccess) {\
printf("HIP call failed at line %d\n", __LINE__);\
return -1;\
}\
} while(0)
// Same check for rocBLAS status codes.
#define rocblasCheck(s) \
do {\
rocblas_status err = s;\
if (err != rocblas_status_success) {\
printf("rocBLAS call failed at line %d\n", __LINE__);\
return -1;\
}\
} while(0)
int main(int argc, char ** argv)
{
int N = 1;
size_t size = N * sizeof(double);
double * arg, * result;
hipCheck(hipMallocManaged(&arg, size));
hipCheck(hipMallocManaged(&result, size));
hipCheck(hipMemset(arg, 1, size));
hipCheck(hipMemset(result, 0, size));
hipStream_t stream;
hipCheck(hipStreamCreate(&stream));
rocblas_handle handle;
rocblasCheck(rocblas_create_handle(&handle));
rocblasCheck(rocblas_set_stream(handle, stream));
rocblasCheck(rocblas_dcopy(handle, N, arg, 1, result, 1)); // copy arg into result
hipCheck(hipStreamSynchronize(stream));
assert(result[0] == arg[0]); //fails?
rocblasCheck(rocblas_destroy_handle(handle));
hipCheck(hipStreamDestroy(stream));
hipCheck(hipFree(arg));
hipCheck(hipFree(result));
return 0;
}
$ hipcc repro.cpp -L/opt/rocm-5.7.0/lib -o repro -lrocblas
$ env | grep HIP
$ env | grep ROCR
$ ./repro
$ echo $?
0
$ rocminfo | grep gfx9
Name: gfx90a
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
Name: gfx90a
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
Name: gfx90a
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
Name: gfx90a
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
Name: gfx90a
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
Name: gfx90a
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
Name: gfx90a
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
Name: gfx90a
Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack-
Your observation about setting the device visibility in the environment is interesting. Are you launching your job with slurm with the cgroups plugin enabled? |
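A side note on the repro above, as a host-only sketch rather than anything posted in the thread: hipMemset(arg, 1, size) fills the buffer byte by byte, so arg[0] ends up as the double whose eight bytes are all 0x01, not 1.0. The helper names below are hypothetical, std::memset stands in for hipMemset on the assumption that both write the same byte pattern, and IEEE-754 doubles are assumed.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: build a double whose bytes all equal `fill`,
// mimicking what a byte-wise memset leaves in an 8-byte buffer.
inline double double_from_fill_byte(unsigned char fill) {
    double value;
    std::memset(&value, fill, sizeof value);
    return value;
}

// Hypothetical check: the source value (fill byte 1) is a tiny positive
// number rather than 1.0, and it differs from the zero-filled destination,
// so comparing the two after a copy can still detect a copy that never ran.
inline bool fill_is_distinguishable() {
    const double arg = double_from_fill_byte(1);
    const double result = double_from_fill_byte(0);
    return arg != 1.0 && arg > 0.0 && result == 0.0 && arg != result;
}
```

Under those assumptions the assertion in the repro stays meaningful: a result[0] left at zero would compare unequal to arg[0], so a silently skipped rocblas_dcopy would be caught.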
Could you also re-run your example with AMD_LOG_LEVEL=3 set in the environment? I want to see if there are any hip runtime calls in your example that aren't present in my example. There will be quite a lot of output to the screen (stderr, I think), so I recommend piping to a file. |
@dmcdougall thanks for investigating ... I invoke the executable directly, so no slurm is involved. Unfortunately, my attempts to make the example more representative of the "real" example did not succeed in triggering the problem. In the real app all calls happen in a thread pool, so I thought maybe some thread-local state was not being initialized properly ... to no avail. For the record, here's the most recent form of the example:
|
Ok, thanks. Can you either:
|
HIP/ROCm support introduced in #418 is only minimally functional at the moment (but already sufficient to provide HIP support in https://github.com/devreal/ttg/tree/ttg-device-support-master-coro-with-stream-tasks), but when trying to use rocBLAS (via ICL's blaspp C++ API) it seems that nothing happens. Here's a simplified version of examples/device/device_task:
It fails in the assertion. Meanwhile
succeeds.
Note that result.data() and arg.data() point to the unified memory (allocated via hipMallocManaged). So the only working hypothesis is that rocBLAS does not support operations on data in UM ...