[Issue]: Can not generate TensileLibrary.dat for gfx1100 #831
Comments
btw, I am able to run …
FYI, I tracked down the issue: PyTorch (I use the 6.1 nightly) has its own …
I copied the self-built hsaco files into the Tensile lib path and got a similar error, but I didn't get a core dump. How did you track back to the libhipblaslt.so from torch?
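One rough way to check which libhipblaslt.so a PyTorch process actually ends up using (a sketch, Linux-only, assuming a ROCm build of PyTorch where the HIP device is exposed as "cuda"): run a GEMM so the BLAS backend gets loaded, then look at which libhipblaslt.so is mapped into the process:

```python
# Sketch: trigger a GEMM so PyTorch loads its BLAS backend, then inspect
# /proc/self/maps to see which libhipblaslt.so was actually mapped.
import torch

a = torch.randn(64, 64, device="cuda", dtype=torch.float16)
b = torch.randn(64, 64, device="cuda", dtype=torch.float16)
_ = (a @ b).sum().item()  # force the GEMM path (and its libraries) to load

with open("/proc/self/maps") as maps:
    paths = {line.split()[-1] for line in maps if "hipblaslt" in line}
for p in sorted(paths):
    print(p)
```

If the printed path points into the torch wheel's own lib directory rather than /opt/rocm, that would suggest the wheel's bundled copy is the one being used.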
Problem Description
I am trying to use https://pytorch.org/torchtune/ on gfx1100 (W7900 and 7900 XTX) with ROCm 6.1.2. This fails because the latest 6.1.2 ROCm distro's hipblaslt-dev6.1.2_0.7.0.60102-119~22.04_amd64 does not include gfx1100 kernels. I have successfully compiled and installed my own version from source (0.8.0-56aab12f~dirty); however, I can't seem to get it to generate a TensileLibrary.dat (the closest is TensileLibrary_gfx1100.dat), and when I symlink that I get this error:
Operating System
Ubuntu 22.04.4 LTS (Jammy Jellyfish)
CPU
AMD Ryzen 5 5600G with Radeon Graphics
GPU
AMD Radeon RX 7900 XTX, AMD Radeon Pro W7900
Other
No response
ROCm Version
ROCm 6.1.2
ROCm Component
hipBLASLt
Steps to Reproduce
It's not available as a selection, but I am using the official ROCm 6.1.2 Ubuntu packages.
I am using the current torchtune 0.1.1 package: https://pypi.org/project/torchtune/
The hipblaslt-dev6.1.2 package does not have gfx1100 kernels:
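A rough way to confirm the missing kernels (a sketch; the path assumes the packaged hipBLASLt keeps its Tensile data under /opt/rocm/lib/hipblaslt/library, adjust if your layout differs):

```python
# Sketch: list hipBLASLt's Tensile library directory and check whether any
# gfx1100 code objects / metadata files are present. The path below assumes
# the default ROCm 6.1 install prefix.
from pathlib import Path

lib_dir = Path("/opt/rocm/lib/hipblaslt/library")
entries = sorted(p.name for p in lib_dir.iterdir())
gfx1100 = [name for name in entries if "gfx1100" in name]

print(f"{len(entries)} files total, {len(gfx1100)} mentioning gfx1100")
for name in gfx1100:
    print(" ", name)
```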
and so it dies with a HIPBLAS_STATUS_NOT_SUPPORTED error.
I've compiled my new version roughly as the docs suggest:
Once installed, I get:
Running torchtune with this installed gives me this error:
Since the location is hard-coded, I symlink TensileLibrary_gfx1100.dat to TensileLibrary.dat, which gives me this error:
At this point I'm pretty stumped. torchtune is 100% PyTorch, and the 7900 XTX and W7900 should have full support, so I'm not sure whether what I'm encountering is a configuration error or an implementation bug.
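For reference, the problem should be reproducible without torchtune at all if it really is the hipBLASLt-backed GEMM path that fails; a minimal sketch (assuming a ROCm build of PyTorch, where the HIP device shows up as "cuda") would be something like:

```python
# Minimal sketch of a reproducer that exercises the GEMM path directly,
# without torchtune (assumes a ROCm build of PyTorch).
import torch

assert torch.cuda.is_available(), "no ROCm/HIP device visible to PyTorch"
print(torch.__version__, torch.version.hip, torch.cuda.get_device_name(0))

x = torch.randn(512, 512, device="cuda", dtype=torch.bfloat16)
w = torch.randn(512, 512, device="cuda", dtype=torch.bfloat16)
bias = torch.randn(512, device="cuda", dtype=torch.bfloat16)

# addmm on low-precision inputs is one of the calls that can be routed
# through hipBLASLt on ROCm; if the library data for this GPU is missing,
# this is where the failure should show up.
y = torch.addmm(bias, x, w)
torch.cuda.synchronize()
print("OK:", y.shape, y.dtype)
```

If that addmm fails with the same HIPBLAS_STATUS_NOT_SUPPORTED / TensileLibrary.dat error, the issue is independent of torchtune.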
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
Additional Information
No response