"cublasLt ran into an error" with older GPU in 8-bit mode #379
Comments
I got the same error. I am using 2 GPUs and trying to run pygmalion-2.7b in 8-bit, on Windows. My start-webui.bat file: I also followed this: https://www.reddit.com/r/PygmalionAI/comments/1115gom/running_pygmalion_6b_with_8gb_of_vram/
This also happens to me on a GTX 1650 GPU.
I think 8-bit in bitsandbytes requires Turing (20xx) or later: https://github.com/TimDettmers/bitsandbytes#requirements--installation
On an older GPU it will NEVER work with the int8 threshold at 6. But I get a NaN error, not this error, on my P6000. I am using the pre-"fixed" bitsandbytes that never completed the "CUDA setup" part. I'll try it with the new bitsandbytes that I don't have to patch and see if I get this error instead. But best believe that it is possible.
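(For context, the threshold mentioned here is bitsandbytes' llm_int8_threshold, which defaults to 6.0 and controls which activation outliers fall back to fp16. Below is a minimal sketch of changing it when loading through transformers; it assumes a transformers version recent enough to expose BitsAndBytesConfig, and the model name and threshold value are purely illustrative, not a value this thread confirms.)

# Sketch: load a model in 8-bit with a non-default int8 threshold.
# Assumes transformers with BitsAndBytesConfig support and bitsandbytes installed.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=0.0,  # default is 6.0; 0.0 is shown only as an example of overriding it
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",          # illustrative model; the issue uses llama-7b and opt-1.3b
    device_map="auto",
    quantization_config=quant_config,
)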
So I have been having this error too. My setup is: I got it to generate by setting Doesn't help those that have single GPUs, but it's a start, I hope.
I am on Windows 11 and I am able to load the LLaMA 7b model in 4-bit on my GTX 1060 6GB using the 'allarch' 0.37.0 bitsandbytes from this repo: https://github.com/james-things/bitsandbytes-prebuilt-all_arch. I thought it would work natively on Linux, since the author of bitsandbytes made the int8 function backward compatible so that even Pascal cards can run it. Perhaps you need to compile the I'm sure there is a solution to this, 110%. My card is older than yours and 4-bit is working fine on it. See if the instructions here help you: https://www.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_install_llama_8bit_and_4bit/ I was finally able to get 4-bit working after following those instructions.
I compiled bitsandbytes from source as well as trying the pip package, just to avoid the issue of the .so
I think your issue might be related to an improper installation, because from what I understand these 8-bit issues only affect older GPUs from the 1xxx series and lower. Your 2080 Super and 3060 Ti are perfectly compatible even with the native int8 function from bitsandbytes; you shouldn't need to compile from source at all... Perhaps try running in 16-bit. You have 16GB VRAM, which should be more than enough.
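(A quick way to check which side of that compatibility line a card falls on is to read its CUDA compute capability with PyTorch. The sketch below assumes the Turing-or-newer requirement quoted earlier in the thread, i.e. compute capability 7.5 or higher for the native int8 path.)

# Sketch: print the GPU's compute capability and compare it against the
# Turing (7.5) requirement mentioned for bitsandbytes' native int8 matmul.
import torch

if not torch.cuda.is_available():
    print("No CUDA device visible to PyTorch.")
else:
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{torch.cuda.get_device_name(0)}: compute capability {major}.{minor}")
    if (major, minor) >= (7, 5):
        print("Native int8 matmul should be supported.")
    else:
        print("Pre-Turing card: 8-bit mode may hit the cublasLt error or NaNs.")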
@lolxdmainkaisemaanlu thank you. Can you tell me where I should put this, and do I have to change any code in the webui?
Put it here, but there is still the same bug for me.
Did you try to run it in 8-bit? Do you get the error then or not?
I came here to tell you that the newly accepted transformers is slow for me, and I have no clue what is wrong on your cards and why mine works. I patch models.py like this: https://pastebin.com/siPxZvkc And then I can generate away: https://pastebin.com/R3JCmJ9L I can even do the LoRA just fine. The fixed bitsandbytes from PyPI works, it's just more verbose in its messages.
I also have this error on a GTX 1660 Ti. I'm guessing this means the GTX 16XX series isn't compatible despite also being Turing architecture.
Looks like the GTX 16XX series does support 8-bit; it just wasn't enabled in bitsandbytes until now: bitsandbytes-foundation/bitsandbytes#292 So starting with bitsandbytes 0.38.0 these GPUs should work. EDIT: Just tested with bitsandbytes upgraded to 0.38.0.post2 on a GTX 1660 Ti and it works perfectly.
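(If it is unclear which bitsandbytes build ended up in the webui's environment, here is a small sketch that prints the installed version using only the standard library, so it does not depend on bitsandbytes internals.)

# Sketch: confirm the installed bitsandbytes version is >= 0.38.0, the release
# the comment above says enabled GTX 16XX support.
from importlib.metadata import PackageNotFoundError, version

try:
    print("bitsandbytes", version("bitsandbytes"))
except PackageNotFoundError:
    print("bitsandbytes is not installed in this environment.")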
Try rebuilding bitsandbytes from https://github.com/TimDettmers/bitsandbytes 【fix todo】
I had the same issue when I wanted to load the model in 8-bit. Loading the model in 4-bit solved my problem: load-in-4bit=True
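(For reference, a minimal sketch of the equivalent 4-bit load through transformers; it assumes a transformers/bitsandbytes combination recent enough to support 4-bit quantization, and the model name is illustrative.)

# Sketch: load in 4-bit instead of 8-bit, sidestepping the int8 cublasLt path.
# Assumes 4-bit support in both transformers (BitsAndBytesConfig) and bitsandbytes.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",  # illustrative; the issue uses llama-7b and opt-1.3b
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)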
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Describe the bug
My device: GTX 1650 4GB, i5-12400, 40GB RAM, Ubuntu 20.04, CUDA 11.8.
I have set up llama-7b according to the wiki.
I can run it with
python server.py --listen --auto-devices --model llama-7b
and everything goes well!
But I can't run it with
--load-in-8bit
(according to #366 I should use this). When I start with
python server.py --listen --auto-devices --model llama-7b --load-in-8bit
there is no error and everything seems fine, but once I click the 'Generate' button in the web UI,
the error comes up in the terminal.
Is there an existing issue for this?
Reproduction
This does not happen only with llama-7b; it can easily be reproduced with other models.
For example, run
python server.py --listen --model opt-1.3b --load-in-8bit
There is no error, but once you enter anything in the web UI and click the 'Generate' button,
the error comes up in the terminal. It seems the bug has something to do with cublasLt, like a CUDA bug. There is no bug with CPU:
python server.py --listen --model opt-1.3b --load-in-8bit
it goes well.
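(For completeness, a minimal repro sketch that takes the web UI out of the picture, assuming the error comes from the 8-bit matmul itself rather than anything webui-specific; the model name and prompt are illustrative.)

# Sketch: exercise the 8-bit generation path directly through transformers.
# Assumes transformers with load_in_8bit support and bitsandbytes installed;
# on an unsupported card this is where the cublasLt error would be expected to surface.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # illustrative; matches the opt-1.3b repro above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))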
Screenshot
No response
Logs