Using Nerdy Rodent's dreamlab training, I get a CUDA error during training. #52
Comments
CUDA 11.8 isn't currently supported. You might try an older CUDA library version; I'd go with 11.6 or earlier. |
Same error, and I'm on 11.7:
GPU: 1080 Ti. How do I downgrade to 11.6? Do I just copy these commands: and it will downgrade, or do I need to uninstall Ubuntu and start all over again? Or do I need to delete everything CUDA-related with these commands?
|
@brentjohnston What GPU do you have, and what did you select in accelerate config when asked [NO/fp16/bf16]? PS: I tried different selections, but nothing changed. |
Can confirm that it works with CUDA 11.6, at least with a 1080 Ti. Nerdy Rodent's guide uses 11.7 in the Pastebin and the video shows 11.8, so neither of them will work; following that part as written could never have worked. |
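Since the thread keeps going back and forth between 11.6, 11.7 and 11.8, it may help to confirm which toolkit release is actually installed before downgrading anything. Below is a minimal sketch that parses the text printed by `nvcc --version`; the function name `parse_nvcc_release` is made up for illustration, and the sample line only imitates the format nvcc prints, it is not output from anyone in this thread.

```python
import re


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) parsed from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Illustrative line in the format nvcc prints; not taken from this thread.
sample = "Cuda compilation tools, release 11.7, V11.7.99"
version = parse_nvcc_release(sample)
```

To use it on a real system, assuming `nvcc` is on your PATH, pass in the `stdout` of `subprocess.run(["nvcc", "--version"], capture_output=True, text=True)`.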
In the video, the Pastebin, and on my system I use CUDA 11.7.1. Typically, Nvidia updated the day after ;) You'll need to ensure your MS Windows system is up to date as well. If you have old Nvidia drivers in MS Windows, you may need to downgrade CUDA. Where it says |
Correct, this was the main cause, not the CUDA version. The line export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH needs to be in the config of the train file. Even if you reboot, CUDA still won't be found unless that line is added. But in your video you say, "reboot or add this line", so people take that to mean that if they restart they don't need to add the line. The line must be added permanently to the config. |
This is super helpful — thank you, everyone! I will add CUDA 11.8 as soon as possible! |
CUDA 11.8 was added in the latest release. I also added code that gives some compilation and debugging instructions if the CUDA setup fails. |
Sorry to bother, but for us tech newbies, how does one do that? |
In your train file:
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
accelerate launch train_dreambooth.py |
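If you start training from a Python script rather than a shell file, the same idea can be sketched roughly as follows: build a copy of the environment with the WSL library directory at the front of `LD_LIBRARY_PATH`, then hand that environment to the child process. The helper name `env_with_wsl_lib` is hypothetical; only the path `/usr/lib/wsl/lib` comes from this thread.

```python
import os

WSL_LIB_DIR = "/usr/lib/wsl/lib"  # path quoted earlier in this thread


def env_with_wsl_lib(base_env=None, lib_dir=WSL_LIB_DIR):
    """Return a copy of the environment with lib_dir first in LD_LIBRARY_PATH.

    Hypothetical helper for illustration; it does not modify os.environ.
    """
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    # Drop empty entries and any earlier copy of lib_dir so repeated
    # calls do not keep growing the variable.
    others = [p for p in current.split(":") if p and p != lib_dir]
    env["LD_LIBRARY_PATH"] = ":".join([lib_dir] + others)
    return env
```

You could then launch with something like `subprocess.run(["accelerate", "launch", "train_dreambooth.py"], env=env_with_wsl_lib())`. Setting the variable for the child process this way should have the same effect as the `export` line, since the dynamic loader reads it when the new process starts.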
I have this issue following Nerdy Rodent's guide for oobabooga's text-generation-webui, using the one-click installer with a GTX 1080 Ti on Windows. bitsandbytes cannot find CUDA. What is the solution there? Can I add that line somewhere? |
See this post oobabooga/text-generation-webui#20 (comment) :) |
Hi, I got the same error but I don't have the folder "/usr/lib/wsl", could you tell me what the problem might be? Much appreciated! |
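A missing `/usr/lib/wsl` folder may simply mean the system is not running under WSL, in which case `libcuda.so` would live somewhere else. Here is a rough sketch for checking both things. The function names are hypothetical, and the "microsoft" test on `/proc/version` is a common heuristic, not an official check.

```python
import os


def looks_like_wsl(version_text):
    """Heuristic: WSL kernels usually mention 'microsoft' in /proc/version."""
    return "microsoft" in version_text.lower()


def find_libcuda(directories):
    """Return the directories containing a file whose name starts with 'libcuda.so'."""
    hits = []
    for directory in directories:
        try:
            names = os.listdir(directory)
        except OSError:
            continue  # missing or unreadable directory
        if any(name.startswith("libcuda.so") for name in names):
            hits.append(directory)
    return hits
```

For example, read `/proc/version` and pass its text to `looks_like_wsl`, then call `find_libcuda(["/usr/lib/wsl/lib", "/usr/lib/x86_64-linux-gnu"])`. The second directory is only an assumption about a typical Ubuntu layout. Whichever directory turns up is the one to put in `LD_LIBRARY_PATH` instead of the WSL path.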
I am using Nerdy Rodent's dreamlab local install video, which I have followed step by step. At the end, bitsandbytes gives an error. I tried reloading all the CUDA stuff and tried the new CUDA 11.8 version, which seems to differ from the video, and it still gives the same error:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link
/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:86: UserWarning: /home/user/anaconda3/envs/diffusers did not contain libcudart.so as expected! Searching further paths...
warn(
/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:20: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('CompVis/stable-diffusion-v1-4')}
warn(
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
Traceback (most recent call last):
  File "/home/user/github/diffusers/examples/dreambooth/train_dreambooth.py", line 657, in <module>
    main()
  File "/home/user/github/diffusers/examples/dreambooth/train_dreambooth.py", line 446, in main
    import bitsandbytes as bnb
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/autograd/_functions.py", line 5, in <module>
    import bitsandbytes.functional as F
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/functional.py", line 13, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cextension.py", line 41, in <module>
    lib = CUDALibrary_Singleton.get_instance().lib
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cextension.py", line 37, in get_instance
    cls._instance.initialize()
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cextension.py", line 15, in initialize
    binary_name = evaluate_cuda_setup()
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py", line 132, in evaluate_cuda_setup
    cc = get_compute_capability(cuda)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py", line 105, in get_compute_capability
    ccs = get_compute_capabilities(cuda)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py", line 83, in get_compute_capabilities
    check_cuda_result(cuda, cuda.cuDeviceGetCount(ctypes.byref(nGpus)))
AttributeError: 'NoneType' object has no attribute 'cuDeviceGetCount'
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/diffusers/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/home/user/anaconda3/envs/diffusers/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/user/anaconda3/envs/diffusers/bin/python', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=training', '--output_dir=classes', '--instance_prompt=A sks dog', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=no', '--use_8bit_adam', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--sample_batch_size=4', '--max_train_steps=800']' returned non-zero exit status 1.
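The `AttributeError` in the first traceback suggests that the handle for the CUDA driver library was `None` by the time `cuDeviceGetCount` was called, which fits the earlier "libcuda.so not found" warning. The sketch below reproduces that kind of check outside of bitsandbytes so the failure is easier to see. It is not bitsandbytes' actual code; `load_cuda_driver` and `gpu_count` are hypothetical names, while `cuInit` and `cuDeviceGetCount` are functions of the CUDA driver API.

```python
import ctypes


def load_cuda_driver(names=("libcuda.so", "libcuda.so.1")):
    """Try to load the CUDA driver library; return None if no name loads."""
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue  # not found on the loader's search path
    return None


def gpu_count(driver):
    """Ask the driver how many GPUs it sees, failing clearly if it is missing."""
    if driver is None:
        raise RuntimeError("libcuda.so could not be loaded; check LD_LIBRARY_PATH")
    if driver.cuInit(0) != 0:  # 0 is CUDA_SUCCESS
        raise RuntimeError("cuInit failed")
    count = ctypes.c_int()
    if driver.cuDeviceGetCount(ctypes.byref(count)) != 0:
        raise RuntimeError("cuDeviceGetCount failed")
    return count.value
```

Running `gpu_count(load_cuda_driver())` in the same shell as the training command should either print a device count or raise the explicit error. If it raises, the loader cannot see the driver library from that shell.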