
"RuntimeError: expected scalar type Float but found Half" when using 16-bit with Lora. #519

Closed
1 task done
CrazyKrow opened this issue Mar 23, 2023 · 10 comments
Labels: bug (Something isn't working), stale

Comments

@CrazyKrow

Describe the bug

I have been using LoRA with --load-in-8bit, and I saw that LoRA is now also supposed to work in 16-bit mode. However, when I try it with --bf16, I get "RuntimeError: expected scalar type Float but found Half".
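
As far as I can tell, the failure is a dtype mismatch: the LoRA adapter weights stay in float32 while the half-precision base model feeds them float16/bfloat16 hidden states. A minimal standalone sketch (my own reproduction attempt, not webui code) that triggers the same class of error:

import torch
import torch.nn as nn

lora_A = nn.Linear(16, 4, bias=False)        # adapter weight left in float32
x = torch.randn(1, 16, dtype=torch.float16)  # half-precision hidden states, as from the base model
lora_A(x)  # RuntimeError: dtype mismatch ("expected scalar type Float but found Half" on some torch versions)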

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

python server.py --listen --listen-port 8888 --bf16 --model llama-13b-hf --lora alpaca-lora-13b --cai-chat --verbose --extension simple_memory

Screenshot

No response

Logs

Loading settings from settings.json...
Loading llama-13b-hf...

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Loading binary C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 41/41 [00:33<00:00,  1.24it/s]
Loaded the model in 33.57 seconds.
Loading the extension "simple_memory"... Ok.
Loading the extension "gallery"... Ok.
C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\gradio\deprecation.py:40: UserWarning: The 'type' parameter has been deprecated. Use the Number component instead.
  warnings.warn(value)
Running on local URL:  http://0.0.0.0:8888

To create a public link, set `share=True` in `launch()`.
Adding the LoRA alpaca-lora-13b to the model...



### Instruction: Generate a song in as Snoop Dog would write it talking about different tea flavors.
### Response:
--------------------

C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py:1201: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
Exception in thread Thread-4 (gentask):
Traceback (most recent call last):
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "D:\TEXT GENERATOR\text-generation-webui\modules\callbacks.py", line 65, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "D:\TEXT GENERATOR\text-generation-webui\modules\text_generation.py", line 215, in generate_with_callback
    shared.model.generate(**kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\peft\peft_model.py", line 580, in generate
    return self.base_model.generate(**kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 1452, in generate
    return self.sample(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 2468, in sample
    outputs = self(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 772, in forward
    outputs = self.model(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 621, in forward
    layer_outputs = decoder_layer(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 316, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 216, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\peft\tuners\lora.py", line 512, in forward
    output = self.lora_B(self.lora_A(self.lora_dropout(x))).to(expected_dtype) * self.scaling
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found Half

System Info

12th Gen Intel(R) Core(TM) i7-12700KF   3.61 GHz
RAM 32,0 GB
Windows 10, 64 bits
3090TI Nvidia
CrazyKrow added the bug label Mar 23, 2023
@oobabooga
Owner

Did you git pull?

@CrazyKrow
Author

Yeah, I did.

@oobabooga
Owner

Can you check if things work as expected after this commit?

9bf6ecf

@CrazyKrow
Author

CrazyKrow commented Mar 23, 2023

I get the same error even after your last commit. I also did a fresh install, and it still happens:

Loading settings from settings.json...
Loading llama-13b-hf...

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Loading binary C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\libbitsandbytes_cudaall.dll...
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 41/41 [00:35<00:00,  1.16it/s]
Loaded the model in 35.94 seconds.
alpaca-lora-13b
Adding the LoRA alpaca-lora-13b to the model...
Loading the extension "simple_memory"... Ok.
Loading the extension "gallery"... Ok.
C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\gradio\deprecation.py:40: UserWarning: The 'type' parameter has been deprecated. Use the Number component instead.
  warnings.warn(value)
Running on local URL:  http://0.0.0.0:8888

To create a public link, set `share=True` in `launch()`.



### Instruction: test
### Response:
--------------------

C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py:1211: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
Exception in thread Thread-4 (gentask):
Traceback (most recent call last):
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "D:\TEXT GENERATOR\text-generation-webui\modules\callbacks.py", line 65, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "D:\TEXT GENERATOR\text-generation-webui\modules\text_generation.py", line 215, in generate_with_callback
    shared.model.generate(**kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\peft\peft_model.py", line 580, in generate
    return self.base_model.generate(**kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 1462, in generate
    return self.sample(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 2478, in sample
    outputs = self(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 765, in forward
    outputs = self.model(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 614, in forward
    layer_outputs = decoder_layer(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 309, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 209, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\peft\tuners\lora.py", line 512, in forward
    output = self.lora_B(self.lora_A(self.lora_dropout(x))).to(expected_dtype) * self.scaling
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found Half
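
Looking at the traceback, it dies in peft/tuners/lora.py line 512, where the float32 lora_A/lora_B linears receive half-precision hidden states. One workaround I'm considering (untested sketch, assuming PEFT's standard lora_A/lora_B parameter naming):

import torch

def cast_lora_params(model, dtype=torch.float16):
    # Hypothetical helper: cast the float32 adapter weights down to the base
    # model's dtype so F.linear sees matching input and weight dtypes.
    for name, param in model.named_parameters():
        if 'lora_' in name:  # PEFT names the adapter params lora_A / lora_B
            param.data = param.data.to(dtype)

# e.g. cast_lora_params(shared.model, torch.bfloat16) after the LoRA is attached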

@badjeff

badjeff commented Mar 24, 2023

Can you check if things work as expected after this commit?

9bf6ecf

Quick fix for using LoRA with --gptq-bits 4 --model llama-7b-hf. In modules/LoRA.py:

line 23:

-elif shared.args.load_in_8bit:
+elif shared.args.load_in_8bit or shared.args.gptq_bits:

line 27:

-if not shared.args.load_in_8bit and not shared.args.cpu:
+if not (shared.args.load_in_8bit or shared.args.gptq_bits) and not shared.args.cpu:
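
If I read modules/LoRA.py right, the patched load path ends up roughly like this (a sketch with the two changes applied; shared, lora_name, and PeftModel as used in the repo, other branches omitted):

from pathlib import Path
from peft import PeftModel

params = {'dtype': shared.model.dtype}              # keep the base model's dtype
if shared.args.load_in_8bit or shared.args.gptq_bits:
    params['device_map'] = {'': 0}                  # place all adapter weights on GPU 0
shared.model = PeftModel.from_pretrained(shared.model, Path(f"loras/{lora_name}"), **params)
if not (shared.args.load_in_8bit or shared.args.gptq_bits) and not shared.args.cpu:
    shared.model.half()                             # i.e. skip the fp16 cast for quantized models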

@oobabooga
Owner

I'll give you a trophy if this works

@oobabooga
Owner

Same error for me:

Command

python server.py --gptq-bits 4 --lora alpaca-lora-7b --model llama-7b-hf

Error

  File "/root/text-generation-webui/server.py", line 238, in <module>
    add_lora_to_model(shared.lora_name)
  File "/root/text-generation-webui/modules/LoRA.py", line 33, in add_lora_to_model
    shared.model = PeftModel.from_pretrained(shared.model, Path(f"loras/{lora_name}"), **params)
  File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 143, in from_pretrained
    model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](model, config)
  File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 514, in __init__
    super().__init__(model, peft_config)
  File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 79, in __init__
    self.base_model = LoraModel(peft_config, model)
  File "/opt/conda/lib/python3.10/site-packages/peft/tuners/lora.py", line 118, in __init__
    self._find_and_replace()
  File "/opt/conda/lib/python3.10/site-packages/peft/tuners/lora.py", line 179, in _find_and_replace
    self._replace_module(parent, target_name, new_module, target)
UnboundLocalError: local variable 'new_module' referenced before assignment

Diff

diff --git a/modules/LoRA.py b/modules/LoRA.py
index aa68ad3..524545f 100644
--- a/modules/LoRA.py
+++ b/modules/LoRA.py
@@ -27,11 +27,11 @@ def add_lora_to_model(lora_name):
             params['dtype'] = shared.model.dtype
             if hasattr(shared.model, "hf_device_map"):
                 params['device_map'] = {"base_model.model."+k: v for k, v in shared.model.hf_device_map.items()}
-            elif shared.args.load_in_8bit:
+            elif shared.args.load_in_8bit or shared.args.gptq_bits:
                 params['device_map'] = {'': 0}
             
         shared.model = PeftModel.from_pretrained(shared.model, Path(f"loras/{lora_name}"), **params)
-        if not shared.args.load_in_8bit and not shared.args.cpu:
+        if not (shared.args.load_in_8bit or shared.args.gptq_bits) and not shared.args.cpu:
             shared.model.half()
             if not hasattr(shared.model, "hf_device_map"):
                 shared.model.cuda()
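
For what it's worth, the UnboundLocalError looks like it comes from peft's _find_and_replace only constructing a replacement module for layer types it recognizes, and a GPTQ 4-bit quantized linear matches none of them. A simplified sketch of the pattern (not peft's actual code):

import torch.nn as nn

def find_and_replace_sketch(target):
    # new_module is only bound for recognized layer types (nn.Linear,
    # bitsandbytes Linear8bitLt, ...); a GPTQ QuantLinear falls through,
    # so the reference below raises UnboundLocalError.
    if isinstance(target, nn.Linear):
        new_module = nn.Linear(target.in_features, target.out_features)
    return new_module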

@badjeff

badjeff commented Mar 24, 2023

I've been applying this fix to get around the `UnboundLocalError: local variable 'new_module' referenced before assignment` error.

@oobabooga
Owner

This fix doesn't really work: #332 (comment)

People have been using this patch: https://github.com/johnsmith0031/alpaca_lora_4bit

@github-actions

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.
