
"RuntimeError: expected scalar type Float but found Half" when using 16-bit with Lora. #519

Closed
1 task done
CrazyKrow opened this issue Mar 23, 2023 · 10 comments
Labels: bug (Something isn't working), stale

Comments

@CrazyKrow

Describe the bug

I have been using LoRA with --load-in-8bit, and I saw that LoRA is now also supposed to work in 16-bit mode. However, when I try it with --bf16, I get "RuntimeError: expected scalar type Float but found Half".
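
As far as I can tell, the failure is a dtype mismatch: the LoRA adapter weights stay in float32 while the half-precision base model feeds them float16/bfloat16 hidden states. A minimal standalone sketch (my own reproduction attempt, not webui code) that triggers the same class of error:

import torch
import torch.nn as nn

lora_A = nn.Linear(16, 4, bias=False)        # adapter weight left in float32
x = torch.randn(1, 16, dtype=torch.float16)  # half-precision hidden states, as from the base model
lora_A(x)  # RuntimeError: dtype mismatch ("expected scalar type Float but found Half" on some torch versions)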

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

python server.py --listen --listen-port 8888 --bf16 --model llama-13b-hf --lora alpaca-lora-13b --cai-chat --verbose --extension simple_memory

Screenshot

No response

Logs

Loading settings from settings.json...
Loading llama-13b-hf...

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Loading binary C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 41/41 [00:33<00:00,  1.24it/s]
Loaded the model in 33.57 seconds.
Loading the extension "simple_memory"... Ok.
Loading the extension "gallery"... Ok.
C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\gradio\deprecation.py:40: UserWarning: The 'type' parameter has been deprecated. Use the Number component instead.
  warnings.warn(value)
Running on local URL:  http://0.0.0.0:8888

To create a public link, set `share=True` in `launch()`.
Adding the LoRA alpaca-lora-13b to the model...



### Instruction: Generate a song in as Snoop Dog would write it talking about different tea flavors.
### Response:
--------------------

C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py:1201: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
Exception in thread Thread-4 (gentask):
Traceback (most recent call last):
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "D:\TEXT GENERATOR\text-generation-webui\modules\callbacks.py", line 65, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "D:\TEXT GENERATOR\text-generation-webui\modules\text_generation.py", line 215, in generate_with_callback
    shared.model.generate(**kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\peft\peft_model.py", line 580, in generate
    return self.base_model.generate(**kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 1452, in generate
    return self.sample(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 2468, in sample
    outputs = self(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 772, in forward
    outputs = self.model(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 621, in forward
    layer_outputs = decoder_layer(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 316, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 216, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\peft\tuners\lora.py", line 512, in forward
    output = self.lora_B(self.lora_A(self.lora_dropout(x))).to(expected_dtype) * self.scaling
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found Half

System Info

12th Gen Intel(R) Core(TM) i7-12700KF   3.61 GHz
RAM 32,0 GB
Windows 10, 64 bits
3090TI Nvidia
CrazyKrow added the bug label Mar 23, 2023
@oobabooga
Owner

Did you git pull?

@CrazyKrow
Author

Yeah, I did.

@oobabooga
Owner

Can you check if things work as expected after this commit?

9bf6ecf

@CrazyKrow
Author

CrazyKrow commented Mar 23, 2023

I get the same error even after your last commit. I also did a fresh install, and it still happens:

Loading settings from settings.json...
Loading llama-13b-hf...

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Loading binary C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\bitsandbytes\libbitsandbytes_cudaall.dll...
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 41/41 [00:35<00:00,  1.16it/s]
Loaded the model in 35.94 seconds.
alpaca-lora-13b
Adding the LoRA alpaca-lora-13b to the model...
Loading the extension "simple_memory"... Ok.
Loading the extension "gallery"... Ok.
C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\gradio\deprecation.py:40: UserWarning: The 'type' parameter has been deprecated. Use the Number component instead.
  warnings.warn(value)
Running on local URL:  http://0.0.0.0:8888

To create a public link, set `share=True` in `launch()`.



### Instruction: test
### Response:
--------------------

C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py:1211: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
Exception in thread Thread-4 (gentask):
Traceback (most recent call last):
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "D:\TEXT GENERATOR\text-generation-webui\modules\callbacks.py", line 65, in gentask
    ret = self.mfunc(callback=_callback, **self.kwargs)
  File "D:\TEXT GENERATOR\text-generation-webui\modules\text_generation.py", line 215, in generate_with_callback
    shared.model.generate(**kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\peft\peft_model.py", line 580, in generate
    return self.base_model.generate(**kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 1462, in generate
    return self.sample(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 2478, in sample
    outputs = self(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 765, in forward
    outputs = self.model(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 614, in forward
    layer_outputs = decoder_layer(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 309, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\transformers\models\llama\modeling_llama.py", line 209, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\peft\tuners\lora.py", line 512, in forward
    output = self.lora_B(self.lora_A(self.lora_dropout(x))).to(expected_dtype) * self.scaling
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\Usuario\anaconda3\envs\textgen\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found Half
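
Looking at the traceback, it dies in peft/tuners/lora.py line 512, where the float32 lora_A/lora_B linears receive half-precision hidden states. One workaround I'm considering (untested sketch, assuming PEFT's standard lora_A/lora_B parameter naming):

import torch

def cast_lora_params(model, dtype=torch.float16):
    # Hypothetical helper: cast the float32 adapter weights down to the base
    # model's dtype so F.linear sees matching input and weight dtypes.
    for name, param in model.named_parameters():
        if 'lora_' in name:  # PEFT names the adapter params lora_A / lora_B
            param.data = param.data.to(dtype)

# e.g. cast_lora_params(shared.model, torch.bfloat16) after the LoRA is attached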

@badjeff

badjeff commented Mar 24, 2023

Can you check if things work as expected after this commit?

9bf6ecf

Quick fix for using LoRA with --gptq-bits 4 --model llama-7b-hf. In modules/LoRA.py:

line 23:

-elif shared.args.load_in_8bit:
+elif shared.args.load_in_8bit or shared.args.gptq_bits:

line 27:

-if not shared.args.load_in_8bit and not shared.args.cpu:
+if not (shared.args.load_in_8bit or shared.args.gptq_bits) and not shared.args.cpu:
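
If I read modules/LoRA.py right, the patched load path ends up roughly like this (a sketch with the two changes applied; shared, lora_name, and PeftModel as used in the repo, other branches omitted):

from pathlib import Path
from peft import PeftModel

params = {'dtype': shared.model.dtype}              # keep the base model's dtype
if shared.args.load_in_8bit or shared.args.gptq_bits:
    params['device_map'] = {'': 0}                  # place all adapter weights on GPU 0
shared.model = PeftModel.from_pretrained(shared.model, Path(f"loras/{lora_name}"), **params)
if not (shared.args.load_in_8bit or shared.args.gptq_bits) and not shared.args.cpu:
    shared.model.half()                             # i.e. skip the fp16 cast for quantized models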

@oobabooga
Owner

I'll give you a trophy if this works

@oobabooga
Owner

Same error for me:

Command

python server.py --gptq-bits 4 --lora alpaca-lora-7b --model llama-7b-hf

Error

  File "/root/text-generation-webui/server.py", line 238, in <module>
    add_lora_to_model(shared.lora_name)
  File "/root/text-generation-webui/modules/LoRA.py", line 33, in add_lora_to_model
    shared.model = PeftModel.from_pretrained(shared.model, Path(f"loras/{lora_name}"), **params)
  File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 143, in from_pretrained
    model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](model, config)
  File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 514, in __init__
    super().__init__(model, peft_config)
  File "/opt/conda/lib/python3.10/site-packages/peft/peft_model.py", line 79, in __init__
    self.base_model = LoraModel(peft_config, model)
  File "/opt/conda/lib/python3.10/site-packages/peft/tuners/lora.py", line 118, in __init__
    self._find_and_replace()
  File "/opt/conda/lib/python3.10/site-packages/peft/tuners/lora.py", line 179, in _find_and_replace
    self._replace_module(parent, target_name, new_module, target)
UnboundLocalError: local variable 'new_module' referenced before assignment

Diff

diff --git a/modules/LoRA.py b/modules/LoRA.py
index aa68ad3..524545f 100644
--- a/modules/LoRA.py
+++ b/modules/LoRA.py
@@ -27,11 +27,11 @@ def add_lora_to_model(lora_name):
             params['dtype'] = shared.model.dtype
             if hasattr(shared.model, "hf_device_map"):
                 params['device_map'] = {"base_model.model."+k: v for k, v in shared.model.hf_device_map.items()}
-            elif shared.args.load_in_8bit:
+            elif shared.args.load_in_8bit or shared.args.gptq_bits:
                 params['device_map'] = {'': 0}
             
         shared.model = PeftModel.from_pretrained(shared.model, Path(f"loras/{lora_name}"), **params)
-        if not shared.args.load_in_8bit and not shared.args.cpu:
+        if not (shared.args.load_in_8bit or shared.args.gptq_bits) and not shared.args.cpu:
             shared.model.half()
             if not hasattr(shared.model, "hf_device_map"):
                 shared.model.cuda()
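
For what it's worth, the UnboundLocalError looks like it comes from peft's _find_and_replace only constructing a replacement module for layer types it recognizes, and a GPTQ 4-bit quantized linear matches none of them. A simplified sketch of the pattern (not peft's actual code):

import torch.nn as nn

def find_and_replace_sketch(target):
    # new_module is only bound for recognized layer types (nn.Linear,
    # bitsandbytes Linear8bitLt, ...); a GPTQ QuantLinear falls through,
    # so the reference below raises UnboundLocalError.
    if isinstance(target, nn.Linear):
        new_module = nn.Linear(target.in_features, target.out_features)
    return new_module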

@badjeff

badjeff commented Mar 24, 2023

I've been applying this fix to get around the `UnboundLocalError: local variable 'new_module' referenced before assignment` error.

@oobabooga
Owner

This fix doesn't really work: #332 (comment)

People have been using this patch: https://github.com/johnsmith0031/alpaca_lora_4bit

@github-actions

This issue has been closed due to inactivity for 30 days. If you believe it is still relevant, please leave a comment below.
