
Merging a LoRA into a model results in an error thrown by the safetensors package #3238

Closed
jim-plus opened this issue Apr 12, 2024 · 1 comment
Labels
solved This problem has been already solved

Comments

@jim-plus

Reminder

  • I have read the README and searched the existing issues.

Reproduction

python src/export_model.py --model_name_or_path "basemodel1" --adapter_name_or_path "checkpoint1" --template default --export_dir "export1" --export_size 2

[INFO|modeling_utils.py:1417] 2024-04-11 22:12:34,190 >> Instantiating MistralForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:928] 2024-04-11 22:12:34,190 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}

Loading checkpoint shards: 100%|█████████████████| 3/3 [00:18<00:00,  6.04s/it]
[INFO|modeling_utils.py:4024] 2024-04-11 22:12:53,408 >> All model checkpoint weights were used when initializing MistralForCausalLM.
...
[INFO|modeling_utils.py:3573] 2024-04-11 22:12:53,476 >> Generation config file not found, using a generation config created from the model config.
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
04/11/2024 22:12:54 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
INFO:llmtuner.model.adapter:Fine-tuning method: LoRA
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
04/11/2024 22:13:03 - INFO - llmtuner.model.adapter - Merged 1 adapter(s).
INFO:llmtuner.model.adapter:Merged 1 adapter(s).
04/11/2024 22:13:03 - INFO - llmtuner.model.adapter - Loaded adapter(s): checkpoint1
INFO:llmtuner.model.adapter:Loaded adapter(s): checkpoint1
04/11/2024 22:13:03 - INFO - llmtuner.model.loader - all params: 7241732096
INFO:llmtuner.model.loader:all params: 7241732096
[INFO|configuration_utils.py:697] 2024-04-11 22:13:03,367 >> Configuration saved in export1\generation_config.json
[WARNING|logging.py:329] 2024-04-11 22:13:03,372 >> Removed shared tensor {'model.layers.31.self_attn.v_proj.weight', 'model.layers.26.input_layernorm.weight', 'model.layers.29.self_attn.v_proj.weight', 'model.layers.26.self_attn.q_proj.weight', 'model.layers.31.mlp.down_proj.weight', 'model.layers.26.post_attention_layernorm.weight', 'model.layers.28.input_layernorm.weight', 'model.layers.2.self_attn.o_proj.weight', 'model.layers.6.self_attn.o_proj.weight', 'model.layers.30.self_attn.q_proj.weight', 'model.layers.1.post_attention_layernorm.weight', 'model.layers.17.mlp.down_proj.weight', 'model.layers.7.post_attention_layernorm.weight', 'model.layers.29.mlp.down_proj.weight', 'model.layers.26.mlp.gate_proj.weight', 'model.norm.weight', 'model.layers.28.mlp.down_proj.weight', 'model.layers.4.self_attn.k_proj.weight', 'model.layers.16.self_attn.o_proj.weight', 'model.layers.11.mlp.down_proj.weight', 'model.layers.13.mlp.up_proj.weight', 'model.layers.4.mlp.up_proj.weight', 'model.layers.28.self_attn.k_proj.weight', 'model.layers.2.input_layernorm.weight', 'model.layers.19.mlp.down_proj.weight', 'model.layers.6.post_attention_layernorm.weight', 'model.layers.20.self_attn.k_proj.weight', 'model.layers.31.post_attention_layernorm.weight', 'model.layers.7.self_attn.q_proj.weight', 'model.layers.3.self_attn.k_proj.weight', 'model.layers.30.self_attn.v_proj.weight', 'model.layers.29.self_attn.k_proj.weight', 'model.layers.19.self_attn.k_proj.weight', 'model.layers.10.post_attention_layernorm.weight', 'model.layers.27.self_attn.o_proj.weight', 'model.layers.6.mlp.gate_proj.weight', 'model.layers.6.self_attn.q_proj.weight', 'model.layers.15.mlp.down_proj.weight', 'model.layers.19.mlp.up_proj.weight', 'model.layers.2.self_attn.v_proj.weight', 'model.layers.26.mlp.down_proj.weight', 'model.layers.25.post_attention_layernorm.weight', 'model.layers.23.self_attn.o_proj.weight', 'model.layers.12.self_attn.o_proj.weight', 'model.layers.28.self_attn.o_proj.weight', 'model.layers.25.mlp.gate_proj.weight', 'model.layers.3.mlp.down_proj.weight', 'model.layers.4.mlp.gate_proj.weight', 'model.layers.30.mlp.gate_proj.weight', 'model.layers.21.self_attn.v_proj.weight', 'model.layers.9.self_attn.q_proj.weight', 'model.layers.16.self_attn.k_proj.weight', 'model.layers.22.mlp.gate_proj.weight', 'model.layers.17.self_attn.o_proj.weight', 'model.layers.2.self_attn.k_proj.weight', 'model.layers.15.input_layernorm.weight', 'model.layers.22.self_attn.q_proj.weight', 'model.layers.12.self_attn.k_proj.weight', 'model.layers.25.self_attn.v_proj.weight', 'model.layers.8.input_layernorm.weight', 'model.layers.10.mlp.gate_proj.weight', 'model.layers.25.self_attn.q_proj.weight', 'model.layers.3.self_attn.q_proj.weight', 'model.layers.28.mlp.up_proj.weight', 'model.layers.23.self_attn.k_proj.weight', 'model.layers.5.self_attn.k_proj.weight', 'model.layers.8.self_attn.v_proj.weight', 'model.layers.23.post_attention_layernorm.weight', 'model.layers.17.self_attn.v_proj.weight', 'model.layers.9.self_attn.k_proj.weight', 'model.layers.21.mlp.up_proj.weight', 'model.layers.27.self_attn.v_proj.weight', 'model.layers.24.mlp.down_proj.weight', 'model.layers.8.mlp.up_proj.weight', 'model.layers.15.mlp.gate_proj.weight', 'model.layers.18.self_attn.v_proj.weight', 'model.layers.14.mlp.up_proj.weight', 'model.layers.23.mlp.up_proj.weight', 'model.layers.18.self_attn.k_proj.weight', 'model.layers.5.mlp.up_proj.weight', 'model.layers.14.self_attn.o_proj.weight', 'model.layers.10.self_attn.o_proj.weight', 
'model.layers.21.self_attn.q_proj.weight', 'model.layers.23.mlp.gate_proj.weight', 'model.layers.10.input_layernorm.weight', 'model.layers.20.self_attn.v_proj.weight', 'model.layers.13.mlp.gate_proj.weight', 'model.layers.10.self_attn.k_proj.weight', 'model.layers.12.input_layernorm.weight', 'model.layers.25.input_layernorm.weight', 'model.layers.16.input_layernorm.weight', 'model.layers.3.input_layernorm.weight', 'model.layers.26.mlp.up_proj.weight', 'model.layers.27.post_attention_layernorm.weight', 'model.layers.21.post_attention_layernorm.weight', 'model.layers.12.mlp.down_proj.weight', 'model.layers.25.mlp.up_proj.weight', 'model.layers.9.mlp.down_proj.weight', 'model.layers.4.post_attention_layernorm.weight', 'model.layers.15.self_attn.o_proj.weight', 'model.layers.1.self_attn.v_proj.weight', 'model.layers.21.self_attn.o_proj.weight', 'model.layers.12.mlp.gate_proj.weight', 'model.layers.17.mlp.up_proj.weight', 'model.layers.19.mlp.gate_proj.weight', 'model.layers.14.post_attention_layernorm.weight', 'model.layers.7.input_layernorm.weight', 'model.layers.25.self_attn.o_proj.weight', 'model.layers.17.self_attn.q_proj.weight', 'model.layers.18.post_attention_layernorm.weight', 'model.layers.9.self_attn.v_proj.weight', 'model.layers.2.self_attn.q_proj.weight', 'model.layers.23.self_attn.q_proj.weight', 'model.layers.10.self_attn.v_proj.weight', 'model.layers.14.self_attn.v_proj.weight', 'model.layers.24.mlp.up_proj.weight', 'model.layers.5.post_attention_layernorm.weight', 'model.layers.2.mlp.gate_proj.weight', 'model.layers.18.self_attn.q_proj.weight', 'model.layers.10.self_attn.q_proj.weight', 'model.layers.20.self_attn.o_proj.weight', 'model.layers.25.mlp.down_proj.weight', 'model.layers.21.input_layernorm.weight', 'model.layers.13.mlp.down_proj.weight', 'model.layers.15.post_attention_layernorm.weight', 'model.layers.22.self_attn.o_proj.weight', 'model.layers.26.self_attn.k_proj.weight', 'model.layers.12.self_attn.v_proj.weight', 'model.layers.1.self_attn.o_proj.weight', 'model.layers.15.self_attn.q_proj.weight', 'model.layers.11.mlp.up_proj.weight', 'model.layers.30.mlp.up_proj.weight', 'model.layers.26.self_attn.o_proj.weight', 'model.layers.28.self_attn.q_proj.weight', 'model.layers.12.self_attn.q_proj.weight', 'model.layers.11.input_layernorm.weight', 'model.layers.8.self_attn.q_proj.weight', 'model.layers.18.input_layernorm.weight', 'model.layers.31.mlp.up_proj.weight', 'model.layers.16.self_attn.v_proj.weight', 'model.layers.9.mlp.up_proj.weight', 'model.layers.9.input_layernorm.weight', 'model.layers.31.mlp.gate_proj.weight', 'model.layers.12.post_attention_layernorm.weight', 'model.layers.17.mlp.gate_proj.weight', 'model.layers.23.mlp.down_proj.weight', 'model.layers.8.mlp.down_proj.weight', 'model.layers.30.self_attn.o_proj.weight', 'model.layers.18.mlp.up_proj.weight', 'model.layers.28.self_attn.v_proj.weight', 'model.layers.22.mlp.down_proj.weight', 'model.layers.17.self_attn.k_proj.weight', 'model.layers.27.mlp.down_proj.weight', 'model.layers.19.self_attn.o_proj.weight', 'model.layers.14.mlp.gate_proj.weight', 'model.layers.20.mlp.up_proj.weight', 'model.layers.4.self_attn.v_proj.weight', 'model.layers.20.input_layernorm.weight', 'model.layers.15.mlp.up_proj.weight', 'model.layers.1.mlp.up_proj.weight', 'model.layers.24.post_attention_layernorm.weight', 'model.layers.10.mlp.up_proj.weight', 'model.layers.13.post_attention_layernorm.weight', 'model.layers.19.post_attention_layernorm.weight', 'model.layers.18.mlp.gate_proj.weight', 
'model.layers.15.self_attn.v_proj.weight', 'model.layers.11.self_attn.o_proj.weight', 'model.layers.14.self_attn.q_proj.weight', 'model.layers.22.self_attn.v_proj.weight', 'model.layers.28.mlp.gate_proj.weight', 'model.layers.8.mlp.gate_proj.weight', 'model.layers.21.mlp.down_proj.weight', 'model.layers.12.mlp.up_proj.weight', 'model.layers.27.mlp.up_proj.weight', 'model.layers.7.self_attn.v_proj.weight', 'model.layers.18.self_attn.o_proj.weight', 'model.layers.19.self_attn.q_proj.weight', 'model.layers.20.self_attn.q_proj.weight', 'model.layers.29.mlp.gate_proj.weight', 'model.layers.24.self_attn.o_proj.weight', 'model.layers.3.mlp.gate_proj.weight', 'model.layers.7.self_attn.o_proj.weight', 'model.layers.27.self_attn.q_proj.weight', 'model.layers.2.mlp.up_proj.weight', 'model.layers.7.mlp.gate_proj.weight', 'model.layers.6.mlp.up_proj.weight', 'model.layers.4.self_attn.o_proj.weight', 'model.layers.11.mlp.gate_proj.weight', 'model.layers.16.mlp.down_proj.weight', 'model.layers.20.mlp.gate_proj.weight', 'model.layers.20.mlp.down_proj.weight', 'model.layers.13.self_attn.q_proj.weight', 'model.layers.28.post_attention_layernorm.weight', 'model.layers.22.input_layernorm.weight', 'model.layers.29.input_layernorm.weight', 'model.layers.13.self_attn.v_proj.weight', 'model.layers.7.mlp.down_proj.weight', 'model.layers.29.mlp.up_proj.weight', 'model.layers.14.input_layernorm.weight', 'model.layers.29.self_attn.o_proj.weight', 'model.layers.30.self_attn.k_proj.weight', 'model.layers.3.mlp.up_proj.weight', 'model.layers.3.self_attn.v_proj.weight', 'model.layers.27.self_attn.k_proj.weight', 'model.layers.6.input_layernorm.weight', 'model.layers.4.self_attn.q_proj.weight', 'model.layers.8.self_attn.k_proj.weight', 'model.layers.24.self_attn.q_proj.weight', 'model.layers.19.input_layernorm.weight', 'model.layers.14.self_attn.k_proj.weight', 'model.layers.1.mlp.down_proj.weight', 'model.layers.2.post_attention_layernorm.weight', 'model.layers.18.mlp.down_proj.weight', 'model.layers.31.self_attn.k_proj.weight', 'model.layers.6.self_attn.v_proj.weight', 'model.layers.11.self_attn.k_proj.weight', 'model.layers.22.mlp.up_proj.weight', 'model.layers.7.self_attn.k_proj.weight', 'model.layers.23.input_layernorm.weight', 'model.layers.24.self_attn.k_proj.weight', 'model.layers.5.mlp.gate_proj.weight', 'model.layers.21.self_attn.k_proj.weight', 'model.layers.8.self_attn.o_proj.weight', 'model.layers.11.self_attn.q_proj.weight', 'model.layers.15.self_attn.k_proj.weight', 'model.layers.16.mlp.up_proj.weight', 'model.layers.24.self_attn.v_proj.weight', 'model.layers.30.mlp.down_proj.weight', 'model.layers.3.post_attention_layernorm.weight', 'model.layers.24.input_layernorm.weight', 'model.layers.31.self_attn.o_proj.weight', 'model.layers.5.input_layernorm.weight', 'model.layers.29.post_attention_layernorm.weight', 'model.layers.11.self_attn.v_proj.weight', 'model.layers.4.mlp.down_proj.weight', 'model.layers.30.input_layernorm.weight', 'model.layers.27.mlp.gate_proj.weight', 'model.layers.14.mlp.down_proj.weight', 'model.layers.20.post_attention_layernorm.weight', 'model.layers.29.self_attn.q_proj.weight', 'model.layers.23.self_attn.v_proj.weight', 'model.layers.22.post_attention_layernorm.weight', 'model.layers.9.post_attention_layernorm.weight', 'model.layers.16.post_attention_layernorm.weight', 'model.layers.5.self_attn.v_proj.weight', 'model.layers.2.mlp.down_proj.weight', 'model.layers.5.self_attn.q_proj.weight', 'model.layers.6.mlp.down_proj.weight', 'model.layers.8.post_attention_layernorm.weight', 
'model.layers.16.self_attn.q_proj.weight', 'model.layers.19.self_attn.v_proj.weight', 'model.layers.21.mlp.gate_proj.weight', 'model.layers.4.input_layernorm.weight', 'model.layers.17.post_attention_layernorm.weight', 'model.layers.9.self_attn.o_proj.weight', 'model.layers.13.self_attn.o_proj.weight', 'model.layers.26.self_attn.v_proj.weight', 'model.layers.10.mlp.down_proj.weight', 'model.layers.24.mlp.gate_proj.weight', 'model.layers.6.self_attn.k_proj.weight', 'model.layers.25.self_attn.k_proj.weight', 'model.layers.31.input_layernorm.weight', 'model.layers.31.self_attn.q_proj.weight', 'model.layers.11.post_attention_layernorm.weight', 'model.layers.17.input_layernorm.weight', 'model.layers.5.mlp.down_proj.weight', 'model.layers.27.input_layernorm.weight', 'model.layers.30.post_attention_layernorm.weight', 'model.layers.3.self_attn.o_proj.weight', 'model.layers.13.self_attn.k_proj.weight', 'model.layers.16.mlp.gate_proj.weight', 'model.layers.5.self_attn.o_proj.weight', 'model.layers.7.mlp.up_proj.weight', 'model.layers.13.input_layernorm.weight', 'model.layers.22.self_attn.k_proj.weight', 'model.layers.9.mlp.gate_proj.weight'} while saving. This should be OK, but check by verifying that you don't receive any warning while reloading
Traceback (most recent call last):
  File "C:\cygwin64\home\Jim\chat\LLaMA-Factory\src\export_model.py", line 9, in <module>
    main()
  File "C:\cygwin64\home\Jim\chat\LLaMA-Factory\src\export_model.py", line 5, in main
    export_model()
  File "C:\cygwin64\home\Jim\chat\LLaMA-Factory\src\llmtuner\train\tuner.py", line 71, in export_model
    model.save_pretrained(
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\transformers\modeling_utils.py", line 2468, in save_pretrained
    safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\safetensors\torch.py", line 281, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
                   ^^^^^^^^^^^^^^^^^
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\safetensors\torch.py", line 485, in _flatten
    return {
           ^
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\safetensors\torch.py", line 489, in <dictcomp>
    "data": _tobytes(v, k),
            ^^^^^^^^^^^^^^
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\safetensors\torch.py", line 411, in _tobytes
    tensor = tensor.to("cpu")
             ^^^^^^^^^^^^^^^^
NotImplementedError: Cannot copy out of meta tensor; no data!

Expected behavior

I expected the LoRA to merge into the 7B base model, since I was able to train a small LoRA within 16GB of VRAM.

The warnings indicate that some parameters were offloaded to the CPU and left on the meta device, which safetensors apparently cannot serialize. I'm not sure how to work around this (a minimal reproduction is sketched below).
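For what it's worth, the error seems reproducible outside LLaMA-Factory with any tensor left on the meta device (a minimal sketch, not taken from the script above; the tensor name and output filename are made up):

import torch
from safetensors.torch import save_file

# A "meta" tensor carries only shape/dtype metadata and has no backing storage;
# accelerate parks weights on the meta device when they are offloaded.
meta_weight = torch.empty(4096, 4096, dtype=torch.bfloat16, device="meta")

# save_file calls tensor.to("cpu") before serializing, which fails for meta
# tensors with: NotImplementedError: Cannot copy out of meta tensor; no data!
save_file({"model.layers.0.self_attn.q_proj.weight": meta_weight}, "broken.safetensors")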

Although the files live under Cygwin, I ran the script from the Windows command line. I cloned the repo yesterday or so. I am using current consumer (gaming) Nvidia drivers, with the driver option to fall back to system memory enabled (slow swapping instead of crashing).
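For anyone else hitting this, one possible workaround (a hedged sketch only, not the LLaMA-Factory export path; it assumes the adapter's config points back at the base model and that there is enough system RAM to hold the merged 7B model) is to merge the LoRA with peft directly on the CPU, so no weights are offloaded to the meta device:

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Load the base model plus LoRA adapter entirely in CPU RAM (no accelerate
# offloading), so no parameter ends up on the meta device when saving.
model = AutoPeftModelForCausalLM.from_pretrained(
    "checkpoint1",                  # adapter directory from the command above
    torch_dtype=torch.bfloat16,
)
model = model.merge_and_unload()    # fold the LoRA weights into the base weights
model.save_pretrained("export1", safe_serialization=True, max_shard_size="2GB")

# Copy the tokenizer into the export directory as well.
AutoTokenizer.from_pretrained("basemodel1").save_pretrained("export1")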

System Info

  • transformers version: 4.39.3
  • Platform: Windows-10-10.0.22631-SP0 (I have since upgraded to Windows 11; this was the version when Python was originally installed)
  • Python version: 3.11.5
  • Huggingface_hub version: 0.22.1
  • Safetensors version: 0.4.2
  • Accelerate version: 0.27.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed

Others

No response


hiyouga commented Apr 12, 2024

fixed

hiyouga added the solved label Apr 12, 2024