04/14/2024 23:40:54 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
WARNING:root:Some parameters are on the meta device device because they were offloaded to the disk and cpu.
Traceback (most recent call last):
  File "C:\cygwin64\home\Jim\chat\LLaMA-Factory\src\export_model.py", line 9, in <module>
    main()
  File "C:\cygwin64\home\Jim\chat\LLaMA-Factory\src\export_model.py", line 5, in main
    export_model()
  File "C:\cygwin64\home\Jim\chat\LLaMA-Factory\src\llmtuner\train\tuner.py", line 57, in export_model
    model = load_model(tokenizer, model_args, finetuning_args) # must after fixing tokenizer to resize vocab
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\cygwin64\home\Jim\chat\LLaMA-Factory\src\llmtuner\model\loader.py", line 93, in load_model
    model = init_adapter(model, model_args, finetuning_args, is_trainable)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\cygwin64\home\Jim\chat\LLaMA-Factory\src\llmtuner\model\adapter.py", line 113, in init_adapter
    model = model.merge_and_unload()
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\peft\tuners\lora\model.py", line 784, in merge_and_unload
    return self._unload_and_optionally_merge(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\peft\tuners\lora\model.py", line 435, in _unload_and_optionally_merge
    with onload_layer(target):
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\peft\tuners\tuners_utils.py", line 60, in onload_layer
    module._hf_hook.pre_forward(module)
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\accelerate\hooks.py", line 328, in pre_forward
    value = self.weights_map[name]
            ~~~~~~~~~~~~~~~~^^^^^^
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\accelerate\utils\offload.py", line 118, in __getitem__
    return self.dataset[f"{self.prefix}{key}"]
           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\accelerate\utils\offload.py", line 165, in __getitem__
    weight_info = self.index[key]
                  ~~~~~~~~~~^^^^^
KeyError: 'base_model.model.model.layers.0.self_attn.q_proj.base_layer.weight'
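A plausible reading of the KeyError (this diagnosis is an assumption, not confirmed by the logs): accelerate's disk-offload index is built from the base model's parameter names, but PEFT's LoRA wrapper renames each wrapped weight to `...base_layer.weight`, so the lookup during `merge_and_unload()` misses. A minimal sketch of a CPU-only merge that avoids accelerate offloading entirely, reusing the model ID, adapter path, and output directory from the reproduction command below:

```python
def merge_lora_on_cpu(base_id: str, adapter_dir: str, out_dir: str) -> None:
    """Merge a LoRA adapter into its base model entirely in system RAM.

    With device_map=None nothing is dispatched by accelerate, so
    merge_and_unload() never has to consult an offload index.
    """
    # Imports are local so the sketch can be defined without the heavy
    # dependencies installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.float16,  # ~14 GB for a 7B model; fits in 32 GB RAM
        device_map=None,            # keep all weights on CPU, no disk offload
        low_cpu_mem_usage=True,     # stream weights in, avoid a second copy
    )
    model = PeftModel.from_pretrained(base, adapter_dir)
    merged = model.merge_and_unload()
    merged.save_pretrained(out_dir, safe_serialization=True)
```

For example, `merge_lora_on_cpu("grimjim/Mistral-7B-Instruct-demi-merge-v0.2-7B", "checkpoint1", "export1")` would mirror the export command; whether 32 GB of RAM suffices depends on what else is resident.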
Reproduction
set CUDA_VISIBLE_DEVICES=""
python src/export_model.py ^
    --model_name_or_path "grimjim/Mistral-7B-Instruct-demi-merge-v0.2-7B" ^
    --adapter_name_or_path checkpoint1 ^
    --template default ^
    --finetuning_type lora ^
    --export_dir export1 ^
    --export_size 3 ^
    --export_legacy_format False
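One aside on the first line of the reproduction (an assumption about cmd.exe quoting, not something shown in the logs): `set CUDA_VISIBLE_DEVICES=""` in cmd stores the literal quote characters in the variable; since any unparsable value also hides all devices, the effect is the same either way. A shell-independent alternative is to hide the devices from Python itself, before torch is first imported:

```python
import os

# Must run before the first `import torch` anywhere in the process:
# device visibility is read once, at CUDA initialization.
# An empty (or otherwise invalid) CUDA_VISIBLE_DEVICES hides all GPUs,
# so torch.cuda.is_available() will return False.
os.environ["CUDA_VISIBLE_DEVICES"] = ""
```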
Expected behavior
I was hoping to perform the merge without requiring CUDA, since I hit an out-of-memory (OOM) error on the GPU. I have 32 GB of system RAM, which should be enough if only one copy of the 7B model is kept in memory during the merge.
System Info
Windows 11
Others
I was able to train a small LoRA adapter on 100 rows of data (1.0 epoch) within 16 GB of VRAM, and was attempting to merge it into the base model.