Inference after FSDP+QLoRA fine-tuning #2981

Closed
1 task done
bgtii opened this issue Mar 26, 2024 · 2 comments
Labels
solved This problem has been already solved

Comments

bgtii commented Mar 26, 2024

Reminder

  • I have read the README and searched the existing issues.

Reproduction

Fine-tuning
CUDA_VISIBLE_DEVICES=4,5 accelerate launch \
--config_file /root/workspace/LLaMA-Factory/examples/accelerate/fsdp_config.yaml \
/root/workspace/LLaMA-Factory/src/train_bash.py \
--stage sft \
--do_train \
--model_name_or_path /models/llama-7b \
--dataset 肾科ai生成1000问 \
--dataset_dir /root/workspace/LLaMA-Factory/data \
--template default \
--finetuning_type lora \
--lora_target q_proj,v_proj \
--output_dir /root/workspace/LLaMA-Factory/output_chechpoint \
--overwrite_cache \
--overwrite_output_dir \
--cutoff_len 1024 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 100 \
--eval_steps 100 \
--evaluation_strategy steps \
--load_best_model_at_end \
--learning_rate 5e-5 \
--num_train_epochs 3.0 \
--max_samples 3000 \
--val_size 0.1 \
--quantization_bit 4 \
--plot_loss \
--fp16

Inference
CUDA_VISIBLE_DEVICES=5 python src/cli_demo.py \
--model_name_or_path /models/llama-7b  \
--adapter_name_or_path /root/workspace/LLaMA-Factory/output_chechpoint \
--template default \
--finetuning_type lora
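
For reference, cli_demo.py attaches the adapter through PEFT, so the directory passed to --adapter_name_or_path must contain a PEFT adapter export. A minimal pre-flight check before launching inference (a sketch in plain Python; the path is the one used in the commands above):

import os

# Path taken from --output_dir / --adapter_name_or_path above.
adapter_dir = "/root/workspace/LLaMA-Factory/output_chechpoint"

# A completed LoRA run is expected to export adapter_config.json plus the
# adapter weights (adapter_model.safetensors or adapter_model.bin, depending
# on the PEFT version) into this directory.
print("files in output dir:", sorted(os.listdir(adapter_dir)))
if not os.path.isfile(os.path.join(adapter_dir, "adapter_config.json")):
    print("adapter_config.json is missing: the checkpoint is not a loadable PEFT adapter")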

Expected behavior

No response

System Info

[2024-03-26 16:52:14,808] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[INFO|tokenization_utils_base.py:2082] 2024-03-26 16:52:16,469 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2082] 2024-03-26 16:52:16,469 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2082] 2024-03-26 16:52:16,469 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2082] 2024-03-26 16:52:16,469 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2082] 2024-03-26 16:52:16,469 >> loading file tokenizer.json
[WARNING|logging.py:329] 2024-03-26 16:52:16,469 >> You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565
[INFO|configuration_utils.py:724] 2024-03-26 16:52:16,555 >> loading configuration file /models/llama-7b/config.json
[INFO|configuration_utils.py:789] 2024-03-26 16:52:16,556 >> Model config LlamaConfig {
"_name_or_path": "/models/llama-7b",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 0,
"eos_token_id": 1,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 2048,
"max_sequence_length": 2048,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 32,
"pad_token_id": -1,
"pretraining_tp": 1,
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.39.1",
"use_cache": true,
"vocab_size": 32000
}

03/26/2024 16:52:16 - INFO - llmtuner.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3280] 2024-03-26 16:52:16,677 >> loading weights file /models/llama-7b/pytorch_model.bin.index.json
[INFO|modeling_utils.py:1417] 2024-03-26 16:52:16,698 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:928] 2024-03-26 16:52:16,699 >> Generate config GenerationConfig {
"bos_token_id": 0,
"eos_token_id": 1,
"pad_token_id": -1
}

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████| 33/33 [03:00<00:00, 5.46s/it]
[INFO|modeling_utils.py:4024] 2024-03-26 16:55:17,446 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4032] 2024-03-26 16:55:17,446 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /models/llama-7b.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:881] 2024-03-26 16:55:17,452 >> loading configuration file /models/llama-7b/generation_config.json
[INFO|configuration_utils.py:928] 2024-03-26 16:55:17,452 >> Generate config GenerationConfig {
"bos_token_id": 0,
"eos_token_id": 1,
"pad_token_id": 0
}

03/26/2024 16:55:18 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
Traceback (most recent call last):
  File "/root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/peft/config.py", line 197, in _get_peft_type
    config_file = hf_hub_download(
  File "/root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 111, in _inner_fn
    validate_repo_id(arg_value)
  File "/root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 159, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/root/workspace/LLaMA-Factory/output_chechpoint'. Use repo_type argument if needed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/workspace/LLaMA-Factory/src/cli_demo.py", line 49, in <module>
    main()
  File "/root/workspace/LLaMA-Factory/src/cli_demo.py", line 15, in main
    chat_model = ChatModel()
  File "/root/workspace/LLaMA-Factory/src/llmtuner/chat/chat_model.py", line 23, in __init__
    self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
  File "/root/workspace/LLaMA-Factory/src/llmtuner/chat/hf_engine.py", line 33, in __init__
    self.model, self.tokenizer = load_model_and_tokenizer(
  File "/root/workspace/LLaMA-Factory/src/llmtuner/model/loader.py", line 149, in load_model_and_tokenizer
    model = load_model(tokenizer, model_args, finetuning_args, is_trainable, add_valuehead)
  File "/root/workspace/LLaMA-Factory/src/llmtuner/model/loader.py", line 94, in load_model
    model = init_adapter(model, model_args, finetuning_args, is_trainable)
  File "/root/workspace/LLaMA-Factory/src/llmtuner/model/adapter.py", line 110, in init_adapter
    model: "LoraModel" = PeftModel.from_pretrained(
  File "/root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/peft/peft_model.py", line 328, in from_pretrained
    PeftConfig._get_peft_type(
  File "/root/miniconda3/envs/llama_factory/lib/python3.10/site-packages/peft/config.py", line 203, in _get_peft_type
    raise ValueError(f"Can't find '{CONFIG_NAME}' at '{model_id}'")
ValueError: Can't find 'adapter_config.json' at '/root/workspace/LLaMA-Factory/output_chechpoint'

The adapter_config.json file was not generated.
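
For context on the two stacked errors: PeftModel.from_pretrained first looks for adapter_config.json inside the given directory; when the file is missing, PEFT falls back to treating the path as a Hugging Face Hub repo id, which raises the HFValidationError shown before the final ValueError. A minimal sketch of the plain PEFT loading path (standalone usage, not the LLaMA-Factory code) that works once the adapter files are present:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_path = "/models/llama-7b"
adapter_path = "/root/workspace/LLaMA-Factory/output_chechpoint"

# Load the base model, then attach the LoRA adapter exported by fine-tuning.
# PeftModel.from_pretrained requires adapter_config.json in adapter_path.
tokenizer = AutoTokenizer.from_pretrained(base_model_path)
base_model = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype="auto")
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()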

Others

No response

@hiyouga hiyouga added the solved This problem has been already solved label Mar 26, 2024
bgtii commented Mar 26, 2024

How should I run inference in this case?

hiyouga commented Mar 26, 2024

Update the source code and run the fine-tuning again.

tybalex added a commit to sanjay920/LLaMA-Factory that referenced this issue Apr 10, 2024
* fix packages

* Update wechat.jpg

* Updated README with new information

* Updated README with new information

* Updated README with new information

* Follow HF_ENDPOINT environment variable

* fix hiyouga#2346

* fix hiyouga#2777 hiyouga#2895

* add orca_dpo_pairs dataset

* support fsdp + qlora

* update readme

* update tool extractor

* paper release

* add citation

* move file

* Update README.md, fix the release date of the paper

* Update README_zh.md, fix the release date of the paper

* Update wechat.jpg

* fix hiyouga#2941

* fix hiyouga#2928

* fix hiyouga#2936

* fix Llama lora merge crash

* fix Llama lora merge crash

* fix Llama lora merge crash

* pass ruff check

* tiny fix

* Update requirements.txt

* Update README_zh.md

* release v0.6.0

* add arg check

* Update README_zh.md

* Update README.md

* update readme

* tiny fix

* release v0.6.0 (real)

* Update wechat.jpg

* fix hiyouga#2961

* fix bug

* fix hiyouga#2981

* fix ds optimizer

* update trainers

* fix hiyouga#3010

* update readme

* fix hiyouga#2982

* add project

* update readme

* release v0.6.1

* Update wechat.jpg

* fix pile datset hf hub url

* upgrade gradio to 4.21.0

* support save args in webui hiyouga#2807 hiyouga#3046

some ideas are borrowed from @marko1616

* Fix Llama model save for full param train

* fix blank line contains whitespace

* tiny fix

* support ORPO

* support orpo in webui

* update readme

* use log1p in orpo loss

huggingface/trl#1491

* fix plots

* fix IPO and ORPO loss

* fix ORPO loss

* update webui

* support infer 4bit model on GPUs hiyouga#3023

* fix hiyouga#3077

* add qwen1.5 moe

* fix hiyouga#3083

* set dev version

* Update SECURITY.md

* fix hiyouga#3022

* add moe aux loss control hiyouga#3085

* simplify readme

* update readme

* update readme

* update examples

* update examples

* add zh readme

* update examples

* update readme

* update vllm example

* Update wechat.jpg

* fix hiyouga#3116

* fix resize vocab at inference hiyouga#3022

* fix requires for windows

* fix bug in latest gradio

* back to gradio 4.21 and fix chat

* tiny fix

* update examples

* update readme

* support Qwen1.5-32B

* support Qwen1.5-32B

* fix spell error

* support hiyouga#3152

* rename template to breeze

* rename template to breeze

* add empty line

* Update wechat.jpg

* tiny fix

* fix quant infer and qwen2moe

* Pass additional_target to unsloth

Fixes hiyouga#3200

* Update adapter.py

* Update adapter.py

* fix hiyouga#3225

---------

Co-authored-by: hiyouga <[email protected]>
Co-authored-by: 刘一博 <[email protected]>
Co-authored-by: khazic <[email protected]>
Co-authored-by: SirlyDreamer <[email protected]>
Co-authored-by: Sanjay Nadhavajhala <[email protected]>
Co-authored-by: sanjay920 <[email protected]>
Co-authored-by: 0xez <[email protected]>
Co-authored-by: marko1616 <[email protected]>
Co-authored-by: Remek Kinas <[email protected]>
Co-authored-by: Tsumugii24 <[email protected]>
Co-authored-by: li.yunhao <[email protected]>
Co-authored-by: sliderSun <[email protected]>
Co-authored-by: codingma <[email protected]>
Co-authored-by: Erich Schubert <[email protected]>