Changing the template causes train_bash.py to FAIL, seemingly related to CUDA management #3022
Comments
Use `--resize_vocab` to increase the vocabulary size.
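For context, a rough sketch of what resizing the vocabulary for the chatml template amounts to, written with plain transformers calls rather than LLaMA-Factory internals (the model path is the one used in this thread):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/root/autodl-tmp/models/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

# The chatml template introduces <|im_start|> and <|im_end|>, which Mistral's
# 32000-token vocabulary does not contain.
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]}
)
if num_added > 0:
    # Without this resize, token ids >= 32000 would index past the embedding table.
    model.resize_token_embeddings(len(tokenizer))
```

The 32064 appearing in the checkpoints later in this thread suggests the new size is additionally padded up to a multiple of 64 (32000 + 2 new tokens → 32064).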
@hiyouga Thanks for your reply! The problem encountered during training has been resolved. I then run inference with:
CUDA_VISIBLE_DEVICES=0 python src/web_demo.py \
--model_name_or_path /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2 \
--adapter_name_or_path /root/autodl-tmp/fhy/finetune3/mistral-instruct-2-lora-moretokens-chatml \
--template chatml \
--finetuning_type lora \
--resize_vocab
Output and error:
[INFO|tokenization_utils_base.py:2082] 2024-04-01 09:51:17,835 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2082] 2024-04-01 09:51:17,835 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2082] 2024-04-01 09:51:17,835 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2082] 2024-04-01 09:51:17,835 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2082] 2024-04-01 09:51:17,835 >> loading file tokenizer.json
[INFO|configuration_utils.py:724] 2024-04-01 09:51:17,890 >> loading configuration file /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2/config.json
[INFO|configuration_utils.py:789] 2024-04-01 09:51:17,891 >> Model config MistralConfig {
"_name_or_path": "/root/autodl-tmp/models/Mistral-7B-Instruct-v0.2",
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.39.1",
"use_cache": true,
"vocab_size": 32000
}
04/01/2024 09:51:17 - INFO - llmtuner.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3280] 2024-04-01 09:51:17,908 >> loading weights file /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2/model.safetensors.index.json
[INFO|modeling_utils.py:1417] 2024-04-01 09:51:17,908 >> Instantiating MistralForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:928] 2024-04-01 09:51:17,909 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.23it/s]
[INFO|modeling_utils.py:4024] 2024-04-01 09:51:20,778 >> All model checkpoint weights were used when initializing MistralForCausalLM.
[INFO|modeling_utils.py:4032] 2024-04-01 09:51:20,778 >> All the weights of MistralForCausalLM were initialized from the model checkpoint at /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use MistralForCausalLM for predictions without further training.
[INFO|configuration_utils.py:881] 2024-04-01 09:51:20,781 >> loading configuration file /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2/generation_config.json
[INFO|configuration_utils.py:928] 2024-04-01 09:51:20,781 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
04/01/2024 09:51:20 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
04/01/2024 09:51:21 - INFO - llmtuner.model.adapter - Merged 1 adapter(s).
04/01/2024 09:51:21 - INFO - llmtuner.model.adapter - Loaded adapter(s): /root/autodl-tmp/fhy/finetune3/mistral-instruct-2-lora-moretokens-chatml-continue/checkpoint-1500
04/01/2024 09:51:21 - INFO - llmtuner.model.loader - all params: 7241732096
04/01/2024 09:51:21 - INFO - llmtuner.data.template - Replace eos token: <|im_end|>
04/01/2024 09:51:21 - WARNING - llmtuner.data.template - New tokens have been added, make sure `resize_vocab` is True.
04/01/2024 09:51:21 - INFO - llmtuner.data.template - Add pad token: <|im_end|>
04/01/2024 09:51:21 - INFO - llmtuner.data.template - Add <|im_start|> to stop words.
04/01/2024 09:51:21 - WARNING - llmtuner.data.template - New tokens have been added, make sure `resize_vocab` is True.
Running on local URL: http://0.0.0.0:7860
To create a public link, set `share=True` in `launch()`.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [33,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [34,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [35,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [36,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [37,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [38,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [39,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [40,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [41,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [42,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [43,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [44,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [45,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [46,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [47,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [48,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [49,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [50,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [51,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [52,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [53,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [54,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [55,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [56,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [57,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [58,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [59,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [60,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [61,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [62,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1290: indexSelectLargeIndex: block: [700,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Exception in thread Thread-9 (generate):
Traceback (most recent call last):
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/transformers/generation/utils.py", line 1575, in generate
result = self._sample(
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/transformers/generation/utils.py", line 2697, in _sample
outputs = self(
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1157, in forward
outputs = self.model(
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1004, in forward
attention_mask = _prepare_4d_causal_attention_mask_for_sdpa(
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/transformers/modeling_attn_mask_utils.py", line 335, in _prepare_4d_causal_attention_mask_for_sdpa
elif not is_tracing and torch.all(attention_mask == 1):
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
I am certain I did add ... to my command.
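A side note on the log above: `Assertion srcIndex < srcSelectDimSize failed` in `indexSelectLargeIndex` is what an embedding lookup raises when a token id exceeds the size of the embedding table. A quick check along these lines, assuming the expanded tokenizer was saved in the adapter directory, makes such a mismatch visible:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Paths taken from the commands in this thread (adjust as needed).
tokenizer = AutoTokenizer.from_pretrained(
    "/root/autodl-tmp/fhy/finetune3/mistral-instruct-2-lora-moretokens-chatml"
)
model = AutoModelForCausalLM.from_pretrained(
    "/root/autodl-tmp/models/Mistral-7B-Instruct-v0.2"
)

# If the tokenizer reports more ids than the model has embedding rows (32000 here),
# any prompt containing the chatml special tokens produces out-of-range ids and
# triggers the device-side assert seen above during generation.
print(len(tokenizer), model.get_input_embeddings().num_embeddings)
```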
@hiyouga Hello, I also tried ...
You did not save the expanded vocabulary at training time; you need to specify ...
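In PEFT terms, saving the expanded vocabulary roughly means marking the resized `embed_tokens` and `lm_head` as fully trainable and saving them with the adapter. A sketch of the corresponding configuration (not the exact LLaMA-Factory code):

```python
from peft import LoraConfig

# Roughly what --resize_vocab True together with
# --additional_target embed_tokens,lm_head corresponds to: the resized embedding
# and output head become modules_to_save, so their full weights (including the
# rows for the new tokens) are written out alongside the LoRA weights.
lora_config = LoraConfig(
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    modules_to_save=["embed_tokens", "lm_head"],
    task_type="CAUSAL_LM",
)
```

This is also why the tracebacks further down refer to `embed_tokens.modules_to_save.default.weight`.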
@hiyouga Thanks for the reply! But after LoRA-training the model as you suggested, inference still does not work correctly. Details below. Training command:
accelerate launch --config_file /root/autodl-tmp/fhy/finetune/config.yaml src/train_bash.py \
--stage sft \
--do_train \
--flash_attn True \
--quantization_bit 4 \
--model_name_or_path /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2 \
--dataset filtered_sampled_data \
--template chatml \
--finetuning_type lora \
--lora_target q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj \
--output_dir /root/autodl-tmp/fhy/finetune3/mistral-instruct-2-lora-moretokens-chatml \
--overwrite_cache \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 500 \
--learning_rate 5e-5 \
--num_train_epochs 4 \
--plot_loss \
--bf16 \
--overwrite_output_dir \
--lora_rank 16 \
--max_new_tokens 4096 \
--top_p 1 \
--num_beams 3 \
--temperature 0 \
--resize_vocab True \
--additional_target embed_tokens,lm_head
Training completed successfully. I then used the following inference command:
CUDA_VISIBLE_DEVICES=0 python src/web_demo.py \
--model_name_or_path /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2 \
--adapter_name_or_path /root/autodl-tmp/fhy/finetune3/mistral-instruct-2-lora-moretokens-chatml \
--template chatml \
--finetuning_type lora
The following output and error appeared:
[INFO|tokenization_utils_base.py:2082] 2024-04-02 09:52:38,130 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2082] 2024-04-02 09:52:38,130 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2082] 2024-04-02 09:52:38,130 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2082] 2024-04-02 09:52:38,130 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2082] 2024-04-02 09:52:38,130 >> loading file tokenizer.json
[INFO|configuration_utils.py:724] 2024-04-02 09:52:38,184 >> loading configuration file /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2/config.json
[INFO|configuration_utils.py:789] 2024-04-02 09:52:38,186 >> Model config MistralConfig {
"_name_or_path": "/root/autodl-tmp/models/Mistral-7B-Instruct-v0.2",
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.39.1",
"use_cache": true,
"vocab_size": 32000
}
04/02/2024 09:52:38 - INFO - llmtuner.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3280] 2024-04-02 09:52:38,203 >> loading weights file /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2/model.safetensors.index.json
[INFO|modeling_utils.py:1417] 2024-04-02 09:52:38,203 >> Instantiating MistralForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:928] 2024-04-02 09:52:38,204 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.24it/s]
[INFO|modeling_utils.py:4024] 2024-04-02 09:52:41,034 >> All model checkpoint weights were used when initializing MistralForCausalLM.
[INFO|modeling_utils.py:4032] 2024-04-02 09:52:41,034 >> All the weights of MistralForCausalLM were initialized from the model checkpoint at /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use MistralForCausalLM for predictions without further training.
[INFO|configuration_utils.py:881] 2024-04-02 09:52:41,037 >> loading configuration file /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2/generation_config.json
[INFO|configuration_utils.py:928] 2024-04-02 09:52:41,037 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
04/02/2024 09:52:41 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
Traceback (most recent call last):
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/web_demo.py", line 11, in <module>
main()
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/web_demo.py", line 5, in main
demo = create_web_demo()
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/webui/interface.py", line 55, in create_web_demo
engine = Engine(pure_chat=True)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/webui/engine.py", line 20, in __init__
self.chatter = WebChatModel(self.manager, demo_mode, lazy_init=(not pure_chat))
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/webui/chatter.py", line 27, in __init__
super().__init__()
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/chat/chat_model.py", line 23, in __init__
self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/chat/hf_engine.py", line 33, in __init__
self.model, self.tokenizer = load_model_and_tokenizer(
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/model/loader.py", line 149, in load_model_and_tokenizer
model = load_model(tokenizer, model_args, finetuning_args, is_trainable, add_valuehead)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/model/loader.py", line 94, in load_model
model = init_adapter(model, model_args, finetuning_args, is_trainable)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/model/adapter.py", line 110, in init_adapter
model: "LoraModel" = PeftModel.from_pretrained(
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/peft/peft_model.py", line 356, in from_pretrained
model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/peft/peft_model.py", line 730, in load_adapter
load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 249, in set_peft_model_state_dict
load_result = model.load_state_dict(peft_model_state_dict, strict=False)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.modules_to_save.default.weight: copying a param with shape torch.Size([32064, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for base_model.model.lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([32064, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
So I then tried modifying the model config's vocab_size and got the following:
[INFO|tokenization_utils_base.py:2082] 2024-04-02 10:02:24,627 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2082] 2024-04-02 10:02:24,627 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2082] 2024-04-02 10:02:24,627 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2082] 2024-04-02 10:02:24,627 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2082] 2024-04-02 10:02:24,627 >> loading file tokenizer.json
[INFO|configuration_utils.py:724] 2024-04-02 10:02:24,684 >> loading configuration file /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2/config.json
[INFO|configuration_utils.py:789] 2024-04-02 10:02:24,685 >> Model config MistralConfig {
"_name_or_path": "/root/autodl-tmp/models/Mistral-7B-Instruct-v0.2",
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.39.1",
"use_cache": true,
"vocab_size": 32064
}
04/02/2024 10:02:24 - INFO - llmtuner.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3280] 2024-04-02 10:02:24,702 >> loading weights file /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2/model.safetensors.index.json
[INFO|modeling_utils.py:1417] 2024-04-02 10:02:24,702 >> Instantiating MistralForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:928] 2024-04-02 10:02:24,703 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/web_demo.py", line 11, in <module>
main()
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/web_demo.py", line 5, in main
demo = create_web_demo()
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/webui/interface.py", line 55, in create_web_demo
engine = Engine(pure_chat=True)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/webui/engine.py", line 20, in __init__
self.chatter = WebChatModel(self.manager, demo_mode, lazy_init=(not pure_chat))
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/webui/chatter.py", line 27, in __init__
super().__init__()
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/chat/chat_model.py", line 23, in __init__
self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/chat/hf_engine.py", line 33, in __init__
self.model, self.tokenizer = load_model_and_tokenizer(
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/model/loader.py", line 149, in load_model_and_tokenizer
model = load_model(tokenizer, model_args, finetuning_args, is_trainable, add_valuehead)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/model/loader.py", line 89, in load_model
model = AutoModelForCausalLM.from_pretrained(model_args.model_name_or_path, config=config, **init_kwargs)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
return model_class.from_pretrained(
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3531, in from_pretrained
) = cls._load_pretrained_model(
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3958, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/transformers/modeling_utils.py", line 812, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 348, in set_module_tensor_to_device
raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([32000, 4096]) in "weight" (which has shape torch.Size([32064, 4096])), this look incorrect.
How can I align these sizes? Looking forward to your reply!
Update the code, and also specify ... at inference time.
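For reference, a minimal sketch of what passing the flag at inference implies, using the paths from this thread and assuming the expanded tokenizer was saved with the adapter: the base model has to be resized to the adapter's vocabulary before the adapter is attached, otherwise the 32000 vs 32064 mismatch above is unavoidable. This only illustrates the mechanism, not the exact LLaMA-Factory code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "/root/autodl-tmp/models/Mistral-7B-Instruct-v0.2"
adapter_path = "/root/autodl-tmp/fhy/finetune3/mistral-instruct-2-lora-moretokens-chatml"

tokenizer = AutoTokenizer.from_pretrained(adapter_path)  # expanded tokenizer
model = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype="auto")

# Pad to a multiple of 64 so the embedding ends up with 32064 rows,
# matching the shapes stored in the adapter checkpoint.
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=64)
model = PeftModel.from_pretrained(model, adapter_path)  # shapes now match
```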
You should not need to retrain. Which error are you getting, exactly?
@hiyouga CUDA_VISIBLE_DEVICES=1 python src/web_demo.py \
--model_name_or_path /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2 \
--adapter_name_or_path /root/autodl-tmp/fhy/finetune3/mistral-instruct-2-lora-moretokens-chatml \
--template chatml \
--finetuning_type lora \
--resize_vocab True
Output and error:
[INFO|tokenization_utils_base.py:2082] 2024-04-03 17:45:46,508 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2082] 2024-04-03 17:45:46,508 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2082] 2024-04-03 17:45:46,508 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2082] 2024-04-03 17:45:46,508 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2082] 2024-04-03 17:45:46,508 >> loading file tokenizer.json
[INFO|configuration_utils.py:724] 2024-04-03 17:45:46,562 >> loading configuration file /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2/config.json
[INFO|configuration_utils.py:789] 2024-04-03 17:45:46,563 >> Model config MistralConfig {
"_name_or_path": "/root/autodl-tmp/models/Mistral-7B-Instruct-v0.2",
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.39.1",
"use_cache": true,
"vocab_size": 32000
}
04/03/2024 17:45:46 - INFO - llmtuner.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3280] 2024-04-03 17:45:46,580 >> loading weights file /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2/model.safetensors.index.json
[INFO|modeling_utils.py:1417] 2024-04-03 17:45:46,580 >> Instantiating MistralForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:928] 2024-04-03 17:45:46,581 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:02<00:00, 1.23it/s]
[INFO|modeling_utils.py:4024] 2024-04-03 17:45:49,458 >> All model checkpoint weights were used when initializing MistralForCausalLM.
[INFO|modeling_utils.py:4032] 2024-04-03 17:45:49,458 >> All the weights of MistralForCausalLM were initialized from the model checkpoint at /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use MistralForCausalLM for predictions without further training.
[INFO|configuration_utils.py:881] 2024-04-03 17:45:49,461 >> loading configuration file /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2/generation_config.json
[INFO|configuration_utils.py:928] 2024-04-03 17:45:49,461 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
04/03/2024 17:45:49 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
Traceback (most recent call last):
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/web_demo.py", line 9, in <module>
main()
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/web_demo.py", line 5, in main
create_web_demo().queue().launch(server_name="0.0.0.0", server_port=None, share=False, inbrowser=True)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/webui/interface.py", line 52, in create_web_demo
engine = Engine(pure_chat=True)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/webui/engine.py", line 19, in __init__
self.chatter = WebChatModel(self.manager, demo_mode, lazy_init=(not pure_chat))
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/webui/chatter.py", line 27, in __init__
super().__init__()
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/chat/chat_model.py", line 23, in __init__
self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/chat/hf_engine.py", line 33, in __init__
self.model, self.tokenizer = load_model_and_tokenizer(
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/model/loader.py", line 148, in load_model_and_tokenizer
model = load_model(tokenizer, model_args, finetuning_args, is_trainable, add_valuehead)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/model/loader.py", line 93, in load_model
model = init_adapter(model, model_args, finetuning_args, is_trainable)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/model/adapter.py", line 110, in init_adapter
model: "LoraModel" = PeftModel.from_pretrained(
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/peft/peft_model.py", line 356, in from_pretrained
model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/peft/peft_model.py", line 730, in load_adapter
load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 249, in set_peft_model_state_dict
load_result = model.load_state_dict(peft_model_state_dict, strict=False)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.modules_to_save.default.weight: copying a param with shape torch.Size([32064, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for base_model.model.lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([32064, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
After changing the model config's vocab_size, I get:
[INFO|tokenization_utils_base.py:2082] 2024-04-03 17:47:05,387 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2082] 2024-04-03 17:47:05,388 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2082] 2024-04-03 17:47:05,388 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2082] 2024-04-03 17:47:05,388 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2082] 2024-04-03 17:47:05,388 >> loading file tokenizer.json
[INFO|configuration_utils.py:724] 2024-04-03 17:47:05,443 >> loading configuration file /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2/config.json
[INFO|configuration_utils.py:789] 2024-04-03 17:47:05,444 >> Model config MistralConfig {
"_name_or_path": "/root/autodl-tmp/models/Mistral-7B-Instruct-v0.2",
"architectures": [
"MistralForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 14336,
"max_position_embeddings": 32768,
"model_type": "mistral",
"num_attention_heads": 32,
"num_hidden_layers": 32,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"sliding_window": null,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.39.1",
"use_cache": true,
"vocab_size": 32064
}
04/03/2024 17:47:05 - INFO - llmtuner.model.patcher - Using KV cache for faster generation.
[INFO|modeling_utils.py:3280] 2024-04-03 17:47:05,461 >> loading weights file /root/autodl-tmp/models/Mistral-7B-Instruct-v0.2/model.safetensors.index.json
[INFO|modeling_utils.py:1417] 2024-04-03 17:47:05,461 >> Instantiating MistralForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:928] 2024-04-03 17:47:05,462 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/web_demo.py", line 9, in <module>
main()
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/web_demo.py", line 5, in main
create_web_demo().queue().launch(server_name="0.0.0.0", server_port=None, share=False, inbrowser=True)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/webui/interface.py", line 52, in create_web_demo
engine = Engine(pure_chat=True)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/webui/engine.py", line 19, in __init__
self.chatter = WebChatModel(self.manager, demo_mode, lazy_init=(not pure_chat))
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/webui/chatter.py", line 27, in __init__
super().__init__()
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/chat/chat_model.py", line 23, in __init__
self.engine: "BaseEngine" = HuggingfaceEngine(model_args, data_args, finetuning_args, generating_args)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/chat/hf_engine.py", line 33, in __init__
self.model, self.tokenizer = load_model_and_tokenizer(
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/model/loader.py", line 148, in load_model_and_tokenizer
model = load_model(tokenizer, model_args, finetuning_args, is_trainable, add_valuehead)
File "/root/autodl-tmp/fhy/LLaMA-Factory/src/llmtuner/model/loader.py", line 88, in load_model
model = AutoModelForCausalLM.from_pretrained(model_args.model_name_or_path, config=config, **init_kwargs)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
return model_class.from_pretrained(
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3531, in from_pretrained
) = cls._load_pretrained_model(
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3958, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/transformers/modeling_utils.py", line 812, in _load_state_dict_into_meta_model
set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
File "/root/autodl-tmp/minicoda3/envs/lfactory/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 348, in set_module_tensor_to_device
raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([32000, 4096]) in "weight" (which has shape torch.Size([32064, 4096])), this look incorrect.
This seems to be the same error as before updating the code?
Hi, may I ask what this parameter actually does? I ran into the same problem as this issue today: training worked fine on Qwen, but after switching to a Llama-series model it errored out, and adding this parameter fixed it.
Reminder
Reproduction
My command is as follows:
Data loading appears to succeed; the terminal then prints the following error:
This error looks related to CUDA management. Previously, with all other arguments unchanged, training worked normally with `--template default`, but after switching to `chatml` the error above appeared.
Expected behavior
I expect it to train normally. Previously, with all other arguments unchanged, training worked normally with `--template default`, but after switching to `chatml` the error above appeared.
This is my accelerate config:
System Info
transformers version: 4.38.2
Others
No response