What does MAX_STEPS = None mean in finetune? Can it be changed to something else? #24
Comments
Here's my situation: I used the merged model as the base model for finetuning, and it reports this error
@ZenXir max_step gets changed further down in the code. I fixed this in my local branch yesterday but forgot to push it; you can pull the latest update.
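A minimal sketch of what "changed further down in the code" typically amounts to, assuming the step budget is derived from the size of the training split once it is loaded; the names MAX_STEPS, EPOCHS, BATCH_SIZE and num_train_examples here are illustrative, not copied from finetune.py:

```python
EPOCHS = 3
BATCH_SIZE = 128

MAX_STEPS = None                    # None = no fixed step budget chosen yet

num_train_examples = 50_000         # hypothetical size of the training split
if MAX_STEPS is None:
    # Derive the step budget from the data: enough optimizer steps to cover
    # EPOCHS passes over the training split at the effective batch size.
    MAX_STEPS = num_train_examples // BATCH_SIZE * EPOCHS

print(MAX_STEPS)                    # 1170 for this example
```

Setting MAX_STEPS to an integer up front instead of None simply caps training at that many optimizer steps, regardless of how many epochs that corresponds to.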
OK, thanks.
I'm training with finetune.py using the merged model. The model merging process takes two steps: The finetune command is: The error output is:
@ZenXir I haven't run theirs yet, so you'll have to look into it yourself for now. In your case, the conversion simply didn't go through successfully.
Calling resize_token_embeddings like this before "prepare for training" makes it trainable (a sketch follows below).
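A minimal sketch of that fix, assuming a LLaMA-style checkpoint loaded with transformers; BASE_MODEL is a placeholder path and the loading arguments are simplified compared to the real script:

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

BASE_MODEL = "path/to/merged-model"   # placeholder: directory of the merged checkpoint

model = LlamaForCausalLM.from_pretrained(BASE_MODEL)
tokenizer = LlamaTokenizer.from_pretrained(BASE_MODEL)

# Make the model's embedding matrix match the tokenizer's vocabulary size.
# A merged checkpoint and its tokenizer can disagree (e.g. extra special tokens),
# which otherwise shows up as a size-mismatch error when training starts.
model.resize_token_embeddings(len(tokenizer))

# ... then continue with the usual int8/LoRA preparation before training
```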
@Facico By the way, all the other parameters are at their defaults; my machine is a single RTX 4090 24G.
Sorry, with so many messages some get missed. If you want a direct comparison, just keep the batch size and number of epochs the same; if you want it to run faster, you can increase the micro batch size.
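A minimal sketch of how those knobs usually relate, with illustrative names and values: the effective batch size per optimizer step stays fixed, and the micro batch size only controls how many examples go through the GPU per forward pass, so raising it reduces gradient-accumulation steps and speeds training up without changing the comparison:

```python
BATCH_SIZE = 128          # effective batch size per optimizer step (keep fixed)
MICRO_BATCH_SIZE = 4      # raise this (e.g. 8 or 16) if GPU memory allows

# Gradients are accumulated over this many micro batches before each update,
# so the effective batch size is unchanged when MICRO_BATCH_SIZE grows.
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
print(GRADIENT_ACCUMULATION_STEPS)   # 32
```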
Dual-card RTX 3090: if not args.wandb:
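The snippet above is cut off; a sketch of one plausible completion, assuming --wandb is a boolean flag and that, when it is absent, Weights & Biases logging is disabled via an environment variable (this is an assumption, not code taken from the repo):

```python
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--wandb", action="store_true", help="enable Weights & Biases logging")
args = parser.parse_args()

# Assumed completion of the truncated line above: when --wandb is not passed,
# disable Weights & Biases logging before the Trainer tries to initialize it.
if not args.wandb:
    os.environ["WANDB_MODE"] = "disabled"
```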
/root/anaconda3/lib/python3.9/site-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
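That FutureWarning comes from transformers' own AdamW implementation; if the Hugging Face Trainer is being used, one way to follow its suggestion is to ask for the PyTorch optimizer explicitly. A minimal sketch with placeholder argument values:

```python
import transformers

training_args = transformers.TrainingArguments(
    output_dir="lora-out",            # placeholder output directory
    per_device_train_batch_size=4,
    num_train_epochs=3,
    learning_rate=3e-4,
    optim="adamw_torch",              # use torch.optim.AdamW; silences the warning
)
```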
Which file, and at which step, do those three lines of code go into? I'd like to do the same training, but I'm too much of a beginner to figure it out.
@godzeo Just put them right after the model and tokenizer have been loaded.
Hey, what should max_step be set to?
Why is MAX_STEPS = None set to None here? Can it be changed to something else?