Replace these lines: https://huggingface.co/internlm/internlm2-chat-7b/blob/main/modeling_internlm2.py#L56-L58 with:

    try:
        from flash_attn import flash_attn_func, flash_attn_varlen_func
        from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input
    except ImportError:
        pass
It should resolve this problem.
Hugging Face Transformers checks the imports of remote-code modules and raises an error if the flash_attn package is imported without a try-except guard: https://github.com/huggingface/transformers/blob/v4.41.2/src/transformers/dynamic_module_utils.py#L161-L186
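For illustration, here is a rough, simplified sketch of that check, not the library's actual implementation (the real logic lives in get_imports and check_imports in the dynamic_module_utils.py file linked above); check_file_imports is a hypothetical helper name. Imports wrapped in try/except are stripped before the required-package scan, which is why the guard above satisfies it:

    import importlib.util
    import re

    def check_file_imports(filename: str) -> None:
        """Hypothetical helper approximating Transformers' dynamic-module import check."""
        with open(filename, encoding="utf-8") as f:
            content = f.read()

        # Imports inside try/except blocks are dropped first, so optional
        # dependencies guarded that way are not treated as hard requirements.
        content = re.sub(r"\s*try\s*:.*?except.*?:", "", content, flags=re.DOTALL)

        # Collect the top-level module names from "import x" / "from x import y" lines.
        modules = re.findall(r"^\s*(?:from|import)\s+(\w+)", content, flags=re.MULTILINE)

        missing = [m for m in set(modules) if importlib.util.find_spec(m) is None]
        if missing:
            raise ImportError(
                "This modeling file requires packages that were not found: " + ", ".join(missing)
            )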
Referenced commits: d74244d, e19628d, 1602648 (fix #4398 #4592)
System Info
Question from InternLM/InternLM#747
Reproduction
### model
model_name_or_path: internlm/internlm2-chat-7b
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
### dataset
dataset: data
template: intern2
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/internlm2-chat-7b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
### eval
val_size: 0.1
per_device_eval_batch_size: 1
evaluation_strategy: steps
eval_steps: 500
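For reference, a minimal repro sketch of where the failure typically surfaces with a config like this; this snippet is not from the original report and assumes the flash-attn package is not installed in the environment:

    from transformers import AutoModelForCausalLM

    # Loading the remote modeling code with trust_remote_code=True triggers the
    # dynamic-module import check described above; with the unguarded flash_attn
    # import in modeling_internlm2.py and no flash-attn installed, this raises
    # an ImportError before training even starts.
    model = AutoModelForCausalLM.from_pretrained(
        "internlm/internlm2-chat-7b",
        trust_remote_code=True,
    )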
Expected behavior
No response
Others
No response