Replace these lines: https://huggingface.co/internlm/internlm2-chat-7b/blob/main/modeling_internlm2.py#L56-L58 with:

    try:
        from flash_attn import flash_attn_func, flash_attn_varlen_func
        from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input
    except ImportError:
        pass
It should resolve this problem.
Hugging Face Transformers checks the imports of remote-code modules and raises an error if the flash_attn package is imported without a try-except guard: https://github.com/huggingface/transformers/blob/v4.41.2/src/transformers/dynamic_module_utils.py#L161-L186
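For illustration, here is a rough, simplified sketch of that check, not the library's actual implementation (the real logic lives in get_imports and check_imports in the dynamic_module_utils.py file linked above); check_file_imports is a hypothetical helper name. Imports wrapped in try/except are stripped before the required-package scan, which is why the guard above satisfies it:

    import importlib.util
    import re

    def check_file_imports(filename: str) -> None:
        """Hypothetical helper approximating Transformers' dynamic-module import check."""
        with open(filename, encoding="utf-8") as f:
            content = f.read()

        # Imports inside try/except blocks are dropped first, so optional
        # dependencies guarded that way are not treated as hard requirements.
        content = re.sub(r"\s*try\s*:.*?except.*?:", "", content, flags=re.DOTALL)

        # Collect the top-level module names from "import x" / "from x import y" lines.
        modules = re.findall(r"^\s*(?:from|import)\s+(\w+)", content, flags=re.MULTILINE)

        missing = [m for m in set(modules) if importlib.util.find_spec(m) is None]
        if missing:
            raise ImportError(
                "This modeling file requires packages that were not found: " + ", ".join(missing)
            )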
Referenced commits: d74244d, e19628d, 1602648 (fix #4398 #4592)
System Info
Question from InternLM/InternLM#747
Reproduction
### model
model_name_or_path: internlm/internlm2-chat-7b
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
### dataset
dataset: data
template: intern2
cutoff_len: 1024
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: saves/internlm2-chat-7b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
fp16: true
### eval
val_size: 0.1
per_device_eval_batch_size: 1
evaluation_strategy: steps
eval_steps: 500
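For reference, a minimal repro sketch of where the failure typically surfaces with a config like this; this snippet is not from the original report and assumes the flash-attn package is not installed in the environment:

    from transformers import AutoModelForCausalLM

    # Loading the remote modeling code with trust_remote_code=True triggers the
    # dynamic-module import check described above; with the unguarded flash_attn
    # import in modeling_internlm2.py and no flash-attn installed, this raises
    # an ImportError before training even starts.
    model = AutoModelForCausalLM.from_pretrained(
        "internlm/internlm2-chat-7b",
        trust_remote_code=True,
    )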
Expected behavior
No response
Others
No response