add Sequence Parallelism #6506
base: main
Conversation
Hi Haosheng, thanks a lot for your contribution. We have left some comments; could you kindly revise the code according to them?
I see that the seq padding here pads directly to cutoff_len; I'm not sure whether my understanding of this PR is off. If samples are generally short, wouldn't that waste compute?
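To make the padding concern concrete, here is a minimal sketch (the helpers are hypothetical, not code from this PR) contrasting padding every sample to cutoff_len with padding only up to the next multiple of the sequence-parallel size, which keeps short samples short while still dividing evenly across SP ranks:

```python
# Hypothetical illustration of the padding trade-off raised above; not PR code.

def pad_to_cutoff(input_ids, pad_id, cutoff_len):
    """Pad every sample to cutoff_len: simple, but short samples become mostly padding."""
    return input_ids + [pad_id] * (cutoff_len - len(input_ids))

def pad_to_multiple(input_ids, pad_id, multiple):
    """Pad only to the next multiple of `multiple` (e.g., the SP world size)."""
    remainder = len(input_ids) % multiple
    if remainder:
        input_ids = input_ids + [pad_id] * (multiple - remainder)
    return input_ids

# A 700-token sample with cutoff_len=32768 and sp_size=4:
#   pad_to_cutoff   -> 32768 tokens (mostly padding -> wasted attention FLOPs)
#   pad_to_multiple ->   700 tokens (already divisible by 4, no extra padding)
```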
What does this PR do?
add Sequence Parallelism (#4733 #5024 #5207 #5815 #5841 etc.)
For direct plug-and-play use, see https://github.com/Qihoo360/360-LLaMA-Factory.
We keep a separate README and chat group at https://github.com/Qihoo360/360-LLaMA-Factory for the Sequence Parallelism part only; they are not meant to be merged here.
We developed this on top of LLaMA-Factory's latest release v0.9.1 and built on https://github.com/zhuzilin/ring-flash-attention; the original repos are fully acknowledged (see the sketch below for the basic sequence-splitting idea).
This was developed at 360. I did my PhD in Prof. Jun Zhu's group at Tsinghua CS.
Feel free to review and comment on changes as you see fit. We'll make it better.
Thank you!
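As referenced above, here is a minimal sketch of the sequence-splitting step, assuming a sequence-parallel process group created elsewhere with torch.distributed; the function name and the contiguous-chunk slicing are illustrative only and not this PR's actual API (the PR relies on ring-flash-attention for the attention exchange itself):

```python
import torch
import torch.distributed as dist

def split_for_sequence_parallel(input_ids: torch.Tensor, sp_group) -> torch.Tensor:
    """Give each rank in the sequence-parallel group one contiguous slice of the sequence.

    input_ids: (batch, seq_len); seq_len must be divisible by the SP world size,
    which is why samples are padded beforehand. Full-sequence attention is then
    recovered by exchanging K/V blocks between ranks, which is what
    ring-flash-attention implements.
    """
    sp_rank = dist.get_rank(group=sp_group)
    sp_size = dist.get_world_size(group=sp_group)
    seq_len = input_ids.size(1)
    assert seq_len % sp_size == 0, "pad the sequence to a multiple of sp_size first"
    chunk = seq_len // sp_size
    return input_ids[:, sp_rank * chunk : (sp_rank + 1) * chunk]
```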
Before submitting