
Issue with the sft_packing implementation #2289

Closed
1 task done
dyh1996 opened this issue Jan 22, 2024 · 7 comments · Fixed by #4009 or #4224
Labels
solved This problem has been already solved

Comments


dyh1996 commented Jan 22, 2024

Reminder

  • I have read the README and searched the existing issues.

Reproduction

Looking at the current sft_packing implementation, it seems to simply concatenate different single-turn SFT samples together and then compute the loss on each target segment separately.

def preprocess_packed_supervised_dataset(
    examples: Dict[str, List[Any]],
    tokenizer: "PreTrainedTokenizer",
    template: "Template",
    data_args: "DataArguments",
) -> Dict[str, List[List[int]]]:
    # build inputs with format <bos> X1 Y1 <eos> <bos> X2 Y2 <eos>
    # and labels with format <ignore> ... <ignore> Y1 <eos> <ignore> ... <ignore> Y2 <eos>
    model_inputs = {"input_ids": [], "attention_mask": [], "labels": []}

Shouldn't position_ids also be adjusted here, so that each single-turn SFT sample is not affected by the other concatenated context when its loss is computed?
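For illustration, a minimal sketch of what resetting position_ids per packed sample could look like; the helper name and the packed_lengths input are hypothetical, not code from this repo:

```python
from typing import List

def build_packed_position_ids(packed_lengths: List[int]) -> List[int]:
    """Restart position counting at 0 for every sample inside a packed sequence."""
    position_ids: List[int] = []
    for length in packed_lengths:
        # 0, 1, ..., length - 1 for each concatenated sample
        position_ids.extend(range(length))
    return position_ids

# Two packed samples of lengths 4 and 3 -> [0, 1, 2, 3, 0, 1, 2]
print(build_packed_position_ids([4, 3]))
```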

Expected behavior

No response

System Info

No response

Others

No response

@hiyouga added the pending (This problem is yet to be addressed) label on Jan 22, 2024
@muzhi1991

A question: for packing (especially in the SFT case), besides the position_ids mentioned above, shouldn't a proper attention mask also be set to isolate the different instances?
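For reference, a minimal sketch (not code from this repo) of the kind of block-diagonal mask that would isolate packed instances; the causal constraint would still be applied on top of it inside the model:

```python
import torch

def build_block_diagonal_mask(packed_lengths):
    """Boolean (seq_len, seq_len) mask that is True only within each packed sample."""
    total = sum(packed_lengths)
    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for length in packed_lengths:
        mask[start:start + length, start:start + length] = True
        start += length
    return mask

# Two samples of lengths 3 and 2 -> a 5x5 mask with two True blocks on the diagonal
print(build_block_diagonal_mask([3, 2]).int())
```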

@DinhLuan14

@hiyouga
Has LLaMA-Factory implemented https://github.com/MeetKai/functionary/tree/main/functionary/train/packing#assert-implementation for packing yet? I did notice the preprocess_packed_supervised_dataset part of the code in the repo.

@Ricardokevins

Any update on this issue? @hiyouga

@chiosChen

Llama 3 also modified the attention mask, but did not mention position_ids. Is it really necessary to modify position_ids? RoPE is itself a relative positional encoding.
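For context on the RoPE point: with rotary embeddings, the attention score between a query at position m and a key at position n depends only on the offset, since the rotations compose as

    score(q_m, k_n) = (R_m q)^T (R_n k) = q^T R_{n-m} k

so keeping a single running 0..L-1 position sequence over the packed input does not distort relative distances within a sample. Cross-sample leakage is governed by the attention mask instead, which is why the mask is the part that has to be fixed.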

@lugimzzz

A question: for packing (especially in the SFT case), besides the position_ids mentioned above, shouldn't a proper attention mask also be set to isolate the different instances?

Same question here: if the attention mask is not handled and the samples are simply concatenated, what is the point of letting later samples see the earlier ones?


letterk commented Jun 15, 2024

@hiyouga Has LLaMA-Factory implemented this 'MeetKai/functionary@main/functionary/train/packing#assert-implementation' for packing yet? I did notice the 'preprocess_packed_supervised_dataset' part of the code in the repo.

The function preprocess_packed_supervised_dataset does not currently build an attention mask that isolates the packed instances from one another.

@hiyouga, do you have any plans to add this feature in the future?
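For illustration, one way packing implementations commonly encode this (a sketch, not LLaMA-Factory's actual fix in #4224) is to store a per-sample segment index in attention_mask and expand it into a block-diagonal mask inside the model:

```python
import torch

# attention_mask written as segment ids: sample 1 -> 1, sample 2 -> 2, ...
segment_ids = torch.tensor([1, 1, 1, 2, 2, 3, 3, 3])

# A token may attend to another token only if both carry the same segment id
# (the causal constraint is applied separately by the model).
block_mask = segment_ids.unsqueeze(0) == segment_ids.unsqueeze(1)
print(block_mask.int())
```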

hiyouga (Owner) commented Jun 15, 2024

@letterk This will be fixed after merging #4224.

@hiyouga added the solved (This problem has been already solved) label and removed the pending (This problem is yet to be addressed) label on Jul 3, 2024