Issue with the sft_packing implementation #2289
Comments
A quick question: for the packing approach (especially in the SFT case), besides the position ids mentioned above, shouldn't an appropriate attention mask also be set up to isolate the different instances?
@hiyouga
any update on this issue?
Llama 3 also modifies the attention mask but doesn't mention position ids. Is it really necessary to modify the position ids? RoPE itself is a relative encoding.
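For reference, a quick numeric check of that relative-position claim. This is an illustrative 2-D example only, not code from the repository: the attention score between a rotated query and key depends only on the position difference, not on the absolute positions.

import numpy as np

def rope_rotate(vec, pos, theta=0.1):
    # rotate a 2-D vector by pos * theta, as RoPE does per frequency band
    c, s = np.cos(pos * theta), np.sin(pos * theta)
    return np.array([[c, -s], [s, c]]) @ vec

q = np.array([1.0, 2.0])
k = np.array([0.5, -1.0])
# same relative offset (3), different absolute positions -> identical scores
print(rope_rotate(q, 7) @ rope_rotate(k, 4))
print(rope_rotate(q, 103) @ rope_rotate(k, 100))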
Same question: why is handling the attention mask not considered? With plain concatenation, what is the point of letting later samples attend to the earlier ones?
The function preprocess_packed_supervised_dataset does not currently implement an attention mask that isolates the packed instances. @hiyouga, do you have any plans to add this feature in the future?
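A minimal sketch of what such an isolating mask could look like: a block-diagonal causal mask in which each packed instance only attends to its own tokens. The helper name and shapes here are illustrative assumptions, not LLaMA-Factory's actual API.

import torch

def packed_attention_mask(seq_lengths):
    # True = attention allowed; each packed instance gets its own causal block
    total_len = sum(seq_lengths)
    mask = torch.zeros(total_len, total_len, dtype=torch.bool)
    start = 0
    for length in seq_lengths:
        end = start + length
        mask[start:end, start:end] = torch.tril(
            torch.ones(length, length, dtype=torch.bool)
        )
        start = end
    return mask

# e.g. three packed instances of lengths 3, 2 and 4 -> a 9x9 block-diagonal causal mask
print(packed_attention_mask([3, 2, 4]).int())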
Reminder
Reproduction
Looking at the current sft_packing implementation, it simply concatenates different single-turn SFT samples together and then computes the loss on each target segment separately:
def preprocess_packed_supervised_dataset(
    examples: Dict[str, List[Any]],
    tokenizer: "PreTrainedTokenizer",
    template: "Template",
    data_args: "DataArguments",
) -> Dict[str, List[List[int]]]:
    # build inputs with format `<bos> X1 Y1 <eos> <bos> X2 Y2 <eos>`
    # and labels with format `<ignore> ... <ignore> Y1 <eos> <ignore> ... <ignore> Y2 <eos>`
    model_inputs = {"input_ids": [], "attention_mask": [], "labels": []}
Shouldn't position_ids also be adjusted here, to ensure that each single-turn SFT sample is not affected by the other concatenated context when computing its loss?
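For illustration only (an assumed sketch, not the repository's implementation), position ids could be restarted at every packed instance so that RoPE distances are measured within an instance rather than across the concatenation:

from typing import List

def packed_position_ids(seq_lengths: List[int]) -> List[int]:
    # restart positions from 0 at the start of every packed instance
    position_ids: List[int] = []
    for length in seq_lengths:
        position_ids.extend(range(length))
    return position_ids

# two packed instances of lengths 4 and 3 -> [0, 1, 2, 3, 0, 1, 2]
print(packed_position_ids([4, 3]))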
Expected behavior
No response
System Info
No response
Others
No response