Running tokenizer on dataset (num_proc=48): 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1621781/1621781 [07:42<00:00, 3507.62 examples/s]
After the step above finishes, the tokenizer runs again:
Running tokenizer on dataset (num_proc=48): 37%|█████████████████████████████████████████████▋ | 607148/1621781 [00:14<00:06, 145837.70 examples/s]
Both runs process the same number of examples and take a similar amount of time, so it looks like the tokenization step is being executed twice.
Fixed in commit 6baafd4.
@hiyouga What caused it to run twice? Looking at the change, is the main fix the use of training_args.local_process_index?
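For context, and as an assumption rather than a description of what commit 6baafd4 actually changed: a common way to keep multi-process training from repeating dataset preprocessing is to let only the process whose `local_process_index` is 0 run the tokenization first, while the other processes wait and then reuse its cached output. Hugging Face `transformers` exposes this as the `TrainingArguments.main_process_first` context manager. The sketch below only simulates that pattern with threads; `tokenize_dataset`, `run_rank`, and `simulate` are hypothetical names, and the dict stands in for the on-disk dataset cache.

```python
import threading


def tokenize_dataset(examples, cache, log):
    """Hypothetical stand-in for the expensive tokenization step."""
    key = "tokenized"
    if key in cache:
        log.append("cache hit")
        return cache[key]
    log.append("tokenize")
    # Fake "tokenization": count whitespace-separated tokens per example.
    result = [len(e.split()) for e in examples]
    cache[key] = result
    return result


def run_rank(local_rank, examples, cache, log, barrier, results):
    # Non-main processes wait until the main process has filled the cache.
    if local_rank != 0:
        barrier.wait()
    results[local_rank] = tokenize_dataset(examples, cache, log)
    # The main process releases the others only after its result is cached.
    if local_rank == 0:
        barrier.wait()


def simulate(world_size, examples):
    cache, log = {}, []
    results = [None] * world_size
    barrier = threading.Barrier(world_size)
    threads = [
        threading.Thread(
            target=run_rank,
            args=(rank, examples, cache, log, barrier, results),
        )
        for rank in range(world_size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log, results


if __name__ == "__main__":
    log, results = simulate(4, ["a b", "c d e"])
    print(log)
    print(results)
```

With four simulated processes, tokenization runs once and the other three report a cache hit. Without the barrier, every process would tokenize independently, which is the kind of duplicated work the original report describes.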