add eval dataset support #4691

codemayq · 2024-07-05T08:02:17Z

What does this PR do?

Add support for a custom eval dataset during training
Refactor the dataset loading and splitting functions (merged into a single function)

fix #2290
fix #3974

Before submitting

Did you read the contributor guideline?
Did you write any new necessary tests?

hiyouga

LGTM

dqgdqg · 2024-07-14T22:09:40Z

Thanks for the valuable update, but it is now a little confusing.

Could you please give an example of a *.yaml configuration that specifies separate train and validation datasets instead of val_size?

hiyouga · 2024-07-14T22:12:14Z

> Thanks for the valuable update, but it is now a little confusing.
>
> Could you please give an example of a *.yaml configuration that specifies separate train and validation datasets instead of val_size?

Remove val_size from the config and add eval_dataset instead.
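
For illustration, a minimal sketch of what that change might look like in a training config, assuming an installed version that includes this PR. The dataset names are hypothetical placeholders, and the use of `dataset` as the key for the training set is an assumption not stated in this thread; only `val_size` and `eval_dataset` are named above.

```yaml
# Sketch only: dataset names below are hypothetical placeholders.
dataset: my_train_set        # assumed key for the training dataset
eval_dataset: my_eval_set    # separate evaluation dataset, added by this PR

# val_size: 0.1              # removed: no longer split eval data off the training set
```

With a config like this, evaluation would run on the explicitly named eval dataset rather than on a fraction held out from the training data.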

dqgdqg · 2024-07-17T04:20:30Z

It works. Thanks.

Wolverhampton0 · 2024-08-02T15:55:19Z

It still raises an error: ValueError: Some keys are not used by the HfArgumentParser: ['eval_dataset']. Why is that? Do I need to update my environment?

1. add custom eval dataset support
2. merge load dataset and split dataset function

76f3bbc

hiyouga self-requested a review July 13, 2024 15:40

Update README.md

9d64507

hiyouga had a problem deploying to tests July 14, 2024 13:27 — with GitHub Actions Failure

Update parser.py

3d39d74

hiyouga had a problem deploying to tests July 14, 2024 15:04 — with GitHub Actions Failure

Update loader.py

a5b8095

hiyouga had a problem deploying to tests July 14, 2024 16:50 — with GitHub Actions Failure

Update data_utils.py

97a0e29

hiyouga had a problem deploying to tests July 14, 2024 16:54 — with GitHub Actions Failure

Update parser.py

84e4047

hiyouga had a problem deploying to tests July 14, 2024 16:55 — with GitHub Actions Failure

Update preprocess.py

df52fb0

hiyouga had a problem deploying to tests July 14, 2024 16:55 — with GitHub Actions Failure

Update data_args.py

cba673f

hiyouga had a problem deploying to tests July 14, 2024 16:56 — with GitHub Actions Failure

hiyouga approved these changes Jul 14, 2024

hiyouga merged commit 15b399a into hiyouga:main Jul 14, 2024
1 check failed

hiyouga mentioned this pull request Oct 18, 2024

Cannot manually assign eval dataset during sft training #5740

Closed

1 task

hiyouga mentioned this pull request Oct 29, 2024

How do I specify pre-split training and validation sets? #4451

Closed
