Validate on the test set #11
Comments
Hi, thanks for noticing that. There is no need to worry about data leakage. As you can see in the trainer file, the definition of the validation dataset (nn.Dataset) does not determine the data used in the validation step; three masks control which data are actually used for validation. So there should be no doubt about the improvements.
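To illustrate the point about the masks, here is a minimal sketch (with hypothetical names, not the repository's actual code): even if the validation DataLoader yields a mix of samples, a boolean mask decides which samples contribute to each validation metric, so the metric stays independent of whatever else the dataset contains.

```python
import torch


def masked_val_loss(logits: torch.Tensor, labels: torch.Tensor,
                    mask: torch.Tensor) -> torch.Tensor:
    """Compute a validation loss only on the samples flagged by `mask`.

    Samples outside the mask contribute nothing, so the resulting
    metric does not depend on extra data the DataLoader may yield.
    """
    if not mask.any():
        # No samples selected: return a zero loss rather than NaN.
        return torch.tensor(0.0)
    return torch.nn.functional.binary_cross_entropy_with_logits(
        logits[mask], labels[mask]
    )
```

In this sketch, changing the logits of an unmasked (e.g. test-set) sample leaves the validation loss unchanged, which is the independence the comment above describes.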
Anyway, this line of the code is indeed suspicious. I will fix it and leave a notification on the home page. Many thanks for mentioning it!
I see. The obj_metric is independent of the test_dataset here because of the mask. Thank you for the quick explanation.
ATST-SED/train/train_stage2.py, line 213 (commit 0ac8073)
I am reproducing the results of ATST-SED. While running stage 2, I noticed that the test set had already leaked into the validation set.
Is this by design? I cannot find an explanation of this in the paper. Did you also keep this train/valid/test split setting for the baseline BEATs model? Otherwise I doubt whether the improvement comes from it.
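For anyone reproducing the setup, a leak like the one described above can be caught with a quick overlap check between the split file lists before training. This is a hypothetical helper, not part of the ATST-SED codebase:

```python
def assert_no_leakage(valid_ids, test_ids):
    """Raise if any clip id appears in both the validation and test splits."""
    overlap = set(valid_ids) & set(test_ids)
    if overlap:
        raise ValueError(
            f"{len(overlap)} clip(s) leak from test into valid, "
            f"e.g. {sorted(overlap)[:5]}"
        )
```

Running this on the stage-2 split lists before training would surface the leak immediately instead of silently inflating validation scores.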