Different results between training and eval #40
Comments
Yes, I've encountered this problem. For this reason, I always report numbers that are reproducible from the saved checkpoints, never the ones logged during training.
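For illustration, here is a minimal sketch of re-running evaluation from a saved checkpoint with a generic HuggingFace-style seq2seq setup. The checkpoint path, the `predict_sql` helper, the generation settings, and `compute_exact_match` are assumptions for the example, not this repo's exact API:

```python
# Hedged sketch: evaluate a saved checkpoint independently of the training loop.
# Paths, the helper names, and the metric function are placeholders.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint_dir = "output/checkpoint-2304"  # hypothetical checkpoint path
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint_dir).eval()

def predict_sql(question_with_schema: str) -> str:
    # Beam settings are illustrative; they should match the training-time eval config.
    inputs = tokenizer(question_with_schema, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_length=256, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# compute_exact_match(...) stands in for the repo's Spider metric code.
# predictions = [predict_sql(x) for x in dev_inputs]
# print(compute_exact_match(predictions, gold_queries))
```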
@eyuansu62 Something I noticed: are you aware that your exact match and exec accuracies are identical? That doesn't seem right; have you modified that code?
Another thought: the content matching code I borrowed from Victoria Lin et al.'s BRIDGE model does not necessarily produce the same column values between runs. This instability can partially, but not fully, explain the discrepancy. If you like to stare at diffs, try comparing the …
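If it helps, here is a minimal sketch of diffing two prediction files line by line to spot such run-to-run differences. The file names and the one-prediction-per-line format are assumptions:

```python
# Hedged sketch: diff two prediction files to find queries that changed between runs.
# File names are placeholders; one predicted SQL string per line is assumed.
import difflib

with open("predictions_run_a.sql") as fa, open("predictions_run_b.sql") as fb:
    run_a = fa.read().splitlines()
    run_b = fb.read().splitlines()

for i, (a, b) in enumerate(zip(run_a, run_b)):
    if a != b:
        # Print a compact unified diff for each query that differs between the runs.
        print(f"--- example {i} ---")
        print("\n".join(difflib.unified_diff([a], [b], lineterm="")))
```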
I did not modify the metric code, and the identical result at epoch 2304 seems to be a coincidence, because there is:
Recently, I carefully compared the differences between training and evaluation outputs. There are many kinds of errors, such as keyword errors (asc, desc), wrong table names, wrong column names, etc.
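A rough sketch of tallying such error categories between the two prediction sets is below. The file names and the very coarse keyword/table heuristics are assumptions for illustration, not the metric code's actual logic:

```python
# Hedged sketch: bucket mismatched predictions into coarse error categories.
# File names and the string-level heuristics are illustrative only.
from collections import Counter

def categorize(pred: str, gold: str) -> str:
    p, g = pred.lower().split(), gold.lower().split()
    # Order-keyword error: asc/desc present in one query but not the other.
    if ("asc" in p) != ("asc" in g) or ("desc" in p) != ("desc" in g):
        return "order keyword (asc/desc)"
    # Wrong table name: token after the first FROM differs (very coarse check).
    if "from" in p and "from" in g:
        tp, tg = p.index("from") + 1, g.index("from") + 1
        if tp < len(p) and tg < len(g) and p[tp] != g[tg]:
            return "wrong table name"
    return "other (e.g. wrong column name)"

with open("train_time_preds.sql") as f1, open("eval_time_preds.sql") as f2, open("gold.sql") as f3:
    errors = Counter(
        categorize(p_eval, gold)
        for p_train, p_eval, gold in zip(f1, f2, f3)
        if p_train.strip() != p_eval.strip()
    )
print(errors)
```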
Sorry to bother you, but I found another interesting problem.
When I start training (with train.json), I get intermediate results such as:
As can be seen, eval_exact_match is around 0.64.
But if I run evaluation mode (with eval.json), I get:
The eval_exact_match is around 0.62.
And the eval.json is:
The difference is about 2%. Have you ever seen this problem?