
Can anyone reproduce the baseline results reported in the paper with the current version of the code? #21

Closed
bozheng-hit opened this issue Jun 21, 2022 · 3 comments

Comments

@bozheng-hit

I think the following code should be inserted here: https://github.com/alexa/massive/blob/main/src/massive/utils/training_utils.py#L472

# Mask predictions wherever the label is -100 (ignored positions),
# so predictions and labels stay aligned during evaluation.
for i, x in enumerate(lab):
    if x == -100:
        pred[i] = -100
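
For context, a minimal self-contained sketch of why this matters (the names preds, labels, and align_for_eval are illustrative, not the repo's actual identifiers): if only the labels are filtered on -100 while the predictions are not, the two sequences end up with different lengths and misaligned positions before scoring.

# Hypothetical sketch: -100 marks ignored positions (special tokens /
# subword continuations) in the labels produced by the trainer.
def align_for_eval(preds, labels, ignore_index=-100):
    # Copy the ignore marker from the labels into the predictions,
    # mirroring the fix proposed above.
    preds = list(preds)
    for i, lab in enumerate(labels):
        if lab == ignore_index:
            preds[i] = ignore_index
    # Both sequences can now be filtered identically before computing metrics.
    kept_preds = [p for p, l in zip(preds, labels) if l != ignore_index]
    kept_labels = [l for l in labels if l != ignore_index]
    return kept_preds, kept_labels

# Example: positions 0 and 3 are ignored in the labels.
preds  = [5, 2, 7, 9, 1]
labels = [-100, 2, 7, -100, 1]
print(align_for_eval(preds, labels))  # ([2, 7, 1], [2, 7, 1])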
@bozheng-hit
Author

I am able to reproduce the results after adding the above code.
I strongly suggest the contributors at least run the code once before committing.

@jgmf-amazon
Contributor

Hi @bozheng-hit , our apologies for the extra debugging that you had to perform, and thanks for proposing a solution. I am able to reproduce this problem, namely that the validation engine gives bad results for encoder-only models when using the new evaluation code. PR here: #22

@jgmf-amazon
Contributor

Please re-open if you still see the problem. Thank you!
