
Can anyone reproduce the baseline results reported in the paper with the current version of the code? #21

Closed
bozheng-hit opened this issue Jun 21, 2022 · 3 comments

Comments

@bozheng-hit

I think the following code should be inserted here: https://github.com/alexa/massive/blob/main/src/massive/utils/training_utils.py#L472

# Mask predictions wherever the label is -100 (ignored positions),
# so predictions and labels stay aligned during evaluation.
for i, x in enumerate(lab):
    if x == -100:
        pred[i] = -100
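
For context, a minimal self-contained sketch of why this matters (the names preds, labels, and align_for_eval are illustrative, not the repo's actual identifiers): if only the labels are filtered on -100 while the predictions are not, the two sequences end up with different lengths and misaligned positions before scoring.

# Hypothetical sketch: -100 marks ignored positions (special tokens /
# subword continuations) in the labels produced by the trainer.
def align_for_eval(preds, labels, ignore_index=-100):
    # Copy the ignore marker from the labels into the predictions,
    # mirroring the fix proposed above.
    preds = list(preds)
    for i, lab in enumerate(labels):
        if lab == ignore_index:
            preds[i] = ignore_index
    # Both sequences can now be filtered identically before computing metrics.
    kept_preds = [p for p, l in zip(preds, labels) if l != ignore_index]
    kept_labels = [l for l in labels if l != ignore_index]
    return kept_preds, kept_labels

# Example: positions 0 and 3 are ignored in the labels.
preds  = [5, 2, 7, 9, 1]
labels = [-100, 2, 7, -100, 1]
print(align_for_eval(preds, labels))  # ([2, 7, 1], [2, 7, 1])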
@bozheng-hit
Author

I am able to reproduce the results after adding the above code.
I strongly suggest the contributors at least run the code once before committing.

@jgmf-amazon
Contributor

Hi @bozheng-hit , our apologies for the extra debugging that you had to perform, and thanks for proposing a solution. I am able to reproduce this problem, namely that the validation engine gives bad results for encoder-only models when using the new evaluation code. PR here: #22

@jgmf-amazon
Contributor

Please re-open if you still see the problem. Thank you!
