About calculating the slot f1 metric #13
Comments
Same question here.
Hi @yichaopku (and @ihungalexhsu), this is indeed a (major) bug. Thank you very much for finding this. Please see the PR here: #14
Hi @jgmf-amazon, thanks for your reply. However, the current evaluation code still contains an issue:
The current code hits a bug in certain cases. Maybe a potential solution is to initialize prev_tag with 'O' instead of None?
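For context, here is a hypothetical sketch of the kind of prev_tag-based span merging being discussed (merge_slot_spans and its arguments are assumed names for illustration, not the repository's actual code); the suggestion above corresponds to initializing prev_tag with the outside label rather than None:

# Hypothetical sketch, not the repository's code: group consecutive tokens that
# share the same non-"Other" label into (slot_type, slot_value) spans.
def merge_slot_spans(tokens, labels, outside="Other"):
    spans = []
    prev_tag = outside          # the suggestion: start from the outside label, not None
    current_tokens = []
    for tok, tag in zip(tokens, labels):
        if tag != prev_tag and current_tokens:
            # the label changed, so the span collected so far is complete
            spans.append((prev_tag, " ".join(current_tokens)))
            current_tokens = []
        if tag != outside:
            current_tokens.append(tok)
        prev_tag = tag
    if current_tokens:
        spans.append((prev_tag, " ".join(current_tokens)))
    return spans

print(merge_slot_spans(
    ["what", "is", "the", "weather", "today"],
    ["Other", "Other", "Other", "Other", "datetime"],
))  # -> [('datetime', 'today')]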
When calculating the slot metric, the parameter "labels_ignore" of the function
def eval_preds(pred_intents=None, lab_intents=None, pred_slots=None, lab_slots=None,
               eval_metrics='all', labels_ignore='Other', labels_merge=None, pad='Other',
               slot_level_combination=True)
is set to "Other". As a result, a case such as label: what is the weather [datetime: today] vs. prediction: what [datetime: is the weather today] is treated as a correct prediction.
Is this by design or a mistake? If it is a mistake, could someone please update the compute_metrics code used for online evaluation, as well as the baseline metric values on the competition webpage and the leaderboard?
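To make the concern concrete, here is a minimal sketch, assuming a token-level comparison that drops every position whose gold label equals labels_ignore (the helper token_slot_f1 below is an assumption for illustration, not the official compute_metrics implementation). With gold "Other" positions removed, the shifted prediction from the example above still scores a perfect F1:

# Minimal illustrative sketch (not the official compute_metrics code): a token-level
# slot F1 that skips every position whose gold label equals labels_ignore.
from collections import Counter

def token_slot_f1(gold_labels, pred_labels, labels_ignore="Other"):
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_labels, pred_labels):
        if gold == labels_ignore:
            # positions whose gold label is "Other" are dropped entirely,
            # so spurious slot tokens predicted there are never penalized
            continue
        if gold == pred:
            tp[gold] += 1
        else:
            fn[gold] += 1
            fp[pred] += 1
    tp_sum, fp_sum, fn_sum = sum(tp.values()), sum(fp.values()), sum(fn.values())
    precision = tp_sum / (tp_sum + fp_sum) if (tp_sum + fp_sum) else 0.0
    recall = tp_sum / (tp_sum + fn_sum) if (tp_sum + fn_sum) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Gold:       what is the weather [datetime: today]
# Prediction: what [datetime: is the weather today]
gold = ["Other", "Other", "Other", "Other", "datetime"]
pred = ["Other", "datetime", "datetime", "datetime", "datetime"]
print(token_slot_f1(gold, pred))  # -> 1.0, even though the predicted span boundaries are wrong

Under a span-level (exact-match) comparison, the same pair would presumably score 0, which seems to be the behaviour the reporter expects.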