Modified evaluation to use seqeval package #14
Merged
Issue 13
This PR fixes a bug found in issue 13:
The problem was that evaluation only considered tokens in the predicted utterance that aligned with slotted tokens in the ground truth, and the F1 calculation was wrong. Exact match accuracy was also affected.
To fix the bug, the seqeval package is now used, which follows conlleval conventions by default. To use seqeval, BIO tagging was implemented, and exact match accuracy was updated to compare the BIO-tagged sequences. Some new test cases were added as well, and all tests pass. The paper preprint and the eval.ai leaderboards will be fixed by 6/17.
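For reference, a minimal pure-Python sketch of what the conlleval-style metric computes over BIO tags (mirroring seqeval's default behavior): spans are extracted as (label, start, end) triples, and F1 is micro-averaged over exact span matches. The tag labels and utterances below are invented for illustration, not taken from this repository.

```python
def extract_spans(tags):
    """Collect (label, start, end) spans from one BIO tag sequence."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # "O" sentinel flushes the last span
        # A span ends at an O tag, a new B- tag, or an I- tag with a new label.
        if tag == "O" or tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        ):
            if label is not None:
                spans.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label is None:
            # conlleval treats a stray I- tag as starting a new span
            start, label = i, tag[2:]
    return spans

def span_f1(y_true, y_pred):
    """Micro-averaged F1 over exact span matches across all sequences."""
    tp = fp = fn = 0
    for true_tags, pred_tags in zip(y_true, y_pred):
        true_spans = set(extract_spans(true_tags))
        pred_spans = set(extract_spans(pred_tags))
        tp += len(true_spans & pred_spans)
        fp += len(pred_spans - true_spans)
        fn += len(true_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: the prediction recovers the "time" span but misses "loc",
# so precision = 1/1, recall = 1/2, F1 = 2/3.
y_true = [["O", "B-time", "I-time", "O"], ["B-loc", "O"]]
y_pred = [["O", "B-time", "I-time", "O"], ["O", "O"]]
print(span_f1(y_true, y_pred))  # → 0.666...
```

In practice the PR delegates this to seqeval rather than reimplementing it; the sketch only shows why a predicted span counts as correct when its label and exact boundaries both match the ground truth.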