
About calculating the slot f1 metric #13

Closed
yichaopku opened this issue Jun 13, 2022 · 3 comments

Comments

@yichaopku

When calculating the slot metrics, the "labels_ignore" parameter of the function
```python
def eval_preds(pred_intents=None, lab_intents=None, pred_slots=None, lab_slots=None,
               eval_metrics='all', labels_ignore='Other', labels_merge=None, pad='Other',
               slot_level_combination=True)
```

is set to "Other". This causes cases like the following to be treated as a correct prediction:

label: what is the weather [datetime: today]
prediction: what [datetime: is the weather today]

Is this by design, or is it a mistake? If it is a mistake, could someone please update the compute_metrics code used for the online evaluation, as well as the baseline metric values on the competition webpage and the leaderboard?
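
To make the failure mode concrete, here is a minimal sketch (my own illustration, assuming "labels_ignore" simply drops every position whose gold tag is 'Other' before comparing; the actual compute_metrics code may differ):

```python
# Tokens of "what is the weather today", with per-token slot tags.
label = ['Other', 'Other', 'Other', 'Other', 'datetime']           # [datetime: today]
pred  = ['Other', 'datetime', 'datetime', 'datetime', 'datetime']  # [datetime: is the weather today]

# If positions whose gold tag is the ignored label are dropped,
# only the last token is scored, and the prediction matches there.
scored = [(l, p) for l, p in zip(label, pred) if l != 'Other']
print(scored)                          # [('datetime', 'datetime')]
print(all(l == p for l, p in scored))  # True -> counted as fully correct
```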

@ihungalexhsu

Same question here.
Also, when measuring the F1 score, if a slot spans n tokens, the current measurement counts it n times (once per token) rather than once per slot.
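
For example (a hypothetical token-level scorer, not the repository's code):

```python
# A single 3-token 'datetime' slot, predicted perfectly.
label = ['datetime', 'datetime', 'datetime']
pred  = ['datetime', 'datetime', 'datetime']

# Per-token counting credits the slot once per token: 3 true positives.
token_tp = sum(l == p != 'Other' for l, p in zip(label, pred))
print(token_tp)  # 3

# Span-level (BIO) counting credits it once: after conversion to
# ['B-datetime', 'I-datetime', 'I-datetime'] there is a single
# matching entity, so 1 true positive.
```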

@jgmf-amazon
Contributor

Hi @yichaopku (and @ihungalexhsu), this is indeed a (major) bug. Thank you very much for finding this.

Please see the PR here: #14

@ihungalexhsu

Hi @jgmf-amazon, thanks for your reply. However, the current evaluation code still contains an issue:

```python
def convert_to_bio(seq_tags, outside='Other', labels_merge=None):
    """
    Converts a sequence of tags into BIO format. EX:

        ['city', 'city', 'Other', 'country', -100, 'Other']
        to
        ['B-city', 'I-city', 'O', 'B-country', 'I-country', 'O']
        where outside = 'Other' and labels_merge = [-100]

    :param seq_tags: the sequence of tags that should be converted
    :type seq_tags: list
    :param outside: The label(s) to put outside (ignore). Default: 'Other'
    :type outside: str or list
    :param labels_merge: The labels to merge leftward (i.e. for tokenized inputs)
    :type labels_merge: str or list
    :return: a BIO-tagged sequence
    :rtype: list
    """

    seq_tags = [str(x) for x in seq_tags]

    outside = [outside] if type(outside) != list else outside
    outside = [str(x) for x in outside]

    if labels_merge:
        labels_merge = [labels_merge] if type(labels_merge) != list else labels_merge
        labels_merge = [str(x) for x in labels_merge]
    else:
        labels_merge = []

    bio_tagged = []
    prev_tag = None
    for tag in seq_tags:
        if tag in outside:
            bio_tagged.append('O')
            prev_tag = tag
            continue
        if tag != prev_tag and tag not in labels_merge:
            bio_tagged.append('B-' + tag)
            prev_tag = tag
            continue
        if tag == prev_tag or tag in labels_merge:
            if prev_tag in outside:
                bio_tagged.append('O')
            else:
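                # BUG: prev_tag can still be None here (e.g. when the sequence
                # starts with a merge label), so 'I-' + prev_tag raises a TypeError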
                bio_tagged.append('I-' + prev_tag)

    return bio_tagged
```

The current code hits a bug when prev_tag is None and tag is -100 (i.e. the sequence starts with a merge label): 'I-' + prev_tag raises a TypeError. This will not happen if the model is properly trained, but in the early stages of training the model might output this silly combination.
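
A minimal reproduction with the function as quoted above:

```python
convert_to_bio([-100, 'city'], labels_merge=[-100])
# First iteration: tag '-100' is in labels_merge, prev_tag is still None,
# so 'I-' + prev_tag raises:
# TypeError: can only concatenate str (not "NoneType") to str
```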

Maybe a potential solution is to initialize prev_tag with an outside value (e.g. 'Other') instead of None, so that a leading merge token is emitted as 'O'?
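
Something like this (a sketch of the suggestion, not a tested patch):

```python
# In convert_to_bio, replace:
#     prev_tag = None
# with:
#     prev_tag = outside[0]
# A leading merge label then takes the "prev_tag in outside" branch
# and is tagged 'O', e.g.:
#     convert_to_bio([-100, 'city'], labels_merge=[-100])  # ['O', 'B-city']
```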
