[ABSA] Predict with a gold aspect dataset #469

Merged: 3 commits into huggingface:main on Jan 11, 2024


tomaarsen
Member

Hello!

Pull Request overview

  • Allow predicting polarities for a dataset that already contains gold aspects
  • Add tests

Details

Usage

>>> from datasets import load_dataset
>>> from setfit import AbsaModel
>>> model = AbsaModel.from_pretrained(
...     "tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-aspect",
...     "tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-polarity",
... )
>>> dataset = load_dataset("tomaarsen/setfit-absa-semeval-restaurants", split="train")
>>> model.predict(dataset)
Dataset({
    features: ['text', 'span', 'label', 'ordinal', 'pred_polarity'],
    num_rows: 3693
})

As requested, I've added support for predicting polarities given gold aspect spans. Note that a gold aspect span may not correspond to the tokens produced by spaCy. In that case, a warning is issued and the sample is skipped; in the output, its prediction is returned as None.

from setfit import AbsaModel
from datasets import load_dataset

model = AbsaModel.from_pretrained(
    "tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-aspect",
    "tomaarsen/setfit-absa-bge-small-en-v1.5-restaurants-polarity",
)
dataset = load_dataset("tomaarsen/setfit-absa-semeval-restaurants", split="train")

output = model.predict(dataset)
df = output.to_pandas()
# Count the skipped samples; isna() matches the None entries in the column
print(df["pred_polarity"].isna().sum())
# => 9

with these warnings:

Aspect term 'Ambiance' with ordinal 0, isn't a token in 'Ambiance- relaxed and stylish.' according to spaCy. Skipping this sample.
Aspect term 'priced' with ordinal 0, isn't a token in 'Great wine list, reasonably priced.--Sara' according to spaCy. Skipping this sample.
Aspect term 'quality' with ordinal 0, isn't a token in '$20 gets you unlimited sushi of a very high quality- I even took a friend here from Japan who said it was one of the best sushi places in the US that he has been to.' according to spaCy. Skipping this sample.
Aspect term 'Service' with ordinal 0, isn't a token in 'Service- friendly and attentive.' according to spaCy. Skipping this sample.
Aspect term 'dress cod' with ordinal 0, isn't a token in 'Good atmosphere, combination of all the hottest music dress code is relatively strict except on Fridays.' according to spaCy. Skipping this sample.
Aspect term 'brunch' with ordinal 0, isn't a token in 'We had a 3 hour brunch- they definitely do not rush you- and they kept the unlimited mimosas flowing the whole time.' according to spaCy. Skipping this sample.
Aspect term 'chicken in curry sauc' with ordinal 0, isn't a token in 'Interesting other dishes for a change include chicken in curry sauce and salmon caserole.' according to spaCy. Skipping this sample.
Aspect term 'mascarpone with chocolate chip' with ordinal 0, isn't a token in 'They should have called it mascarpone with chocolate chips-good but a far cry from what the name implies.' according to spaCy. Skipping this sample.
Aspect term 'lunch' with ordinal 0, isn't a token in 'I have to say that if this what makes it easier to get a saet a lunch- I dont mind.' according to spaCy. Skipping this sample.
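
The warnings above can be reproduced in spirit with a small standard-library sketch. It assumes a span is usable only when its start and end character offsets land on token boundaries, and it uses whitespace-delimited chunks as a stand-in for spaCy's tokenizer, so it is not the check that SetFit itself performs. The helper names `find_span` and `aligns_with_tokens` are hypothetical.

```python
import re
from typing import Optional, Tuple


def find_span(text: str, span: str, ordinal: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the ordinal-th occurrence of span, or None."""
    start = -1
    for _ in range(ordinal + 1):
        start = text.find(span, start + 1)
        if start == -1:
            return None
    return start, start + len(span)


def aligns_with_tokens(text: str, span: str, ordinal: int = 0) -> bool:
    """Check whether the span starts and ends on token boundaries.

    Assumption: tokens are whitespace-delimited chunks, standing in for spaCy.
    """
    offsets = find_span(text, span, ordinal)
    if offsets is None:
        return False
    start, end = offsets
    tokens = list(re.finditer(r"\S+", text))
    token_starts = {m.start() for m in tokens}
    token_ends = {m.end() for m in tokens}
    return start in token_starts and end in token_ends


# 'Ambiance-' is one chunk here, so the gold span 'Ambiance' ends mid-token.
print(aligns_with_tokens("Ambiance- relaxed and stylish.", "Ambiance"))
# A truncated gold span such as 'dress cod' also ends mid-token.
print(aligns_with_tokens("the hottest music dress code is strict", "dress cod"))
```

Running a check like this over the `text`, `span` and `ordinal` columns before calling `predict` may help spot gold spans that need fixing in the dataset.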

You can then (more) easily do SB2 tests:

...

print(sum(df["label"] == df["pred_polarity"]) / len(df))
# => 0.6206 (for example)
# This is just naive accuracy; you could use f1_score from sklearn instead
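
As a minimal sketch of the F1 suggestion above: the DataFrame below is a small hypothetical stand-in for `model.predict(dataset).to_pandas()`, and its values are made up for illustration. Dropping the skipped samples (those with a `None` prediction) before scoring is an assumption of this sketch; the naive accuracy above instead counts them as wrong.

```python
import pandas as pd
from sklearn.metrics import f1_score

# Hypothetical stand-in for the prediction output; not real model results.
df = pd.DataFrame(
    {
        "label": ["positive", "negative", "neutral", "positive", "negative"],
        "pred_polarity": ["positive", "negative", "positive", None, "neutral"],
    }
)

# Keep only samples that received a prediction (skipped samples are None).
scored = df.dropna(subset=["pred_polarity"])

# Accuracy over the scored samples only.
accuracy = (scored["label"] == scored["pred_polarity"]).mean()

# Macro F1 averages the per-class F1 scores, weighting each class equally.
macro_f1 = f1_score(scored["label"], scored["pred_polarity"], average="macro")

print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f}")
```

Whether to drop or penalise skipped samples depends on the evaluation protocol being compared against, so it is worth reporting which one was used.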

cc: @rlaperdo @MosheWasserb @orenpereg

- Tom Aarsen

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@tomaarsen
Member Author

I got feedback that this PR seems to work correctly. I'll be merging it 🎉

@tomaarsen tomaarsen merged commit 3e3d828 into huggingface:main Jan 11, 2024
18 checks passed
@tomaarsen tomaarsen deleted the absa_predict_gold_aspects branch January 11, 2024 15:18