add run_zeroshot.py; add functionality to data.get_templated_dataset() (formerly add_templated_examples()) #292

Merged: 21 commits, Feb 7, 2023

danielkorat
Collaborator

@danielkorat danielkorat commented Jan 30, 2023

This PR simplifies running zero-shot classification with SetFit.
The goal is to add a run_zeroshot.py script and to remove the function data.get_augmented_samples(), since its functionality is covered by the more generic data.get_templated_dataset().

@danielkorat danielkorat requested a review from tomaarsen January 30, 2023 13:52
@danielkorat
Collaborator Author

Still work in progress

@danielkorat danielkorat added the enhancement New feature or request label Jan 30, 2023
@danielkorat danielkorat self-assigned this Jan 30, 2023
@tomaarsen tomaarsen marked this pull request as draft January 31, 2023 08:12
@tomaarsen
Member

tomaarsen commented Feb 1, 2023

@danielkorat
This section:

setfit/tests/test_data.py

Lines 177 to 189 in 5e78e40

@pytest.mark.parametrize(
    "dataset",
    [
        "emotion",
        "ag_news",
        "amazon_counterfactual_en",
        "SentEval-CR",
        "sst5",
        "enron_spam",
        "tweet_eval_stance_abortion",
        "ade_corpus_v2_classification",
    ],
)

is a leftover decorator from the now-removed test_get_augmented_samples. The decorator should also be removed if you want to remove test_get_augmented_samples.

@danielkorat
Collaborator Author

@tomaarsen
I re-implemented these tests now.

@danielkorat danielkorat changed the title from "add run_zeroshot.py; add functionality to data.add_templated_examples()" to "add run_zeroshot.py; add functionality to data.get_templated_dataset() (formerly add_templated_examples())" Feb 1, 2023
@danielkorat danielkorat marked this pull request as ready for review February 1, 2023 17:14
@danielkorat danielkorat requested review from tomaarsen and removed request for tomaarsen February 2, 2023 09:28
Member

@tomaarsen tomaarsen left a comment

I have some small nitpicks, could you have a look at them?

Six resolved review comments on src/setfit/data.py (all outdated)
@danielkorat danielkorat requested a review from tomaarsen February 5, 2023 08:46
Member

@tomaarsen tomaarsen left a comment

Looks good to me! get_templated_dataset feels pretty intuitive to use.
I'll let @lewtun decide whether, rather than removing add_templated_examples and get_augmented_samples outright as this PR does, we should keep them and replace their function bodies with a deprecation warning that points to the new get_templated_dataset. Otherwise, I suspect that the code in some notebooks or tutorials out there might break without explanation.

@danielkorat
Collaborator Author

get_templated_dataset is backwards compatible with add_templated_examples, so it's just a name change. add_templated_examples was only used in the tests and in the zero-shot-classification.ipynb notebook. I refactored these files accordingly, so add_templated_examples is no longer mentioned anywhere. We can add a line like:

add_templated_examples = get_templated_dataset

with a deprecation warning. What do you think, @tomaarsen?

@tomaarsen
Member

A deprecation warning would be great. Could you have the warning suggest the new function to use and state that the old function will be removed in v1.0.0?
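
For illustration, the deprecation wrapper discussed above could look roughly like the sketch below. This is not the code that was merged: the body of get_templated_dataset here is a stand-in that only echoes its arguments, and the warning text is an assumed wording that follows the request to name the replacement and the v1.0.0 removal.

```python
import warnings


def get_templated_dataset(*args, **kwargs):
    # Stand-in for the real function (assumption): it just echoes its
    # arguments so the wrapper below can be exercised in isolation.
    return args, kwargs


def add_templated_examples(*args, **kwargs):
    """Deprecated alias that forwards to get_templated_dataset."""
    warnings.warn(
        "`add_templated_examples` is deprecated and will be removed in v1.0.0. "
        "Please use `get_templated_dataset` instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return get_templated_dataset(*args, **kwargs)
```

Compared with the bare assignment add_templated_examples = get_templated_dataset, a wrapper function is what allows a warning to be emitted on every call while keeping the old call signature working.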

Member

@tomaarsen tomaarsen left a comment

I have some tiny last nitpicks

Two resolved review comments on src/setfit/data.py (one outdated)
@danielkorat danielkorat requested a review from tomaarsen February 7, 2023 10:09
@danielkorat
Collaborator Author

@tomaarsen please review

@tomaarsen
Member

Thank you @danielkorat for sticking with me through my many suggestions. The script is very useful, as is the new and improved get_templated_dataset(). I'm very impressed by the zero-shot results as well: they are very consistent compared to 8-shot, and zero-shot seems to outperform 8-shot on various benchmarks (e.g. SetFit/emotion).

@tomaarsen tomaarsen merged commit b90fdc5 into huggingface:main Feb 7, 2023