Properly remove eos_token in llama3 tokenizer if requested by user #1477

joecummings · 2024-09-03T18:19:10Z

Context

What is the purpose of this PR? Is it to

add a new feature
fix a bug
update tests and/or documentation
other (please add here)

Changelog

Adds a test that checks the eos_token is actually stripped when the user requests it
Passes None for truncate when add_eos=False
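
For readers without the diff open, here is a minimal sketch of why the second changelog item matters. It is not the torchtune source: `truncate`, `finalize`, and `EOS_ID` are hypothetical stand-ins, and it assumes the truncation helper overwrites the last kept token with the eos id whenever one is supplied.

```python
from typing import List, Optional

EOS_ID = 2  # hypothetical eos token id, for illustration only


def truncate(
    tokens: List[int], max_seq_len: int, eos_id: Optional[int] = None
) -> List[int]:
    """Hypothetical stand-in for the tokenizer's truncation helper."""
    truncated = tokens[:max_seq_len]
    # Assumed behaviour: when an eos id is supplied, the last kept token
    # is overwritten so the sequence still ends with eos.
    if eos_id is not None and truncated and truncated[-1] != eos_id:
        truncated[-1] = eos_id
    return truncated


def finalize(tokens: List[int], max_seq_len: int, add_eos: bool) -> List[int]:
    """Sketch of the change: only hand eos to truncate if add_eos is set."""
    tokens = list(tokens)
    if add_eos:
        tokens.append(EOS_ID)
    # Passing None when add_eos=False keeps truncate from re-adding eos.
    return truncate(tokens, max_seq_len, EOS_ID if add_eos else None)
```

Under these assumptions, always passing the eos id to `truncate` would put eos back at the end of a truncated sequence even when the caller set add_eos=False. Passing None in that case leaves the truncated tokens untouched.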

Test plan

Please make sure to do each of the following if applicable to your PR. (If you're not sure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.)

run pre-commit hooks and linters (make sure you've first installed via pre-commit install)
add unit tests for any new functionality
update docstrings for any new or updated methods or classes
run unit tests via pytest tests
run recipe tests via pytest tests -m integration_test
manually run any new or modified recipes with sufficient proof of correctness
include relevant commands and any other artifacts in this summary (pastes of loss curves, eval results, etc.)

UX

If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Example of docstring:

torchtune/torchtune/modules/vision_transformer.py

Line 285 in 6a7951f

Examples:

Example in our docs: https://pytorch.org/torchtune/main/tutorials/qat_finetune.html#applying-qat-to-llama3-models

I did not change any public API;
I have added an example to docs or docstrings;

pytorch-bot · 2024-09-03T18:19:13Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1477

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1e0501e with merge base 71be8ad:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

codecov-commenter · 2024-09-03T18:41:22Z

Codecov Report

Attention: Patch coverage is 11.11111% with 8 lines in your changes missing coverage. Please review.

Project coverage is 27.01%. Comparing base (71be8ad) to head (1e0501e).
Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
...s/torchtune/models/llama3/test_llama3_tokenizer.py	14.28%	6 Missing ⚠️
torchtune/models/llama3/_tokenizer.py	0.00%	2 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff             @@
##             main    #1477       +/-   ##
===========================================
- Coverage   72.25%   27.01%   -45.25%     
===========================================
  Files         274      274               
  Lines       13278    13306       +28     
===========================================
- Hits         9594     3594     -6000     
- Misses       3684     9712     +6028

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Properly remove eos_token in llama3 tokenizer if requested by user

1e0501e

joecummings added the bug (Something isn't working) and testing labels Sep 3, 2024

joecummings requested review from ebsmothers, pbontrager and RdoubleA September 3, 2024 18:19

facebook-github-bot added the CLA Signed label (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.) Sep 3, 2024

joecummings mentioned this pull request Sep 3, 2024

[Qwen2 tokenizer] Ensure eos_token is not added if add_eos=False #1478

Closed

RdoubleA approved these changes Sep 3, 2024

This was referenced Sep 3, 2024

[Mistral tokenizer] Ensure eos_token is not added if add_eos=False #1479

Closed

[Gemma tokenizer] Ensure eos_token is not added if add_eos=False #1480

Closed

[Phi3 tokenizer] Ensure eos_token is not added if add_eos=False #1481

Closed

joecummings merged commit 26302ac into main Sep 3, 2024
20 checks passed

joecummings deleted the fix-1474 branch September 3, 2024 18:44

krammnic mentioned this pull request Oct 11, 2024

Fix eos_token problem in all required models #1806

Merged

13 tasks
