
sarkar/Add support for max_length in run_generation #476

Closed
wants to merge 5 commits

Conversation

ssarkar2
Collaborator

@ssarkar2 ssarkar2 commented Oct 19, 2023

What does this PR do?

Partial fix for this issue (#472).

Add support for max_length (as an alternative to max_new_tokens). The change in utils.py is needed when someone specifies max_length instead of max_new_tokens; the change in run_generation.py activates and exercises the max_length path.

The issue itself is not fully fixed, but this change moves past the error mentioned there.
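
For context, a minimal sketch of the intended behaviour (resolve_max_new_tokens and its arguments are illustrative names, not the actual utils.py code): when only max_length is given, the number of new tokens can be derived from the prompt length.

# Hedged sketch, assuming max_length counts prompt tokens plus generated tokens.
def resolve_max_new_tokens(generation_config, input_ids):
    if generation_config.max_new_tokens is not None:
        return generation_config.max_new_tokens
    if generation_config.max_length is not None:
        # Derive the budget for newly generated tokens from the prompt length.
        return generation_config.max_length - input_ids.shape[-1]
    return None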

Tests:

Passing runs:

only max_new_tokens is specified

python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --max_new_tokens 100 --batch 8

only max_length is specified

python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --batch 8 --max_length 100

only max_length is specified, with bucket

python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --batch 8 --max_length 100 --bucket_size 30

only max_new_tokens is specified, with bucket

python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --max_new_tokens 100 --batch 8 --bucket_size 30

dataset run with max_length

python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --batch 8 --max_length 100 --bucket_size 30 --dataset_name squad --column context --dataset_max_samples 32

neither is specified (max_new_tokens defaults to 100, preserving the old back-compat behaviour)

python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --batch 8

Fails (as expected):

both max_length and max_new_tokens are specified

python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --max_new_tokens 100 --batch 8 --max_length 108

pytest mentioned in the issue:

cd optimum-habana/tests/transformers/tests/models/gpt2
python -m pytest -vs test_modeling_gpt2.py::GPT2ModelTest::test_beam_search_generate
The original error is gone now. There is a new error:

"""
if sent_lengths[i] < sent_max_len:
# inserting only the first eos_token_id

          decoded[i, sent_lengths[i]] = eos_token_id[0]

E TypeError: 'NoneType' object is not subscriptable

"""

However, I am not sure whether beam search is tested/supported/prioritized for GPT2.

FWIW, beam search generations with run_generation.py with opt/gpt2 using max_length/max_new_tokens pass:
python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --max_new_tokens 100 --batch 8 --num_beams 2
python run_generation.py --model_name_or_path gpt2 --use_hpu_graphs --use_kv_cache --bf16 --max_new_tokens 100 --batch 8 --num_beams 2
python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --max_length 100 --batch 8 --num_beams 2
python run_generation.py --model_name_or_path gpt2 --use_hpu_graphs --use_kv_cache --bf16 --max_length 100 --batch 8 --num_beams 2

Fixes #472

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Oct 19, 2023

The documentation is not available anymore as the PR was closed or merged.

@ssarkar2
Collaborator Author

@regisss @libinta @ankurneog could you please review this PR?

@@ -211,6 +221,8 @@ def main():
    )

    args = parser.parse_args()
    if args.max_length is None and args.max_new_tokens is None:
        args.max_new_tokens = 100
Collaborator Author

To keep backward compatibility: in the current code we can leave --max_new_tokens unspecified and it gets a default of 100.

Since the arguments are optional, I have: length_group = parser.add_mutually_exclusive_group(required=False)

However, to maintain back-compat when neither is specified, I set a default of 100.

Note that I cannot put the default value in length_group.add_argument(), because when --max_length is specified and --max_new_tokens is not, argparse would still assign max_new_tokens its default of 100, which would result in both max_length and max_new_tokens getting values.
Hence the default is set here, outside the argument parser.
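
A minimal sketch of the pattern described above (not the full run_generation.py parser, only the two flags discussed here):

import argparse

parser = argparse.ArgumentParser()
# max_length and max_new_tokens are mutually exclusive, and neither is required.
length_group = parser.add_mutually_exclusive_group(required=False)
length_group.add_argument("--max_new_tokens", type=int, default=None)
length_group.add_argument("--max_length", type=int, default=None)

args = parser.parse_args()
# Back-compat default applied after parsing: if the default lived inside
# add_argument, passing --max_length alone would still leave max_new_tokens
# at 100, i.e. both limits would end up set.
if args.max_length is None and args.max_new_tokens is None:
    args.max_new_tokens = 100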

Collaborator

@regisss regisss left a comment

I wouldn't modify the text-generation example and would just keep the change in generation.py. This example doesn't have to enable us to reproduce all use cases.

Then, after that, the test leads to an error with eos_token_id being None. I don't have a good understanding of why yet, as it doesn't happen in Transformers, and I don't have the bandwidth to look further into it. Let's skip it for the moment.

@ssarkar2 ssarkar2 closed this Oct 25, 2023
@ssarkar2
Collaborator Author

closing for now

@regisss
Collaborator

regisss commented Oct 26, 2023

The change to generate to enable using max_length looks good to me though 😄
I can open a PR to do it, or let me know @ssarkar2 if you prefer doing it yourself.
