sarkar/Add support for max_length in run_generation #476
Conversation
The documentation is not available anymore as the PR was closed or merged.
@regisss @libinta @ankurneog could you please review this PR?
@@ -211,6 +221,8 @@ def main():
    )

    args = parser.parse_args()
    if args.max_length is None and args.max_new_tokens is None:
        args.max_new_tokens = 100
To keep backward compatibility: currently we can leave --max_new_tokens unspecified and it gets a default of 100.
Since the option was optional, I used: length_group = parser.add_mutually_exclusive_group(required=False)
However, to maintain back-compat when it is not specified, I apply a default of 100.
Note that I cannot put the default value in length_group.add_argument(...), because in the case where --max_length is specified and --max_new_tokens isn't, argparse would assign max_new_tokens its default of 100, which would result in both max_length and max_new_tokens getting values.
Hence the default is set here, outside the argument parser.
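A minimal sketch of that setup (flag help strings are illustrative, not the exact PR code):

import argparse

parser = argparse.ArgumentParser()
length_group = parser.add_mutually_exclusive_group(required=False)
length_group.add_argument("--max_new_tokens", type=int, default=None, help="Number of tokens to generate.")
length_group.add_argument("--max_length", type=int, default=None, help="Total length of the prompt plus generated tokens.")
args = parser.parse_args()
# The back-compat default is applied after parsing: if it were set via default=100
# in add_argument, passing only --max_length would still leave max_new_tokens at 100,
# so both values would end up populated.
if args.max_length is None and args.max_new_tokens is None:
    args.max_new_tokens = 100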
I wouldn't modify the text-generation example; I would just keep the change in generation.py. This example doesn't have to enable us to reproduce all use cases.
Then, after that, the test leads to an error with eos_token_id being None. I don't have a good understanding of why yet, as it doesn't happen in Transformers, and I don't have the bandwidth to look into it further. Let's skip it for the moment.
Closing for now.
What does this PR do?
A partial fix for this issue.
Add support for max_length (as an alternative to max_new_tokens). The change in utils.py is needed if someone specifies max_length instead of max_new_tokens. The change in run_generation.py activates and tests the max_length path. The issue itself is not fully fixed, but this moves past the error mentioned in it.
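As a rough illustration of what the max_length path has to do (the helper name and values below are made up for the example, not the actual utils.py code): max_length bounds the total sequence, so the number of new tokens is derived from it and the tokenized prompt length.

def resolve_new_tokens(prompt_len, max_length=None, max_new_tokens=None):
    # max_new_tokens takes precedence when given; otherwise derive it from
    # max_length (prompt + generation budget) and the prompt length.
    if max_new_tokens is not None:
        return max_new_tokens
    if max_length is not None:
        return max(max_length - prompt_len, 0)
    return 100  # back-compat default when neither is specified

# e.g. a prompt of 8 tokens with --max_length 100 leaves room for 92 new tokens
print(resolve_new_tokens(prompt_len=8, max_length=100))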
Tests:
runs:
only max_new_tokens is specified
python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --max_new_tokens 100 --batch 8
only max_length is specified
python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --batch 8 --max_length 100
only max_length is specified, with bucket
python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --batch 8 --max_length 100 --bucket_size 30
only max_new_tokens is specified, with bucket
python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --max_new_tokens 100 --batch 8 --bucket_size 30
dataset run with max_length
python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --batch 8 --max_length 100 --bucket_size 30 --dataset_name squad --column context --dataset_max_samples 32
both are not specified (max_new_tokens defaults to 100, old back compat behaviour)
python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --batch 8
both are specified (expected to fail, since --max_length and --max_new_tokens are mutually exclusive)
python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --max_new_tokens 100 --batch 8 --max_length 108
pytest mentioned in the issue:
cd optimum-habana/tests/transformers/tests/models/gpt2
python -m pytest -vs test_modeling_gpt2.py::GPT2ModelTest::test_beam_search_generate
The original error is gone now. There is a new error:
"""
if sent_lengths[i] < sent_max_len:
# inserting only the first eos_token_id
E TypeError: 'NoneType' object is not subscriptable
"""
However, I'm not sure whether beam search is tested/supported/prioritized for GPT2.
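For context, a self-contained sketch of the padding step that fails (paraphrased, not the exact Transformers code): finished sentences shorter than the padded length get the first eos token written in, so a None eos_token_id raises the TypeError above at the subscript.

import torch

eos_token_id = [50256]  # works; the failing test effectively ends up with None here
sent_lengths = torch.tensor([3, 5])
sent_max_len = 5
decoded = torch.zeros((2, sent_max_len), dtype=torch.long)
for i in range(decoded.shape[0]):
    if sent_lengths[i] < sent_max_len:
        # inserting only the first eos_token_id
        decoded[i, sent_lengths[i]] = eos_token_id[0]
print(decoded)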
FWIW, beam search generations with run_generation.py using opt/gpt2 and max_length/max_new_tokens pass:
python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --max_new_tokens 100 --batch 8 --num_beams 2
python run_generation.py --model_name_or_path gpt2 --use_hpu_graphs --use_kv_cache --bf16 --max_new_tokens 100 --batch 8 --num_beams 2
python run_generation.py --model_name_or_path facebook/opt-350m --use_hpu_graphs --use_kv_cache --bf16 --max_length 100 --batch 8 --num_beams 2
python run_generation.py --model_name_or_path gpt2 --use_hpu_graphs --use_kv_cache --bf16 --max_length 100 --batch 8 --num_beams 2
Fixes #472
Before submitting