Use get_max_new_tokens() instead of max_new_tokens field when stopping… #1417
base: master
Conversation
Force-pushed from 402bba1 to 5d527f0
@@ -105,7 +105,7 @@ class Sampler::GroupBeamSearcher {
     bool done = false;

     int64_t finish(Beam beam, const ov::genai::GenerationConfig& sampling_params);
-    void is_done(const ov::genai::GenerationConfig& sampling_params);
+    void is_done(const ov::genai::GenerationConfig& sampling_params, size_t prompt_len);
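For context on why the stop check needs the prompt length at all: when max_new_tokens is not set explicitly, it has to be derived from max_length, which also counts the prompt tokens. Below is a minimal sketch of that assumed relationship; the struct name, defaults, and helper are illustrative, not the actual GenAI implementation.

```cpp
#include <cstddef>
#include <limits>

// Illustrative sketch only: how get_max_new_tokens() could derive the budget
// when only max_length (prompt + generated tokens) is configured.
struct GenerationConfigSketch {
    size_t max_new_tokens = std::numeric_limits<size_t>::max();  // "unset"
    size_t max_length     = std::numeric_limits<size_t>::max();  // "unset"

    size_t get_max_new_tokens(size_t prompt_len) const {
        if (max_new_tokens != std::numeric_limits<size_t>::max())
            return max_new_tokens;                  // explicit value wins
        if (max_length == std::numeric_limits<size_t>::max())
            return max_length;                      // nothing configured
        return max_length - prompt_len;             // budget left after the prompt
    }
};
```

Reading the raw max_new_tokens field would ignore a user-supplied max_length, which is why is_done() now receives prompt_len.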
I think we can include prompt_len information in Group members during object construction and avoid passing it as a parameter in this method.
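A rough sketch of that suggestion, with hypothetical member and constructor names (the real Group layout may differ):

```cpp
#include <cstddef>

namespace ov { namespace genai { class GenerationConfig; } }  // declared in the GenAI headers

// Hypothetical: Group captures the prompt length once, at construction,
// so is_done() no longer needs it as an extra parameter.
struct Group {
    size_t prompt_len = 0;
    bool done = false;

    explicit Group(size_t prompt_len) : prompt_len(prompt_len) {}

    // Would use the prompt_len member internally.
    void is_done(const ov::genai::GenerationConfig& sampling_params);
};
```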
Not all places inside src/cpp are changed.
One more thing - I believe max_length is now loaded from the generation config. Isn't it a model property that is not meant to be a per-generation configuration? @michalkulakowski I know you have logic to read that value in OVMS now. Maybe we could move it here and make it a pipeline member. That way it could be used in both OVMS and a standalone GenAI app.
That makes sense to me. @ilya-lavrenov what do you think?
I suppose it depends on the model:
It looks like max_model_length (which is config.max_position_embeddings, for example https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/config.json#L13) and max_length from generation_config.json are different things, aren't they? Maybe we can have similar behavior for GenAI and add some defaults similar to HF? @Wovchena @pavel-esir @as-suvorov what is your opinion?
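If GenAI did add HF-like defaults, the resolution could look roughly like the sketch below; the function name and the "unset" convention are purely illustrative and not taken from the codebase.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>

// Hypothetical default: max_length comes from generation_config.json (a
// per-generation setting), max_position_embeddings from config.json (a model
// property); the effective limit is capped by the model's context window.
size_t resolve_max_length(size_t max_length, size_t max_position_embeddings) {
    constexpr size_t unset = std::numeric_limits<size_t>::max();
    if (max_length == unset)
        return max_position_embeddings;  // fall back to the model limit
    return std::min(max_length, max_position_embeddings);
}
```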