Add keras_nlp.samplers #563
Conversation
Thanks @chenmoneygithub! Some early comments:
- Are we making a new keras_nlp.samplers API? Cool!
- Why aren't we importing utils/text_generation.py?
- You can use @mattdangerw's cool docstring decorator for formatting (link)
@jbischof Thanks!
Yes! It's cleaner and more visible than our current offerings, and more importantly, it suits the generation task of pretrained models better.
Currently we have a few standalone functions, but wrapping them into classes suits the generation task better, and is also more readable and cleaner. After talking to Francois, we want these samplers to have their own namespace; that's why we are doing the move.
Thanks!
Thanks! Will take a proper pass soon! Re docstrings, I would say we should just replicate the repeated bits of docstring everywhere, and remove all the fancy formatting. The reason to use a format-docstring decorator is when we have something programmatic we need to get into the docstrings. But otherwise I think Keras as a whole would favor the simplicity of easily readable docstring blocks over the brevity of abstracting out certain argument sections.
Some high level thoughts on this...

1. Usage - I am somewhat tempted to keep the samplers as generic as possible and not require any knowledge of special tokens. After all, there are no real standards here (e.g. GPT-2 does not really have a pad token). For inputs we could assume either a ragged input or ask users to pass their own mask. For outputs we could just return the entire dense output.
2. Naming - Should we ditch the current naming? We only did this repetitive naming for tokenizers because some of the class names looked weird otherwise.
3. Serialization - I know we will have no direct need to serialize these in our own usage, but registering these objects as serializable and supporting config round-trips seems worth doing. That is, even though we will not save these directly on a model, someone else should be able to for their own custom code (same as optimizers and initializers).
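As a loose sketch of what that registration could look like (not the keras_nlp implementation; the class name, constructor arguments, and the use of the generic Keras serialization utilities are all assumptions):

```python
from tensorflow import keras


@keras.utils.register_keras_serializable(package="keras_nlp")
class GreedySamplerSketch:
    """Hypothetical sampler that supports config-based serialization."""

    def __init__(self, end_token_id=None, pad_token_id=0):
        self.end_token_id = end_token_id
        self.pad_token_id = pad_token_id

    def get_config(self):
        return {
            "end_token_id": self.end_token_id,
            "pad_token_id": self.pad_token_id,
        }

    @classmethod
    def from_config(cls, config):
        return cls(**config)


# Someone else's custom code could then round-trip the sampler,
# much like they would an optimizer.
sampler = GreedySamplerSketch(end_token_id=2)
config = keras.utils.serialize_keras_object(sampler)
restored = keras.utils.deserialize_keras_object(config)
```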
class Sampler:
    """Base sampler class.
I might document this class explaining how it could be subclassed, similar to Tokenizer. Even though we are not exposing it right now, we could in the future, and explaining its intended usage would be helpful.
I can do this as a followup after we are satisfied with the API design. At this time we are still making API changes, so the guide will be unreliable.
Sounds good! But I would just cut this docstring down then. If we aren't ready to describe subclassing, we probably should not put the base class in the release (which seems totally fine).
So we should focus our actual runnable example documentation of the exported symbols themselves.
keras_nlp/samplers/greedy_sampler.py
Examples:
    BATCH_SIZE = 8
I don't see most of these params used more than once so you can inline them to match our other docstrings.
This is just one style - some people prefer to define macros in one place. This is the same as our util docstrings, e.g. link.
utils.text_generation has a lot of style weirdness I've been cleaning up. I'd prefer the args inlined.
I get your idea! I am open to both solutions; sharing my opinion here: to me it's a bit different - defining all constants/hyperparameters before usage is cleaner, like in our guide (https://keras.io/examples/nlp/neural_machine_translation_with_keras_nlp/#setup), where EPOCHS is only used once.
keras_nlp/samplers/greedy_sampler.py
):
    super().__init__(end_token_id, pad_token_id, jit_compile)

def sample(self, token_probability_fn, prompt, mask, num_steps):
Could this method call our util functions? Otherwise we should probably delete them.
Yea, I am going to delete those utils and only keep the sampler class.
Why not delete as part of this PR?
Usually I prefer a pure clean-up PR to remove all legacy code, to keep the history clean.
@mattdangerw Thanks for the comments! Sharing my thoughts:

1. My take is that truncating by "end_token" is pretty common in real applications, so I would love to have the generator/sampler support it.
2. Yes, I have struggled a bit over whether to keep the current naming. The short answer is I am not sure; we can leave the naming call to Francois and focus on other parts for now. It's easy to do the renaming.
3. I was also hesitant on this one. Our proposed usage is that this "Sampler" class is only used with the "generate()" method, while optimizers/initializers/activations are part of model serialization. Also, users can use the restored optimizers/initializers/activations directly, but they always need to reconstruct a "Sampler" class. I am also curious how "Sampler" serialization could happen - which code would be calling the deserialization?
We should definitely have a way to do it. I am ok with having it as a call-time argument, where if it is passed we return a truncated ragged, and if it is not passed we return a dense. I worry that this polymorphic return might be a little confusing. Another possibility is to handle that during detokenization? Then we could always return a simple dense with sequence_length shape. But IDK!

```python
token_ids = sampler(token_ids, mask, prob_fn)
tokenizer.detokenize(token_ids, truncate_after=tokenizer.token_to_id("<eos>"))
```

Padding token is also interesting to me. If we are taking in a dense, and passing both a prompt and a mask, the sampler and the callback can share a signature:

```python
def next_token_fn(prompt, mask):
    return  # Do stuff

prompt = tokenizer(inputs)
mask = prompt != tokenizer.token_to_id("<pad>")
# These seem quite readable, in that the sampler and callback have similar signatures!
sampler(prompt, mask)
```
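For what it's worth, here is a small sketch of the "truncated ragged" option above in plain TensorFlow - the helper name and the choice to drop the end token itself are assumptions, not anything from this PR:

```python
import tensorflow as tf


def truncate_at_end_token(token_ids, end_token_id):
    """Convert dense [batch, length] ids to a ragged tensor cut at end_token_id."""
    is_end = tf.equal(token_ids, end_token_id)
    full_length = tf.shape(token_ids)[1]
    # Index of the first end token per row, or the full length if there is none.
    lengths = tf.where(
        tf.reduce_any(is_end, axis=-1),
        tf.argmax(tf.cast(is_end, tf.int32), axis=-1, output_type=tf.int32),
        tf.fill(tf.shape(token_ids)[:1], full_length),
    )
    return tf.RaggedTensor.from_tensor(token_ids, lengths=lengths)


token_ids = tf.constant([[5, 6, 2, 0, 0], [7, 8, 9, 2, 0]])
print(truncate_at_end_token(token_ids, end_token_id=2))
# <tf.RaggedTensor [[5, 6], [7, 8, 9]]>
```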
Left another pass on mostly minor comments!
def __init__(
    self,
    jit_compile=True,
Given that we talked about moving compilation to model.generate, I think we can remove all the jit_compile stuff here.
We cannot, unfortunately - not all of the sampler's __call__ is XLA-compatible. The prompt preprocessing part is not XLA-compatible because it changes the shape. The XLA-compatible part is the sample method, which is the part that actually benefits from XLA.
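To make that split concrete, here is a rough sketch of the pattern being described - eager, shape-changing prompt preprocessing in __call__, with only the fixed-shape sampling loop wrapped in a jit-compiled tf.function. The class and argument names are illustrative, not the actual keras_nlp code:

```python
import tensorflow as tf


class SamplerSketch:
    """Illustrative only: eager preprocessing, XLA-compiled sampling loop."""

    def __init__(self, jit_compile=True):
        # Only `sample` is compiled; `__call__` must stay eager because the
        # preprocessing below changes tensor shapes.
        self.sample = tf.function(self._sample, jit_compile=jit_compile)

    def __call__(self, token_probability_fn, prompt, max_length):
        batch_size, prompt_length = prompt.shape[0], prompt.shape[1]
        num_steps = max_length - prompt_length
        # Shape-changing preprocessing (not XLA-compatible): pad the prompt
        # to `max_length` and build a mask marking the real tokens.
        padded = tf.pad(prompt, [[0, 0], [0, num_steps]])
        mask = tf.concat(
            [tf.ones((batch_size, prompt_length), tf.bool),
             tf.zeros((batch_size, num_steps), tf.bool)],
            axis=1,
        )
        # Fixed-shape decoding loop: this is the part that benefits from XLA.
        return self.sample(token_probability_fn, padded, mask, num_steps)

    def _sample(self, token_probability_fn, prompt, mask, num_steps):
        # Stub; the real sampler runs a while_loop of fixed-shape updates here.
        return prompt
```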
# Convert `sample` method to a `tf.function`, and turn on
# `jit_compile` accordingly.
sample = tf.function(self.sample, jit_compile=self.jit_compile)
remove compilation
Same as above. Let's keep the tf.function on sample() for now.
prompt = tf.fill((BATCH_SIZE, 1), START_ID)

sampler = keras_nlp.samplers.Greedy()
Good point. Now that we're introducing a lot of abstract base classes (ones no one should use directly), we may want to start noting at the top of docstrings e.g. "An abstract base class for samplers." I could imagine the same for Backbone and Preprocessor.
# are aligned to the right side, the index is the same for all.
current_index = max_length - num_steps

def one_step(current_index, prompt, mask):
Could we factor out one_step like train_step in keras.Model? Might improve encapsulation.
That's actually a good idea. The issue is that the customization of child sampler classes happens at the sample() level, because some custom code is required before the while_loop, e.g., beam search needs to construct the beams before the loop.
We could expose an abstract sample_step() method in the base class as well, in which case we would have two abstract methods, sample() and sample_step(), to override, which looks a bit redundant.
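For reference, here is a sketch of the alternative being floated - a per-step hook in the base class so that subclasses only override the token choice. This is not what the PR implements (the PR keeps the loop inside each subclass's sample()), and the method name get_next_token, the Python loop, and the token_probability_fn output shape are all assumptions:

```python
import tensorflow as tf


class SamplerBaseSketch:
    """Sketch: decoding loop in the base class, per-step choice in subclasses."""

    def get_next_token(self, next_token_probs):
        # Subclasses override this: map (batch, vocab) probs to token ids.
        raise NotImplementedError

    def sample(self, token_probability_fn, prompt, mask, num_steps):
        # Assume the prompt occupies the first (max_length - num_steps)
        # positions (at least one token), and generation fills the rest.
        max_length = prompt.shape[1]
        index = max_length - num_steps
        while index < max_length:
            # Assumed contract: per-position next-token distributions,
            # shape (batch, length, vocab).
            probs = token_probability_fn(prompt, mask)
            next_token = tf.cast(
                self.get_next_token(probs[:, index - 1, :]), prompt.dtype
            )
            # Write the chosen token at `index` and mark it as a real token.
            prompt = tf.concat(
                [prompt[:, :index], next_token[:, None], prompt[:, index + 1:]],
                axis=1,
            )
            mask = tf.concat(
                [mask[:, :index],
                 tf.ones_like(next_token[:, None], dtype=tf.bool),
                 mask[:, index + 1:]],
                axis=1,
            )
            index += 1
        return prompt


class GreedySketch(SamplerBaseSketch):
    def get_next_token(self, next_token_probs):
        return tf.argmax(next_token_probs, axis=-1)
```

The trade-off raised above still applies: beam search needs extra setup before the loop, which is why the PR keeps sample() as the override point.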
Approval from me, granted we might need to change how we handle compilation down the road pending some more discussion.
Thanks for the great work!
Base sampler + GreedySampler
For how this sampler is actually used, please see this link: https://colab.sandbox.google.com/gist/chenmoneygithub/5a1204a1888b1b56e37c3fa8101044b3/gpt2-generation-github-branch-version.ipynb
And here is a demo PR: #592