Light-weight benchmarking script #664
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.

@NusretOzates Thank you for the PR! Will take a closer look tomorrow.
LGTM! Just a few comments.
```python
def create_model():
    for name, symbol in keras_nlp.models.__dict__.items():
        if inspect.isclass(symbol) and issubclass(symbol, keras.Model):
            if FLAGS.model and name != f"{FLAGS.model.capitalize()}Classifier":
```
Rather than all this, I would just take in the symbol name directly, e.g. `--model=BertClassifier`. This will be a little more obvious in usage.
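A minimal sketch of that direct lookup. The attribute-based resolution is the point here; a stand-in namespace is used in place of `keras_nlp.models`, and the `resolve_model` helper name is hypothetical, not part of the PR:

```python
import types

# Stand-in for keras_nlp.models (assumption: the real script would call
# getattr on that module instead of this namespace).
models = types.SimpleNamespace(BertClassifier=type("BertClassifier", (), {}))

def resolve_model(name):
    """Resolve a classifier class directly from the --model flag value."""
    symbol = getattr(models, name, None)
    if symbol is None:
        raise ValueError(f"Unknown model name: {name}")
    return symbol

cls = resolve_model("BertClassifier")
```

This avoids scanning `__dict__` and capitalizing the flag value, so `--model=BertClassifier` maps one-to-one onto the exported symbol name.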
Yea, this is my bad honestly, I put "bert" in the problem description.
```python
    .prefetch(tf.data.AUTOTUNE)
)
val_dataset = (
    test_dataset.take(10000)
```
Maybe rather than a hardcoded split like this, you could run `tfds.load` with `with_info=True`. Use that to get the size of the test set, and use a fractional split here, e.g. `int(test_dataset_cardinality / 2)`.
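The split arithmetic this suggests, sketched with the IMDB test-set size as an illustrative constant; in the real script the number would come from `tfds` dataset info rather than being hardcoded:

```python
# Illustrative only: 25000 is the IMDB test split size. The script would read
# this from dataset info (e.g. info.splits["test"].num_examples) instead.
num_test_examples = 25000

val_size = num_test_examples // 2
test_size = num_test_examples - val_size

# The datasets would then be carved up along those sizes, e.g.:
# val_dataset = test_dataset.take(val_size)
# test_dataset = test_dataset.skip(val_size)
```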
I think you can do:

```python
test_ds_size = test_dataset.cardinality()
val_dataset = test_dataset.take(test_ds_size // 2)
...
```
```python
# End time
end_time = time.time()
print(f"Total time: {end_time - start_time}")
```
Maybe "Wall time", so it's clear we are just measuring elapsed time?
Looks great overall! A few comments.
keras_nlp/benchmarks/README.md
from the root of the repository:

```sh
python3 .keras_nlp/benchmarks/sentiment_analysis.py \
```
remove the extra ".": .keras_nlp/ => keras_nlp/
```python
# Copyright 2022 The KerasNLP Authors
```
We can use 2023 now, time flies!
```python
FLAGS = flags.FLAGS
flags.DEFINE_string(
    "model", None, "The name of the classifier such as BertClassifier."
```
Add a trailing comma so that it gets formatted across multiple lines.
Will do all of that tomorrow! Thanks for the review @chenmoneygithub @mattdangerw
…t dataset size set automatically using dataset info
@chenmoneygithub @mattdangerw, I made the necessary changes. I would like to add a TensorFlow profiler too but tensorboard and tensorboard_profiler_plugin are not in the requirements and I didn't want to touch that part 😄
LGTM! Just a few last comments, but after those are in I think this is all set to merge.
Agreed the tensorboard profiler bit can be a follow up.
Thanks so much for this! This is high quality code.
```python
def check_flags():
    if not FLAGS.model:
```
I just remembered there is actually a way to do this with absl directly: `flags.mark_flag_as_required("flag")`.
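A short sketch of that absl pattern, using the PR's `model` flag name. The call simulating command-line parsing at the end is only for illustration; in the script, `app.run()` performs the parse:

```python
from absl import flags

FLAGS = flags.FLAGS
flags.DEFINE_string(
    "model", None, "The name of the classifier such as BertClassifier."
)
# With this, absl exits with a usage error when --model is omitted,
# making a hand-rolled check_flags() helper unnecessary.
flags.mark_flag_as_required("model")

# Simulate command-line parsing; absl's FLAGS object is callable with argv.
FLAGS(["sentiment_analysis.py", "--model=BertClassifier"])
```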
```python
import keras_nlp

# Use mixed precision for optimal performance
keras.mixed_precision.set_global_policy("mixed_float16")
```
Actually given that this is a benchmarking script, better to leave this as a flag probably. Totally valid to benchmark a model under full precision or mixed.
Can we just make this a string flag?
```python
flags.DEFINE_string(
    "mixed_precision_policy",
    "mixed_float16",
    "The global mixed precision policy to use. E.g. 'mixed_float16' or 'float32'.",
)
```
```python
    .prefetch(tf.data.AUTOTUNE)
)

test_dataset_size = info.splits['test'].num_examples // 2
```
Maybe drop a comment here:

```python
# We split the test data evenly into validation and test sets.
```
@mattdangerw All done 👌 I'm happy to contribute and I actually would like/plan to do more!
@NusretOzates thanks! I see one small issue: we now need to make sure to only set the mixed precision policy after flags are parsed, or the script won't run. I will push a small fix.
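A sketch of that ordering fix: the policy call moves inside `main`, which absl invokes only after flags are parsed. The keras call is shown as a comment since the full script body is not part of this thread, and the direct `FLAGS(...)` parse at the end stands in for what `app.run(main)` would do:

```python
from absl import flags

flags.DEFINE_string(
    "mixed_precision_policy", "mixed_float16", "The precision policy to use."
)

def main(_):
    # Safe here: app.run parses flags before calling main, so FLAGS is
    # populated. Calling this at module import time would raise instead.
    # keras.mixed_precision.set_global_policy(FLAGS.mixed_precision_policy)
    return flags.FLAGS.mixed_precision_policy

# In the script: app.run(main). Here we simulate the parse step directly.
flags.FLAGS(["sentiment_analysis.py"])
policy = main(None)
```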
That makes sense, thanks for the fix 😄
The code looks great! We want to thank you for the high-quality PR again! XD
You are welcome! 😄
Implemented a sentiment analysis benchmark for classifiers using the IMDB review dataset, for #634.