
Corrected the epsilon value #665

Merged: 6 commits into keras-team:master, Jan 17, 2023

Conversation

@soma2000-lang (Contributor) commented on Jan 16, 2023

Addresses #642.

The correct epsilon value for BERT is 1e-12. We pass this value to the LayerNorm layer after the embeddings (https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/bert/bert_backbone.py#L149), but we do not pass it to the TransformerEncoder layers (https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/bert/bert_backbone.py#L159-L168), which default layer_norm_epsilon to 1e-5.
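
For reference, a minimal sketch of the intended change (simplified from bert_backbone.py; the argument values below are illustrative BERT-base-style sizes, not the model's real config plumbing):

```python
# Sketch only: pass BERT's epsilon of 1e-12 to each TransformerEncoder layer
# instead of relying on the layer's 1e-5 default. The sizes here are
# BERT-base-style values used purely for illustration.
import keras_nlp

encoder_layer = keras_nlp.layers.TransformerEncoder(
    num_heads=12,
    intermediate_dim=3072,
    dropout=0.1,
    layer_norm_epsilon=1e-12,  # match the LayerNorm applied after embeddings
)
```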

@abheesht17

@jbischof (Contributor) left a comment

@soma2000-lang found a bug. If you run pytest keras_nlp/models/bert you can check your work locally. See our contribution guide for more details (link).

Review thread on keras_nlp/models/bert/bert_backbone.py (outdated, resolved)
@abheesht17 (Collaborator) commented

@soma2000-lang, could you also do it for the other backbone models?

@jbischof (Contributor) commented

@abheesht17, do we know that 1e-12 is a better default value and that it is used in those papers? If so, should we change it in TransformerEncoder and TransformerDecoder so that we have good defaults for everyone?

@abheesht17 (Collaborator) commented

@jbischof, I believe for all our layers we've kept the defaults the same as what they are in the underlying layers. For example, the kernel initialiser is "glorot_uniform" even though most models use truncated normal or random normal. Hence, I think we should probably not change the defaults in our layers.

Most models (BERT, ALBERT) use 1e-12. DeBERTa has 1e-7. GPT-2 uses 1e-5. There is no consensus as such. Anyway, epsilon is added to the variance in the denominator while normalising and is there to prevent divide-by-zero errors... so it probably does not have a huge impact? Dunno.
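
A minimal layer-norm sketch (ignoring the learned scale and offset), just to show where epsilon enters:

```python
# Minimal sketch of layer normalization without the learned gamma/beta.
# Epsilon is added to the variance before the square root, so it only
# matters when the variance is close to zero.
import numpy as np

def layer_norm(x, epsilon=1e-12):
    mean = x.mean(axis=-1, keepdims=True)
    variance = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(variance + epsilon)
```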

@jbischof (Contributor) commented

Makes sense! I guess there's a significant amount of research one has to do for this PR then?

@soma2000-lang (Contributor, Author) commented

> @soma2000-lang, could you also do it for the other backbone models?

Sure @abheesht17

@soma2000-lang (Contributor, Author) commented

@abheesht17 @jbischof I ran black on this particular file for proper code formatting. Is it okay, or should I undo the changes black made?

@abheesht17 (Collaborator) commented

> Makes sense! I guess there's a significant amount of research one has to do for this PR then?

Not really. The epsilon value to be passed to the TransformerEncoder layers is the same as the value passed to the LayerNorm layer after the embedding layers. I've already checked the official repos.

For example, DeBERTav3: https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/deberta_v3/deberta_v3_backbone.py#L131.
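
The pattern to follow for each backbone looks roughly like this (a simplified illustration using generic layers rather than each model's actual encoder class; the constant name is made up here): thread a single epsilon value through both the post-embedding LayerNorm and the encoder layers.

```python
# Illustrative pattern only: reuse one per-model epsilon constant for both the
# post-embedding LayerNorm and the encoder layers (e.g. 1e-12 for BERT/ALBERT,
# 1e-7 for DeBERTaV3, 1e-5 for GPT-2).
from tensorflow import keras
import keras_nlp

LAYER_NORM_EPSILON = 1e-12  # hypothetical per-model constant

embedding_norm = keras.layers.LayerNormalization(
    epsilon=LAYER_NORM_EPSILON,
)
encoder_layer = keras_nlp.layers.TransformerEncoder(
    num_heads=12,
    intermediate_dim=3072,
    layer_norm_epsilon=LAYER_NORM_EPSILON,
)
```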

@abheesht17 (Collaborator) commented

> @abheesht17 @jbischof I ran black on this particular file for proper code formatting. Is it okay, or should I undo the changes black made?

@soma2000-lang, it seems like you are using a different black config. This is the command you have to run:

bash shell/format.sh

This will work only if you are using Linux. If you are using Windows, run these commands:

isort --sp "setup.cfg" --sl <name-of-file>
black --line-length 80 <name-of-file>
flake8 --config "setup.cfg" --max-line-length=200 <name-of-file>

@@ -163,6 +163,7 @@ def __init__(
                     x, approximate=True
                 ),
                 dropout=dropout,
+                layer_norm_epilson=1e-12,
@abheesht17 (Collaborator) commented on Jan 16, 2023

Hey, @soma2000-lang. The tests are failing because epsilon has been misspelt. Could you please correct and push again? Thanks!

@soma2000-lang (Contributor, Author) commented

@abheesht17 Done!

@jbischof (Contributor) commented

@abheesht17 are we happy as is or do we want to add other models to this PR?

@mattdangerw (Member) commented

I'm happy to keep this small and keep the PRs flowing, but let's make sure we keep tracking follow-up work.

@abheesht17 maybe you could edit the description on the initial issue to include changes that would need to be made to other models? Or we could open a separate tracking issue. No strong preference.

@mattdangerw merged commit 99edccf into keras-team:master on Jan 17, 2023
@mattdangerw (Member) commented

Thank you!

@soma2000-lang deleted the Epsilon branch on February 7, 2023.
@abheesht17 mentioned this pull request on Feb 9, 2023 (now closed).