Corrected the epsilon value #665
Conversation
@soma2000-lang found a bug. If you run pytest keras_nlp/models/bert
you can check your work locally. See our contribution guide for more details (link).
@soma2000-lang, could you also do this for the other backbone models?
@abheesht17 do we know that
@jbischof, I believe for all our layers, we've kept defaults the same as what they are in the underlying layers. For example, the kernel initialiser is "glorot_uniform" even though most models use truncated normal or random normal. Hence, I think we should probably not change the defaults in our layers. Most models (BERT, ALBERT) have 1e-12. DeBERTa has 1e-7. GPT-2 uses 1e-5. There is no consensus as such. Anyway, epsilon is the value in the denominator while normalising and is used to prevent divide-by-zero errors... so it probably does not have a huge impact? Dunno.
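For context, layer normalization computes (x - mean) / sqrt(variance + epsilon), so epsilon only matters when the variance is close to zero. A minimal sketch with Keras (the toy input is made up, not anything from this PR) illustrates why switching between 1e-12 and 1e-5 usually barely changes the output:

```python
import numpy as np
from tensorflow import keras

# Toy input for illustration only: one row of four features.
x = np.array([[0.1, 0.2, 0.3, 0.4]], dtype="float32")

# Layer norm divides by sqrt(variance + epsilon); with a non-degenerate
# variance the two epsilon choices discussed above give almost the same
# result.
ln_bert = keras.layers.LayerNormalization(epsilon=1e-12)
ln_default = keras.layers.LayerNormalization(epsilon=1e-5)

print(ln_bert(x).numpy())
print(ln_default(x).numpy())  # nearly identical for well-behaved inputs
```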
Makes sense! I guess there's a significant amount of research one has to do for this PR then?
Sure @abheesht17
@abheesht17 @jbischof I did run black on this particular file for proper code formatting. Is that okay, or should I undo the changes made by black?
Not really. The epsilon value to be passed to the Transformer Encoder layer is the same as the value passed to the LayerNorm layer after the embeddings layer. I've already checked the official repos. For example, DeBERTaV3: https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/deberta_v3/deberta_v3_backbone.py#L131.
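A rough sketch of that pattern (not code from the PR; the layer sizes below are illustrative) would reuse one epsilon for both the post-embedding LayerNorm and the encoder layers:

```python
import keras_nlp
from tensorflow import keras

norm_epsilon = 1e-12  # the BERT value discussed in this PR

# The post-embedding LayerNorm and every TransformerEncoder share the same
# epsilon; the other argument values are placeholders.
embedding_norm = keras.layers.LayerNormalization(epsilon=norm_epsilon)
encoder_layer = keras_nlp.layers.TransformerEncoder(
    intermediate_dim=3072,
    num_heads=12,
    dropout=0.1,
    layer_norm_epsilon=norm_epsilon,  # overrides the 1e-5 default
)
```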
@soma2000-lang, it seems like you are using a different black config. This is the command you have to run:
This will work only if you are using Linux. If you are using Windows, run these commands:
@@ -163,6 +163,7 @@ def __init__(
        x, approximate=True
    ),
    dropout=dropout,
+   layer_norm_epilson=1e-12,
Hey, @soma2000-lang. The tests are failing because epsilon
has been misspelt. Could you please correct and push again? Thanks!
@abheesht17 Done!
@abheesht17 are we happy as is, or do we want to add other models to this PR?
I'm happy to keep this small and keep the PRs flowing, but let's make sure we keep tracking follow-up work. @abheesht17 maybe you could edit the description on the initial issue to include changes that would need to be made to other models? Or we could open a separate tracking issue. No strong preference.
Thank you!
#642
The correct epsilon value for BERT is 1e-12. We pass this value to the LayerNorm layer after the embeddings (https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/bert/bert_backbone.py#L149) but not to the TransformerEncoder layers (https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/bert/bert_backbone.py#L159-L168), where the default value of layer_norm_epsilon is 1e-5.
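As a quick way to confirm the default mentioned above (assuming keras_nlp's TransformerEncoder exposes layer_norm_epsilon in its config, which is not stated in this thread), one could inspect a freshly constructed layer:

```python
import keras_nlp

# Build an encoder without specifying layer_norm_epsilon and read the
# default back from its config. The config key is an assumption about the
# keras_nlp API; the expected default is the 1e-5 mentioned above.
layer = keras_nlp.layers.TransformerEncoder(intermediate_dim=3072, num_heads=12)
print(layer.get_config()["layer_norm_epsilon"])  # expected: 1e-05
```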
@abheesht17