BertMaskedLM Task Model and Preprocessor #774
Conversation
@mattdangerw This PR is ready for review.
@Cyber-Machine use
@mattdangerw @abheesht17 This PR is ready for review.
Thank you! This looks great to me. Just some minor comments.
@keras.utils.register_keras_serializable(package="keras_nlp")
class BertMaskedLMPreprocessor(BertPreprocessor):
    """BERT preprocessing for the masked language modeling task.
    This preprocessing layer will prepare inputs for a masked language modeling
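As context for the review: the core of any masked-LM preprocessor is the random masking step applied to tokenized input. A minimal, self-contained sketch of that step in plain Python (`mask_tokens`, its signature, and the replace-everything-with-mask simplification are illustrative assumptions, not keras_nlp's actual implementation — the real BERT recipe also sometimes keeps or randomizes selected tokens):

```python
import random

def mask_tokens(token_ids, mask_id, rate=0.15, seed=42):
    """Randomly select positions and replace them with a mask token id.

    Returns (masked_ids, mask_positions, original_ids_at_positions).
    Simplified: every selected position becomes the mask token.
    """
    rng = random.Random(seed)
    masked = list(token_ids)
    positions, labels = [], []
    for i, tok in enumerate(token_ids):
        if rng.random() < rate:
            positions.append(i)
            labels.append(tok)
            masked[i] = mask_id
    return masked, positions, labels
```

The preprocessor layer under review wraps this idea together with tokenization, packing, and padding-mask construction.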
Looks like the empty newlines from the version you copied from got removed. (GitHub does this for some reason.)
Can you add them back in throughout this docstring?
self.assertAllEqual(
    x["padding_mask"],
    [
        True,
If possible, shorten these examples so we don't take up so much vertical space here.
You should be able to pass 0s and 1s here which should help, we do that for other tests.
Couple of NITs
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""BERT masked lm model."""
"lm" --> "LM"
intermediate_dim=3072,
max_sequence_length=12
)
# Create a BERT masked_lm and fit the data.
"Create a BERT masked LM model and fit the data."
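For readers following the review: the task model's training objective scores predictions only at the masked positions. A rough NumPy sketch of that loss (the function name and shapes here are illustrative assumptions, not the keras_nlp code):

```python
import numpy as np

def masked_lm_loss(logits, mask_positions, labels):
    """Mean cross-entropy over just the masked positions.

    logits: (seq_len, vocab_size) unnormalized scores for one sequence.
    mask_positions: indices that were replaced with the mask token.
    labels: original token ids at those positions.
    """
    selected = logits[mask_positions]  # (n_masked, vocab_size)
    # Log-softmax computed stably by subtracting the row max first.
    shifted = selected - selected.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick out the log-probability of each true token and average.
    return -log_probs[np.arange(len(labels)), labels].mean()
```

With all-zero logits over a vocabulary of 4, every token is equally likely, so the loss equals log(4).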
Thank you! This is great!
Fixes #719
I have made the following changes, but I am still working on the process:

- `BertTokenizer`: updated to expect a mask token.
- `BertMaskedLMPreprocessor`: preprocessor layer and tests.
- `BertMaskedLM`: task model and tests.
- `keras_nlp/models/__init__.py`: updated to export `BertMaskedLM` and `BertMaskedLMPreprocessor`.