Add FNet Preprocessor #646
Conversation
super().__init__(proto=proto, **kwargs)

# Check for necessary special tokens.
cls_token = "[CLS]"
This is odd, they really mix special token styles like this?
I guess this is half BERT style, half sentencepiece defaults.
Yeah, I verified it with HF's tokenizer output
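For context, one quick way to confirm the mixed defaults is to inspect the special tokens on the Hugging Face tokenizer directly. A minimal sketch, assuming the `transformers` package and the `google/fnet-base` checkpoint:

# Sketch: check FNet's special-token defaults via HF (not code from this PR).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/fnet-base")
print(tokenizer.cls_token)  # "[CLS]" -- BERT style
print(tokenizer.sep_token)  # "[SEP]" -- BERT style
print(tokenizer.pad_token)  # "<pad>" -- sentencepiece default
print(tokenizer.unk_token)  # "<unk>" -- sentencepiece default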
Just one nit!
- Pack the inputs together using a `keras_nlp.layers.MultiSegmentPacker`
  with the appropriate `"[CLS]"`, `"[SEP]"` and `"<pad>"` tokens.
- Construct a dictionary with keys `"token_ids"`, `"segment_ids"` and
  `"padding_mask"`, that can be passed directly to
There is no `"padding_mask"`, right?
https://colab.research.google.com/drive/1wd7ApfnTwS_xqa62Isdx3CGIZ95ixwxC?usp=sharing
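For reference, a minimal sketch of the packing step under discussion, assuming the `keras_nlp.layers.MultiSegmentPacker` API; the token ids below are made up for illustration:

import keras_nlp

# Hypothetical ids: 4 = "[CLS]", 5 = "[SEP]", 0 = "<pad>".
packer = keras_nlp.layers.MultiSegmentPacker(
    sequence_length=8,
    start_value=4,
    end_value=5,
    pad_value=0,
)
# Packs two segments into one sequence. Note the layer returns
# (token_ids, segment_ids) -- no padding mask is produced here.
token_ids, segment_ids = packer(([10, 11, 12], [13, 14]))
# token_ids   -> [4, 10, 11, 12, 5, 13, 14, 5]
# segment_ids -> [0, 0, 0, 0, 0, 1, 1, 1]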