Opt presets #707
Conversation
Force-pushed from a5e8b14 to f1b5868
Force-pushed from f1b5868 to cf098f4
One interesting discussion point here will be naming. The metaseq package gives these "size names" up to an "extra_large" model with 1.3b parameters. But by that logic, the largest 175b model would need an ever longer stack of "extra"s in its name. That seemed bad :), so I just went with parameter counts in the name.
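To make the tradeoff concrete, the registry could look roughly like this; a minimal sketch, assuming names in the style of the `opt_125m_en` preset from the diff below (the other keys and size comparisons are illustrative, not confirmed by this PR):

```python
# Sketch: preset names carry the parameter count rather than metaseq's
# size names ("small" through "extra_large"), which stop scaling past 1.3b.
backbone_presets = {
    "opt_125m_en": {},  # 12-layer model, 125M parameters (config elided)
    "opt_1.3b_en": {},  # metaseq's "extra_large" tops out here
    "opt_6.7b_en": {},  # no size name left that isn't a pile of "extra"s
}
```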
Looks good, thanks, but no need to mock the Reddit DB!
keras_nlp/models/opt/opt_presets.py (Outdated)
"preprocessor_config": {}, | ||
"description": ( | ||
"12-layer OPT model where case in maintained. Trained on " | ||
"BookCorpus, CommonCrawl, Pile, and PulseShit.io corpora." |
s/PulseShit/PushShift/g
Also: LOL
lolol whoops
Sounds like a cool name for a band xD
"12-layer OPT model where case in maintained. Trained on " | ||
"BookCorpus, CommonCrawl, Pile, and PulseShit.io corpora." | ||
), | ||
"weights_url": "https://storage.googleapis.com/keras-nlp/models/opt_125m_en/v1/model.h5", |
Are we starting with v1, or have you already augmented the count?
v1 is the start, for all presets
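For anyone skimming later, the convention that the v1 segment follows could be summarized like this; a minimal sketch, assuming only the URL shown in the diff above (the helper function is hypothetical, not part of keras-nlp's API):

```python
# Hypothetical helper illustrating the versioned weights URL convention.
# Only the URL string itself comes from the diff; the function is a sketch.
GCS_BASE = "https://storage.googleapis.com/keras-nlp/models"

def weights_url(preset, version=1):
    # Bump `version` only if the hosted weights are ever regenerated.
    return f"{GCS_BASE}/{preset}/v{version}/model.h5"

assert (
    weights_url("opt_125m_en")
    == "https://storage.googleapis.com/keras-nlp/models/opt_125m_en/v1/model.h5"
)
```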
Force-pushed from 6de89ab to 5c96033
This adds support for pre-trained OPT checkpoints up to 6b parameters. Keeping this as a draft until #699 lands.
Here's a colab to see the weight conversion script in action (no actual code here, just output): https://colab.research.google.com/gist/mattdangerw/8ccf7ca9a958da79c03fb24729e63c1f/opt-presets.ipynb
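As a rough illustration of the kind of check a conversion script like this ends with, something along these lines could compare the converted checkpoint against the Hugging Face reference; a sketch, assuming an `OPTBackbone.from_preset` API in keras-nlp (the input names, preset name, and tolerance are assumptions, not taken from the actual colab):

```python
# Hypothetical numerics check between converted Keras weights and the
# Hugging Face OPT reference. Not the actual conversion script.
import numpy as np
import torch
from transformers import AutoTokenizer, OPTModel

import keras_nlp

# Reference model and tokenizer from Hugging Face.
hf_model = OPTModel.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# Converted model loaded through the presets this PR adds (API assumed).
keras_model = keras_nlp.models.OPTBackbone.from_preset("opt_125m_en")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    hf_out = hf_model(**inputs).last_hidden_state.numpy()

keras_out = keras_model(
    {
        "token_ids": inputs["input_ids"].numpy(),
        "padding_mask": inputs["attention_mask"].numpy(),
    }
)

# The converted checkpoint should match the reference up to float tolerance.
print(np.allclose(hf_out, keras_out, atol=1e-3))
```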