Add Qwen support #850
4823c6a to 2fe1ec3
Thank you @chenht2026, sorry for the wait. A few of us took a break :-)
Thanks for the contributions! One comment on the config flag
@@ -27,6 +27,8 @@ class Config:
    rotary_percentage: float = 0.25
    parallel_residual: bool = True
    bias: bool = True
    # just for Qwen
    is_Qwen: Optional[bool] = None
I think we should avoid this in favor of something that characterizes Qwen, like having bias only in `c_attn`.
For the time being we could rename this as `attn_bias`, and then in the future turn `bias` into a `Union[bool, List[str]]` if there's a need for it.
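One way to read this suggestion is sketched below. It is only an illustration: the trimmed `Config`, the "None inherits from `bias`" behaviour of `attn_bias`, and the `layer_has_bias` helper are all assumptions made for the example, not code from this PR or from the project.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Config:
    # Trimmed, hypothetical stand-in for the project's Config class.
    bias: bool = True
    # Assumed semantics: None means "inherit the value of `bias`".
    attn_bias: Optional[bool] = None

    def __post_init__(self) -> None:
        if self.attn_bias is None:
            self.attn_bias = self.bias


def layer_has_bias(bias: Union[bool, List[str]], layer_name: str) -> bool:
    """Possible future form: a bool applies to every layer, a list names the biased layers."""
    if isinstance(bias, bool):
        return bias
    return layer_name in bias


# A Qwen-like setup: no bias in general, but bias in the attention projection.
qwen_like = Config(bias=False, attn_bias=True)
```

With this shape the model code would branch on `config.attn_bias` rather than on a model-name flag, so any other architecture with the same bias layout could reuse it.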
Can you also add a `tests/test_model.py` test?
# Copyright (c) Alibaba Cloud.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
If you didn't write the code in this file (which I assume you didn't, since you added this license), you should link to the original source. Is the original from PaddlePaddle? https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/qwen/tokenizer.py
I would advise that you create a version that implements only the few methods required by tokenizer.py
I just copied it from their Hugging Face repo: tokenization_qwen.py
from typing import Collection, Dict, List, Set, Tuple, Union

import tiktoken
from transformers import PreTrainedTokenizer, AddedToken
This project doesn't have transformers as a dependency, so this import is not possible
@@ -91,6 +95,8 @@ def encode(
            tokens = self.processor.encode(string).ids
        elif self.backend == "sentencepiece":
            tokens = self.processor.encode(string)
        elif self.backend == "tiktoken":
            tokens = self.processor.encode(string)
It doesn't seem like the new processor implements this method. Also, what about decoding?
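To make the concern concrete, here is a minimal sketch of the contract a new backend would need to meet: the processor exposes both `encode` and `decode`, and the tokenizer dispatches to it in both directions. Everything here is hypothetical. `Processor`, `ByteProcessor`, and `SketchTokenizer` are illustrative names, and the byte-level processor is only a stand-in so the example runs without tiktoken; it is not how Qwen's tokenizer works.

```python
from typing import List, Protocol


class Processor(Protocol):
    """Hypothetical interface each tokenizer backend is assumed to satisfy."""

    def encode(self, text: str) -> List[int]: ...

    def decode(self, ids: List[int]) -> str: ...


class ByteProcessor:
    """Toy stand-in for a tiktoken-style processor: one token per UTF-8 byte."""

    def encode(self, text: str) -> List[int]:
        return list(text.encode("utf-8"))

    def decode(self, ids: List[int]) -> str:
        return bytes(ids).decode("utf-8", errors="replace")


class SketchTokenizer:
    """Simplified dispatcher; only the hypothetical "tiktoken" branch is shown."""

    def __init__(self, processor: Processor, backend: str) -> None:
        self.processor = processor
        self.backend = backend

    def encode(self, string: str) -> List[int]:
        if self.backend == "tiktoken":
            return self.processor.encode(string)
        raise RuntimeError(f"Unsupported backend: {self.backend}")

    def decode(self, tokens: List[int]) -> str:
        # Decoding needs its own branch, mirroring the one in encode.
        if self.backend == "tiktoken":
            return self.processor.decode(tokens)
        raise RuntimeError(f"Unsupported backend: {self.backend}")


tokenizer = SketchTokenizer(ByteProcessor(), backend="tiktoken")
round_trip = tokenizer.decode(tokenizer.encode("你好, Qwen"))
```

A round-trip check like the last line is also a cheap first test for any real backend: if `decode(encode(text))` does not return `text`, one of the two branches is missing or wrong.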
tutorials/download_Qwen.md
It would be good to mention Qwen's recommended languages.
Chinese and English. I'll add it.
Would it work with Qwen 2 (Qwen/Qwen1.5-7B-Chat)? If not, what needs to be added?
Hey just pinging to see if you are still interested in pursuing this PR. Personally, I think it'd be awesome to support the Qwen models (1.5 and especially 2) in LitGPT. There have been some improvements in the tokenizer in LitGPT recently that could now make this more easily possible. Btw if rebasing here based on the main branch (which changed a lot) is too messy, you could also just open a fresh PR.
It works.
Qwen's tokenizer is based on tiktoken. I added the tokenizer (tokenization_qwen.py) from its Hugging Face repo without any changes. This makes the code a little complicated, so maybe do not merge.
Maybe someone needs it.
Closes #840