Add model Rembert #1701
Thanks for the contribution! Please update the code based on the review comments 😊 @Beacontownfc
I've made the changes you requested. @gongel
LGTM
@yingyibiao Everything looks good. Requesting approval to merge 😊
Batch input with padding may have a problem.
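For context, batching sentences of different lengths means right-padding every sequence to the longest one and telling the model which positions are padding. The sketch below is a minimal NumPy illustration, not PaddleNLP code: `pad_batch` is a hypothetical helper assumed to behave like `paddlenlp.data.Pad(axis=0, pad_val=...)` with right padding. `attention_mask_from_ids` and `additive_mask` are also hypothetical. They show one common BERT-style convention (1 = real token; padding gets a large negative bias added to the attention scores). Deriving the mask from `pad_token_id` assumes the pad id never occurs as a real token.

```python
import numpy as np


def pad_batch(seqs, pad_val):
    """Right-pad ragged token-id lists to the longest length.

    Hypothetical helper, assumed to mirror Pad(axis=0, pad_val=pad_val).
    """
    max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len), pad_val, dtype="int64")
    for i, s in enumerate(seqs):
        out[i, : len(s)] = s
    return out


def attention_mask_from_ids(input_ids, pad_token_id):
    """Return 1 for real tokens and 0 for padding positions."""
    return (input_ids != pad_token_id).astype("int64")


def additive_mask(mask, neg=-1e4):
    """Broadcastable [batch, 1, 1, seq] bias: 0 for real tokens, `neg` for padding."""
    return (1.0 - mask[:, None, None, :]) * neg


ids = pad_batch([[101, 7, 8, 102], [101, 9, 102]], pad_val=0)
mask = attention_mask_from_ids(ids, pad_token_id=0)
print(ids.tolist())   # [[101, 7, 8, 102], [101, 9, 102, 0]]
print(mask.tolist())  # [[1, 1, 1, 1], [1, 1, 1, 0]]
```

If a model is called with padded `input_ids` but neither receives nor derives such a mask, the padding tokens take part in attention and the outputs for the real tokens change.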
@Beacontownfc There is a problem with padded input. Could you please check it? Repro script:

import os

# Set cache directories before importing transformers/paddlenlp,
# since both libraries read these variables at import time.
os.environ["TRANSFORMERS_CACHE"] = "./hf/"
os.environ["PPNLP_HOME"] = "./pdnlp/"

import numpy as np
import paddle
import torch
import transformers as hfnlp
import paddlenlp.transformers as ppnlp
from paddlenlp.data import Pad


def compute_diff(torch_data, paddle_data):
    """Summarize the element-wise absolute difference between two outputs."""
    diff = np.abs(torch_data.detach().numpy() - paddle_data.numpy())
    return "max: {} mean: {} min: {}".format(diff.max(), diff.mean(), diff.min())


def compare_base(model_id):
    sentences = [
        "This is an example sentence.",
        "Each sentence is converted .",
        "欢迎使用 PaddlePaddle 。",
        "欢迎使用 PaddleNLP 。",
    ]

    # Calculate HF output (padding=True pads the batch and returns an attention_mask)
    hf_tokenizer = hfnlp.RemBertTokenizer.from_pretrained("google/rembert")
    hf_model = hfnlp.RemBertModel.from_pretrained("google/rembert")
    hf_model.eval()
    with torch.no_grad():
        hf_inputs = hf_tokenizer(sentences, padding=True, return_tensors="pt")
        print(hf_inputs)
        hf_out = hf_model(**hf_inputs).last_hidden_state

    # Calculate Paddle output (batch padded manually with Pad)
    pd_tokenizer = ppnlp.RemBertTokenizer.from_pretrained(model_id)
    pd_model = ppnlp.RemBertModel.from_pretrained(model_id)
    pd_model.eval()
    with paddle.no_grad():
        pd_inputs = pd_tokenizer(sentences)
        input_ids = paddle.to_tensor(
            Pad(axis=0, pad_val=pd_tokenizer.pad_token_id)(pd_inputs["input_ids"]))
        token_type_ids = paddle.to_tensor(
            Pad(axis=0, pad_val=pd_tokenizer.pad_token_type_id)(pd_inputs["token_type_ids"]))
        print(input_ids)
        print(token_type_ids)
        pd_out = pd_model(input_ids, token_type_ids)[0]

    return compute_diff(hf_out, pd_out)


print(compare_base("rembert"))
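When checking a padded batch, it can help to compare hidden states only at real-token positions, because the values at padding positions are not used downstream and may differ between implementations even when the real tokens agree. The sketch below is a hypothetical `masked_diff` helper (not part of either library) that takes `[batch, seq, hidden]` NumPy arrays plus a `[batch, seq]` attention mask where 1 marks a real token.

```python
import numpy as np


def masked_diff(a, b, attention_mask):
    """Max/mean absolute difference restricted to non-padded positions.

    a, b: arrays of shape [batch, seq, hidden]
    attention_mask: array of shape [batch, seq], 1 = real token, 0 = padding
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    keep = np.asarray(attention_mask).astype(bool)
    # Boolean indexing over the first two axes yields [n_real_tokens, hidden].
    diff = np.abs(a - b)[keep]
    return {"max": float(diff.max()), "mean": float(diff.mean())}
```

In the script above this might be called as `masked_diff(hf_out.numpy(), pd_out.numpy(), hf_inputs["attention_mask"].numpy())`. That call assumes both tokenizers produce the same token ids and padded lengths. If the masked diff is small but the unmasked diff is large, the gap is confined to padding positions. If the masked diff is also large, the padding is likely affecting the real tokens, for example because no attention mask is applied.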
Hi, this line of code
Which version of paddlenlp are you using? The code on the develop branch currently works fine.
* add rembert
* add rembert
* Update tokenizer.py
* update rembert
* modify
* modify according to gongel
* Update tokenizer.py
* Update tokenizer.py
* Update modeling.py
* fix bug

Co-authored-by: gongenlei <[email protected]>
@Beacontownfc Could you also add support for the Auto classes?
Description
Add new model RemBert
Model weights:
Link: https://aistudio.baidu.com/aistudio/datasetdetail/129105