Add model Rembert #1701

Beacontownfc · 2022-02-22T14:00:53Z

Description
Add new model RemBert
The model weight:
Link: https://aistudio.baidu.com/aistudio/datasetdetail/129105

paddlenlp/transformers/rembert/tokenizer.py

paddlenlp/transformers/rembert/modeling.py

gongel · 2022-04-08T11:41:05Z

Thanks for the contribution. Please revise according to the review comments 😊 @Beacontownfc

Beacontownfc · 2022-04-09T01:28:08Z

I have made the changes you requested @gongel

gongel

LGTM

Beacontownfc · 2022-04-12T11:28:40Z

@yingyibiao Everything is OK; requesting approval to merge 😊

Batch input with padding may have a problem.

gongel · 2022-04-12T13:08:29Z

@Beacontownfc There is a problem with padded input; please check it.

import os

import numpy as np
import paddle
import torch
import transformers as hfnlp
from paddlenlp.data import Pad
import paddlenlp.transformers as ppnlp

os.environ["TRANSFORMERS_CACHE"] = "./hf/"
os.environ["PPNLP_HOME"] = "./pdnlp/"


def compute_diff(torch_data, paddle_data):
	# Summarize the element-wise absolute difference between the two outputs.
	torch_data = torch_data.detach().numpy()
	paddle_data = paddle_data.numpy()
	diff = np.abs(torch_data - paddle_data)
	return "max: {}    mean: {}    min: {}".format(diff.max(), diff.mean(), diff.min())


def compare_base(model_id):
	sentences = [
		"This is an example sentence.", 
		"Each sentence is converted .", 
		"欢迎使用 PaddlePaddle  。",
		"欢迎使用 PaddleNLP 。"
	]
	
	# Calculate HF output
	hf_tokenizer = hfnlp.RemBertTokenizer.from_pretrained('google/rembert')
	hf_model = hfnlp.RemBertModel.from_pretrained('google/rembert')
	hf_model.eval()
	with torch.no_grad():
		hf_inputs = hf_tokenizer(sentences, padding=True, return_tensors="pt")
		print(hf_inputs)
		hf_out = hf_model(**hf_inputs).last_hidden_state
	
	# Calculate Paddle output
	pd_tokenizer = ppnlp.RemBertTokenizer.from_pretrained('rembert')
	pd_model = ppnlp.RemBertModel.from_pretrained('rembert')
	pd_model.eval()
	with paddle.no_grad():
		pd_inputs = pd_tokenizer(sentences)
		input_ids = paddle.to_tensor(Pad(axis=0, pad_val=pd_tokenizer.pad_token_id)([pd_input for pd_input in pd_inputs["input_ids"]]))
		token_type_ids = paddle.to_tensor(Pad(axis=0, pad_val=pd_tokenizer.pad_token_type_id)([pd_input for pd_input in pd_inputs["token_type_ids"]]))
		print(input_ids)
		print(token_type_ids)
		pd_out = pd_model(input_ids, token_type_ids)[0]

	return compute_diff(hf_out, pd_out)

print(compare_base("rembert"))
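
One thing worth checking in a comparison like the script above is whether both models see the same padding information: the Hugging Face call receives the `attention_mask` produced by `padding=True`, while the Paddle call passes only `input_ids` and `token_type_ids`. Whether that explains the mismatch reported here is an assumption; the thread does not confirm the root cause. As a minimal sketch in plain Python, using a hypothetical helper `pad_batch` (not part of PaddleNLP or transformers), right-padding a batch and deriving a 0/1 mask could look like this:

```python
from typing import List, Sequence, Tuple


def pad_batch(
    sequences: Sequence[Sequence[int]], pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token-id sequences to a common length and build a 0/1 mask.

    `pad_batch` is a hypothetical helper for illustration only.
    """
    max_len = max((len(seq) for seq in sequences), default=0)
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Real tokens keep their ids; padded positions get pad_id.
        padded.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding.
        mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask


ids, mask = pad_batch([[5, 6, 7], [8, 9]], pad_id=0)
print(ids)
print(mask)
```

A mask built this way could be converted to a tensor and compared with the `attention_mask` in `hf_inputs` to confirm that both frameworks treat the same positions as padding.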

Beacontownfc · 2022-04-12T14:45:02Z

input_ids = paddle.to_tensor(Pad(axis=0, pad_val=pd_tokenizer.pad_token_id)([pd_input for pd_input in pd_inputs["input_ids"]]))

Hello, this line of code
input_ids = paddle.to_tensor(Pad(axis=0, pad_val=pd_tokenizer.pad_token_id)([pd_input for pd_input in pd_inputs["input_ids"]]))
also raises an error when run with BertTokenizer. I think the line should be changed to
input_ids = paddle.to_tensor(Pad(axis=0, pad_val=pd_tokenizer.pad_token_id)([pd_input["input_ids"] for pd_input in pd_inputs]))
With this change, the code runs normally.
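
The two variants of the line differ only in the shape they expect from the tokenizer: a dict-like object mapping field names to per-sentence lists, or a list with one dict per sentence. Which shape a given paddlenlp version returns is not stated in the thread, so the sketch below is an assumption-driven illustration. The helper `collect_field` is hypothetical and accepts either shape:

```python
from typing import Any, List


def collect_field(encoded: Any, key: str) -> List[list]:
    """Return one list per sentence for `key`, whichever shape `encoded` has.

    `collect_field` is a hypothetical helper, not a PaddleNLP API.
    """
    if hasattr(encoded, "keys"):
        # Dict-like batch: {"input_ids": [[...], [...]], ...}
        return [list(item) for item in encoded[key]]
    # List of per-sentence dicts: [{"input_ids": [...]}, {"input_ids": [...]}]
    return [list(item[key]) for item in encoded]


batch_style = {"input_ids": [[1, 2, 3], [4, 5]]}
list_style = [{"input_ids": [1, 2, 3]}, {"input_ids": [4, 5]}]
print(collect_field(batch_style, "input_ids"))
print(collect_field(list_style, "input_ids"))
```

The resulting list of lists could then be passed to `Pad(...)` as in the script, independent of the tokenizer's return shape.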

gongel · 2022-04-13T03:33:25Z

Which version of paddlenlp are you using? The code on develop currently has no problem.

* add rembert
* add rembert
* Update tokenizer.py
* update rembert
* modify
* modify according to gongel
* Update tokenizer.py
* Update tokenizer.py
* Update modeling.py
* fix bug

Co-authored-by: gongenlei <[email protected]>

gongel · 2022-06-08T02:42:51Z

@Beacontownfc Could you also adapt this for Auto?

Beacontownfc and others added 3 commits February 22, 2022 16:29

add rembert

d90d38c

add rembert

c10b040

Update tokenizer.py

a4bf3d7

yingyibiao added the contributions label Feb 25, 2022

update rembert

62ced9a

gongel self-requested a review April 8, 2022 11:25

gongel requested changes Apr 8, 2022

Beacontownfc and others added 3 commits April 9, 2022 08:04

Merge branch 'develop' into rembert

36df9c1

modify

2bb19b6

modify according to gongel

8b715a2

Beacontownfc and others added 3 commits April 9, 2022 09:31

Update tokenizer.py

4644d6c

Update tokenizer.py

364f5cd

Merge branch 'develop' into rembert

188fb15

gongel previously approved these changes Apr 12, 2022

Beacontownfc and others added 2 commits April 13, 2022 21:45

Update modeling.py

d9c8eb3

fix bug

4370832

gongel approved these changes Apr 14, 2022

gongenlei and others added 2 commits April 14, 2022 15:10

Merge branch 'develop' into rembert

47f9a23

Merge branch 'develop' into rembert

9fff306

yingyibiao merged commit 70649b1 into PaddlePaddle:develop Apr 14, 2022

guoshengCS mentioned this pull request Apr 29, 2022

PaddleNLP v2.3rc Release Note Candidate #2031

Closed
