
How to use on BERT #1

Open
WBWhiteBeard opened this issue Oct 2, 2021 · 1 comment

Comments

@WBWhiteBeard

Hello, author.

I'd like to use your model with BERT. At line 151 of the model file in your code, I see:

def forward(self, x, hidden):
    bptt_len, bs = x.shape
    vocab_sz = self.embedding.num_embeddings

Here the input x only has shape (seq_len, bs), but my BERT output is (bs, seq_len, hidden_size). Do I need to reduce the dimensionality here?
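(For context, a minimal sketch of the shape mismatch described above, assuming a Hugging Face-style BERT whose last_hidden_state has shape (bs, seq_len, hidden_size); all names here are illustrative:)

    import torch

    bs, seq_len, hidden_size = 8, 128, 768
    bert_output = torch.randn(bs, seq_len, hidden_size)   # (bs, seq_len, hidden_size)

    # Transposing the first two dims gives (seq_len, bs, hidden_size), but the
    # forward() above expects integer token ids of shape (seq_len, bs) for its
    # embedding layer, so a transpose alone does not bridge the two interfaces.
    transposed = bert_output.permute(1, 0, 2)             # (seq_len, bs, hidden_size)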

@Shiweiliuiiiiiii
Owner

Hi,

The model.py in this GitHub repo is for the RHN and LSTM-based models; you don't need it to train your BERT.
Applying Selfish-RNN to other models is simple: you just need to create a set of masks with the Masking class, as below:

decay = CosineDecay(args.death_rate, args.epochs * len(train_data) // args.bptt)
mask = Masking(optimizer,
               death_rate=args.death_rate,
               death_mode=args.death,
               death_rate_decay=decay,
               growth_mode=args.growth,
               redistribution_mode=args.redistribution,
               model=args.model)
mask.add_module(model, sparse_init=args.sparse_init, density=args.density)

Then the model can be trained with regular optimizers or SNT-ASGD. Note that you need to change optimizer.step() to mask.step() in the training function.
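As a rough sketch of that change (assuming model, criterion, optimizer, train_loader, and mask are set up as above; this loop is illustrative, not the repo's exact training code):

    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        mask.step()  # instead of optimizer.step(); steps the wrapped optimizer
                     # and re-applies the sparse masks to the pruned weights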
