
How to use on BERT #1

Open
WBWhiteBeard opened this issue Oct 2, 2021 · 1 comment

Comments

@WBWhiteBeard

Hello, author.

I'd like to use your model with BERT. At line 151 of the model file in your code, I see:

def forward(self, x, hidden):
    bptt_len, bs = x.shape
    vocab_sz = self.embedding.num_embeddings

Here the input x only has shape (seq_len, bs), but my BERT output is (bs, seq_len, hidden_size). Do I need to reduce the dimensionality here?
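(For context, a minimal sketch of the shape mismatch described above, assuming a Hugging Face-style BERT whose last_hidden_state has shape (bs, seq_len, hidden_size); all names here are illustrative:)

    import torch

    bs, seq_len, hidden_size = 8, 128, 768
    bert_output = torch.randn(bs, seq_len, hidden_size)   # (bs, seq_len, hidden_size)

    # Transposing the first two dims gives (seq_len, bs, hidden_size), but the
    # forward() above expects integer token ids of shape (seq_len, bs) for its
    # embedding layer, so a transpose alone does not bridge the two interfaces.
    transposed = bert_output.permute(1, 0, 2)             # (seq_len, bs, hidden_size)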

@Shiweiliuiiiiiii
Owner

Hi,

The model.py in this GitHub repo is for the RHN and LSTM-based models; you don't need it to train your BERT.
Applying Selfish-RNN to other models is simple: you just need to create a set of masks with the Masking class, as below:

decay = CosineDecay(args.death_rate, args.epochs * len(train_data) // args.bptt)
mask = Masking(optimizer,
               death_rate=args.death_rate,
               death_mode=args.death,
               death_rate_decay=decay,
               growth_mode=args.growth,
               redistribution_mode=args.redistribution,
               model=args.model)
mask.add_module(model, sparse_init=args.sparse_init, density=args.density)

Then the model can be trained with regular optimizers or SNT-ASGD. Note that you need to change optimizer.step() to mask.step() in the training function.
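As a rough sketch of that change (assuming model, criterion, optimizer, train_loader, and mask are set up as above; this loop is illustrative, not the repo's exact training code):

    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        mask.step()  # instead of optimizer.step(); steps the wrapped optimizer
                     # and re-applies the sparse masks to the pruned weights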
