I am migrating a model from PyTorch to MindSpore. I found that MindSpore requires me to compute the gradients myself and apply the update via opt(grad). However, during actual training the program's GPU memory usage grows without bound, and my guess is that the gradients are not actually being applied.

The model's forward (construct) function is as follows:
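For reference, this is my understanding of the basic MindSpore update pattern (a minimal, self-contained sketch; `net`, `loss_fn` and the toy data are placeholders, not my actual model):

```python
import numpy as np
import mindspore
from mindspore import nn, Tensor

# Placeholder network, loss and optimizer, only to illustrate the update pattern
net = nn.Dense(4, 2)
loss_fn = nn.CrossEntropyLoss()
optimizer = nn.Adam(net.trainable_params(), learning_rate=1e-3)

def forward_fn(x, label):
    logits = net(x)
    return loss_fn(logits, label)  # scalar loss for value_and_grad to differentiate

# grad_position=None + weights=optimizer.parameters:
# differentiate only w.r.t. the trainable parameters, not the inputs
grad_fn = mindspore.value_and_grad(forward_fn, None, optimizer.parameters)

x = Tensor(np.random.randn(8, 4).astype(np.float32))
label = Tensor(np.random.randint(0, 2, (8,)).astype(np.int32))

loss, grads = grad_fn(x, label)
optimizer(grads)  # apply the gradients to update the parameters
```

That is, the forward function returns a scalar loss, value_and_grad differentiates it with respect to the optimizer's parameters, and calling the optimizer with the gradients performs the update.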
```python
def construct(
    self,
    input_ids=None,
    attention_mask=None,
    token_type_ids=None,
    detect_labels=None,
    correct_labels=None,
):
    hidden_states = self.bert(
        input_ids=input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids)[0]
    detect_outputs = self.tag_detect_projection_layer(hidden_states)
    correct_outputs = self.tag_label_projection_layer(hidden_states)
    result = {
        "detect_outputs": detect_outputs,
        "correct_outputs": correct_outputs,
        "detect_loss": None,
        "correct_loss": None,
        "loss": None,
    }
    loss = None
    if detect_labels is not None and correct_labels is not None:
        detect_loss = self._detect_criterion(
            detect_outputs.view(-1, self.args.detect_vocab_size),
            detect_labels.view(-1))
        correct_loss = self._correct_criterion(
            correct_outputs.view(-1, self.args.correct_vocab_size),
            correct_labels.view(-1))
        loss = detect_loss + correct_loss
        result["detect_loss"] = detect_loss
        result["correct_loss"] = correct_loss
    elif detect_labels is not None:
        loss = self._detect_criterion(
            detect_outputs.view(-1, self.args.detect_vocab_size),
            detect_labels.view(-1))
    elif correct_labels is not None:
        loss = self._correct_criterion(
            correct_outputs.view(-1, self.args.correct_vocab_size),
            correct_labels.view(-1))
    result["loss"] = loss
    # output = result
    return result
```
The functions defined for training are as follows:
```python
def foward_fn(self, batch_data):
    detect_labels = batch_data[3]
    correct_labels = batch_data[4]
    output = self.model(batch_data[0], batch_data[1], batch_data[2],
                        detect_labels, correct_labels)
    return output
```

```python
self.optimizer = AdamW(self.model.trainable_params(),
                       lr=args.learning_rate,
                       eps=args.adam_epsilon)
grad_fn = mindspore.value_and_grad(self.foward_fn, None,
                                   self.optimizer.parameters, has_aux=False)

for epoch in range(1, self.epochs + 1):
    for step, batch_data in enumerate(self.train_loader):
        output, grad = grad_fn(batch_data)
        loss = output['loss'].mean()
        mindspore.ops.clip_by_norm(x=grad, max_norm=self.args.max_grad_norm)
        self.optimizer(grad)
```
Could you tell me whether there is a problem with my code?