Environment

Hardware Environment (Ascend/GPU/CPU): GPU

Software Environment:
- OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu
- GCC/Compiler version (if compiled from source):
Describe the current behavior
In MindSpore, layers defined in the __init__ method automatically register their parameters, even if those layers are never invoked in the construct method. As a result, the parameters of these unused layers receive gradients and are included in the optimization process, leading to potential performance issues and unintended behavior.
Parameters of unused layers consume additional computation resources by participating in gradient calculations, even when these layers do not contribute to the model's forward pass.
Unused parameters occupy memory throughout the training process. For complex models with multiple unused branches or conditional logic, this can significantly impact memory usage and training efficiency.
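A minimal sketch of the reported behavior (the Net class and layer names here are illustrative, not taken from any real model): self.unused is never called in construct, yet its parameters still appear in trainable_params() and would be handed to any optimizer built from that list.

```python
import mindspore.nn as nn

class Net(nn.Cell):
    def __init__(self):
        super().__init__()
        self.used = nn.Dense(4, 4)    # invoked in construct
        self.unused = nn.Dense(4, 4)  # defined but never invoked

    def construct(self, x):
        return self.used(x)

net = Net()
# Prints parameters from BOTH layers, including the unused one
# (e.g. unused.weight and unused.bias):
for p in net.trainable_params():
    print(p.name)
```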
Describe the expected behavior
Lazy Parameter Registration:
Modify MindSpore to register parameters only when layers are actually used within construct. This would align MindSpore with other popular frameworks such as TensorFlow and PyTorch, improving both performance and usability.
Warning for Unused Layers:
Introduce runtime warnings for layers that are defined but not invoked during the forward pass. This would help developers identify potential design issues early on.
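Until something like this exists in the framework, the warning could be approximated in user code. Below is a rough sketch, not an official MindSpore API: the helper name warn_unused_cells is hypothetical, and it assumes PyNative mode, where Cell.register_forward_hook fires as ordinary Python during the forward pass.

```python
import warnings
import mindspore.nn as nn

def warn_unused_cells(net: nn.Cell, example_input):
    """Hypothetical helper: warn about sub-cells that own parameters
    but never fire during one forward pass (PyNative mode only)."""
    fired = set()
    handles = []
    for name, cell in net.cells_and_names():
        if not name:  # skip the root cell itself
            continue
        def make_hook(cell_name):
            def hook(cell, inputs, outputs):
                fired.add(cell_name)
            return hook
        handles.append(cell.register_forward_hook(make_hook(name)))
    net(example_input)          # one forward pass to see what runs
    for handle in handles:
        handle.remove()
    for name, cell in net.cells_and_names():
        if name and name not in fired and cell.trainable_params():
            warnings.warn(f"Layer '{name}' defines parameters but was "
                          f"not invoked during the forward pass.")
```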
Hello, you are correct. In MindSpore, layers defined in __init__ will by default still receive gradient updates even if they do not participate in forward propagation, because each layer's parameters have requires_grad set to True by default. If you want to prevent gradient updates for layers that do not participate in forward propagation, you can try the nn.Cell.set_grad(requires_grad=False) method and see whether it achieves the result you expect.
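Continuing from the Net sketch above, the suggested workaround might look like the following; freezing the individual parameters via requires_grad is a common alternative, since optimizers built from trainable_params() then skip them automatically.

```python
net = Net()

# Suggested workaround: disable gradients for the never-invoked layer.
net.unused.set_grad(requires_grad=False)

# Common alternative: freeze its parameters individually, so they
# drop out of trainable_params() and thus out of the optimizer.
for p in net.unused.get_parameters():
    p.requires_grad = False
```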
Thanks for your suggestions about lazy parameter registration and warnings for unused layers during the forward pass. Both are very useful suggestions.
Thank you for your reply! Looking forward to a better MindSpore~