add layerwise learning rate for adamw #35569
Conversation
Thanks for your contribution!
@@ -236,6 +236,10 @@ class AdamWOpMaker : public AdamOpMaker {
 public:
  void Make() {
    AdamOpMaker::Make();
    AddAttr<float>("lr_ratio",
Why is this argument added to Adam? Is it because AdamW and Adam share the same .cc file?
In that case, AdamW should have its own .cc file.
AdamWOpMaker inherits from AdamOpMaker, and both reuse the InferShape function of AdamOp.
In this case, 'lr_ratio' has no effect on Adam.
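
For reference, the new attribute is only surfaced through the AdamW Python API, so plain Adam stays untouched. A minimal usage sketch follows; the depth-from-name mapping is purely illustrative and not part of the PR:

import paddle

# Illustrative ratio function: scale the learning rate down for deeper layers.
# How a layer depth is derived from a parameter is model-specific; counting a
# name fragment here is only a placeholder.
def layerwise_lr_ratio(param):
    depth = param.name.count("layers")
    return 0.95 ** depth

linear = paddle.nn.Linear(10, 10)
opt = paddle.optimizer.AdamW(
    learning_rate=1e-3,
    parameters=linear.parameters(),
    lr_ratio=layerwise_lr_ratio)  # new argument added by this PR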
@@ -163,6 +165,9 @@ def __init__(self,
        self._apply_decay_param_fun = apply_decay_param_fun
        self._coeff = coeff
        self._lr_to_coeff = dict()
        if lr_ratio is not None:
            assert isinstance(lr_ratio, Callable)
        self._lr_ratio = lr_ratio
You should think about how many kernels will be affected by "lr_ratio".
If you only want lr_ratio to affect the GPU and CPU kernels, you should raise a NotImplementedError for XPU and NPU here.
done.
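
A hedged sketch of what such a device guard could look like on the Python side; the helper name, the paddle.get_device() check, and the message are assumptions, not the PR's final code:

import paddle

def _check_lr_ratio_supported(lr_ratio):
    # Hypothetical guard: lr_ratio only touches the CPU and GPU AdamW kernels,
    # so reject it early when running on XPU or NPU.
    if lr_ratio is None:
        return
    device = paddle.get_device()  # e.g. 'cpu', 'gpu:0', 'xpu:0', 'npu:0'
    if device.startswith(("xpu", "npu")):
        raise NotImplementedError(
            "'lr_ratio' is only implemented for the CPU and GPU AdamW kernels.")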
@@ -163,6 +165,9 @@ def __init__(self,
        self._apply_decay_param_fun = apply_decay_param_fun
        self._coeff = coeff
        self._lr_to_coeff = dict()
        if lr_ratio is not None:
You should add an explanation for the new lr_ratio argument, placed right after the explanation for "apply_decay_param_fun".
done.
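
A possible wording for that docstring entry, sketched as comments since the final text is not shown in this thread:

# Hypothetical docstring fragment for AdamW.__init__, placed directly after the
# apply_decay_param_fun entry as suggested above:
#
#     lr_ratio (Callable, optional): If not None, the learning rate used for
#         each parameter is learning_rate * lr_ratio(parameter), which enables
#         layerwise learning rates. Only the CPU and GPU kernels support it.
#         Default: None.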
Branch updated: 33a7e31 → a18e0d5, then a18e0d5 → 442db6f.
LGTM
* add layerwise learning rate for adamw
* fix format
* add unittest
* add NotImplementedError
* add gpu unittest
* update gpuplace
PR types
New features
PR changes
OPs
Describe
Add layerwise learning rate feature for AdamW.
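
To make the feature concrete, here is a worked sketch of a single AdamW step with the per-parameter ratio folded in. It mirrors what the modified kernels are expected to compute, but the exact interaction between lr_ratio and the decoupled weight-decay term is an assumption, not taken from the PR:

import numpy as np

def adamw_step(p, g, m, v, t, lr=1e-3, lr_ratio=1.0,
               beta1=0.9, beta2=0.999, eps=1e-8, coeff=0.01):
    # Standard AdamW moment updates.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # The layerwise ratio simply rescales the learning rate for this parameter.
    step_lr = lr * lr_ratio
    p = p - step_lr * coeff * p                       # decoupled weight decay
    p = p - step_lr * m_hat / (np.sqrt(v_hat) + eps)  # Adam update
    return p, m, v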