[2.0API] Reconstruct all API related to LR Scheduler, unify dygraph and static #26550

Merged

@zhwesky2010 zhwesky2010 commented Aug 21, 2020

PR types

New features

PR changes

APIs

Describe

Reconstruct all APIs related to the LR scheduler: 12 _LRScheduler classes in total.

  1. Unify dygraph mode: the learning rate is updated manually by calling step().

  2. Unify static mode with dygraph mode: call step() manually after executor.run(). Every executor.run() feeds the scheduler's current Python float value into the global learning_rate variable.
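The manual update pattern described in the two points above can be sketched in plain Python. This is a minimal illustration, not Paddle's implementation: the class name StepDecaySketch and its arguments are hypothetical, and only the step() / last_epoch convention is taken from this PR.

```python
class StepDecaySketch:
    """Toy scheduler (hypothetical): multiply the rate by gamma every step_size epochs."""

    def __init__(self, learning_rate, step_size, gamma=0.1, last_epoch=-1):
        self.base_lr = float(learning_rate)
        self.step_size = step_size
        self.gamma = gamma
        self.last_epoch = last_epoch
        self.last_lr = self.base_lr
        self.step()  # move to epoch last_epoch + 1 and compute its rate

    def get_lr(self):
        # Closed-form rate for the current epoch.
        return self.base_lr * self.gamma ** (self.last_epoch // self.step_size)

    def step(self, epoch=None):
        # No argument: advance one epoch. Explicit epoch: jump to it.
        self.last_epoch = self.last_epoch + 1 if epoch is None else epoch
        self.last_lr = self.get_lr()

    def __call__(self):
        # Plain Python float that an optimizer (or executor) would read.
        return self.last_lr


scheduler = StepDecaySketch(learning_rate=0.5, step_size=2, gamma=0.1)
seen = []
for epoch in range(4):
    seen.append(scheduler())  # rate used for this epoch's optimize step
    scheduler.step()          # manual update, called after the optimize step
```

The rate only changes when step() is called, so forgetting the call leaves the rate at its initial value.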


Chinese documentation

PaddlePaddle/docs#2459


English documentation

[11 screenshots of the rendered English documentation; images not preserved]

@paddle-bot-old

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

"""
self.keys = ['last_epoch', 'last_lr']

def set_dict(self, state_dict):
Contributor

As discussed last time, please add an alias, set_state_dict.

Contributor Author

Done.
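The alias requested in this thread could look like the sketch below. Only self.keys, set_dict, and the name set_state_dict come from the snippet and the comment above; the class name, the state_dict() method, and the error handling are assumptions made for illustration.

```python
class SchedulerStateSketch:
    """Hypothetical stand-in for a scheduler that saves and restores its state."""

    def __init__(self, learning_rate=0.1):
        self.last_epoch = -1
        self.last_lr = float(learning_rate)
        # Attributes that make up the saved state (as in the reviewed snippet).
        self.keys = ['last_epoch', 'last_lr']

    def state_dict(self):
        # Assumed helper: collect the tracked attributes into a plain dict.
        return {key: getattr(self, key) for key in self.keys}

    def set_dict(self, state_dict):
        # Restore every tracked attribute; fail loudly if one is missing.
        for key in self.keys:
            if key not in state_dict:
                raise KeyError("missing scheduler state key: " + key)
            setattr(self, key, state_dict[key])

    # Alias under the newer name; both names refer to the same function.
    set_state_dict = set_dict


saved = SchedulerStateSketch()
saved.last_epoch, saved.last_lr = 7, 0.025
restored = SchedulerStateSketch()
restored.set_state_dict(saved.state_dict())
```

Defining the alias as a class attribute keeps a single implementation, so the two names cannot drift apart.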



Args:
d$_{model}$(int): The dimensionality of input and output feature vector of model. It is a python float number.
Contributor

Why is it written as d$_{model}$?

Contributor Author

So that model is rendered as a subscript in the documentation.

d$_{model}$(int): The dimensionality of input and output feature vector of model. It is a python float number.
warmup_steps(Variable|int): The number of warmup steps. A super parameter. It is a python float number
learning_rate (float): The initial learning rate. It is a python float number. Default: 1.0.
last_epoch (int, optional): If ``True``, prints a message to stdout for each update. Default: -1, means initial learning rate.
Contributor

Looking at the PyTorch implementation, last_epoch is meant for restarting training: you can set it to the epoch at which training resumes, and the learning rate is computed from there. When it equals -1, the learning rate defaults to the initial learning rate.

Contributor Author

Done
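The resume semantics described in this thread can be sketched with the Noam schedule that this docstring documents. The formula below is the published one from "Attention Is All You Need"; it is an assumption that the class under review computes exactly this, and the handling of epoch 0 is a choice made for this sketch.

```python
def noam_lr(epoch, d_model, warmup_steps, learning_rate=1.0):
    """Noam schedule: linear warmup, then decay proportional to epoch ** -0.5."""
    if epoch <= 0:
        # Assumption for this sketch: no warmup progress yet, so the rate is zero.
        return 0.0
    scale = min(epoch ** -0.5, epoch * warmup_steps ** -1.5)
    return learning_rate * d_model ** -0.5 * scale


def first_epoch_after(last_epoch):
    """Epoch reached by the first update: last_epoch=-1 means a fresh run at epoch 0."""
    return last_epoch + 1


# A fresh run (last_epoch=-1) walks through epochs 0, 1, 2, ...
fresh = [noam_lr(e, d_model=512, warmup_steps=100) for e in range(201)]

# A resumed run only needs the index of the last finished epoch.
resumed_lr = noam_lr(first_epoch_after(99), d_model=512, warmup_steps=100)
```

Because the rate is a pure function of the epoch index, passing last_epoch is enough to continue the schedule where it stopped.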


Args:
d$_{model}$(int): The dimensionality of input and output feature vector of model. It is a python float number.
warmup_steps(Variable|int): The number of warmup steps. A super parameter. It is a python float number
Contributor

Variable->Tensor

Contributor Author

Done

last_epoch=last_epoch, verbose=verbose)

def get_lr(self):

Contributor

This line can be removed.

Contributor Author

Done

learning_rate (float): The initial learning rate. It is a python float number.
gamma (float, optional): The Ratio that the learning rate will be reduced. ``new_lr = origin_lr * decay_rate`` .
It should be less than 1.0. Default: 0.1.
last_epoch (int, optional): If ``True``, prints a message to stdout for each update. Default: -1, means initial learning rate.
Contributor

Same as above.

Contributor Author

Done

lr_var = self._global_learning_rate()
# only create global lr_var once
if not isinstance(lr_var, framework.Variable):
print("create global learning rate")
Contributor

Please remove this log line.

Contributor Author

Done

persistable=True,
stop_gradient=True,
dtype='float32' if self._dtype is None else self._dtype)
main_prog = framework.default_main_program()
Contributor

Why main_program here? Would there be a problem if it is not the main program?

Contributor Author

It has to be set on whichever program contains the optimizer op. On every executor run, a program with this attribute set has the corresponding float learning rate fed into the corresponding Variable, and then runs forward -> backward -> optimize. It follows the optimize op.
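The feeding behavior described in this reply can be pictured with a toy model. Every name below (ToyProgram, ToyExecutor, HalvingScheduler) is hypothetical and none of it is Paddle's actual executor code; the sketch only shows a float being copied into a program's learning-rate variable at the start of each run.

```python
class ToyProgram:
    """Hypothetical program: holds variables and an optional scheduler attribute."""

    def __init__(self):
        self.lr_scheduler = None  # set on the program that owns the optimizer op
        self.variables = {}


class ToyExecutor:
    """Hypothetical executor: feeds the scheduler value before running the program."""

    def run(self, program):
        scheduler = program.lr_scheduler
        if scheduler is not None:
            # Feed the current Python float into the learning-rate variable.
            program.variables["learning_rate"] = float(scheduler())
        # Forward, backward and optimize would run here, reading that variable.
        return program.variables.get("learning_rate")


class HalvingScheduler:
    """Hypothetical scheduler that halves the rate on every step()."""

    def __init__(self, learning_rate):
        self.lr = float(learning_rate)

    def __call__(self):
        return self.lr

    def step(self):
        self.lr *= 0.5


program = ToyProgram()
program.lr_scheduler = HalvingScheduler(0.5)
executor = ToyExecutor()
used = []
for _ in range(3):
    used.append(executor.run(program))  # rate fed for this run
    program.lr_scheduler.step()         # manual update after the run
```

A program without the attribute is left alone, which is why the attribute has to sit on the program that contains the optimizer op.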

@zhwesky2010 zhwesky2010 force-pushed the reconstruct_lr_scheduler1 branch from d5ab480 to 818692d Compare August 22, 2020 12:37
@zhwesky2010 zhwesky2010 force-pushed the reconstruct_lr_scheduler1 branch from 818692d to 6cb899b Compare August 22, 2020 12:50
@zhwesky2010 zhwesky2010 changed the title Reconstruct all API related to lr scheduler, unify dygraph and static [2.0API] Reconstruct all API related to lr scheduler, unify dygraph and static Aug 22, 2020
@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Aug 22, 2020
@PaddlePaddle PaddlePaddle unlocked this conversation Aug 22, 2020
def step(self, epoch=None):
"""
step should be called after 'minimize' . It will Update the learning rate in optimizer according to 'epoch'.
The new learning rate will take effect on next optimize operation.
Contributor

Update->update

Contributor

minimize -> step. Optimizers will also call the step function from now on.

Contributor Author

Done

learning_rate = 0.1

Args:
learning_rate (float): The initial learning rate. It is a python float number.
Contributor

learning_rate does not appear to be in the initializer's parameter list.

Contributor Author

Done

decay_steps(int): The decay step size. It determines the decay cycle.
end_lr(float, optional): The minimum final learning rate. Default: 0.0001.
power(float, optional): Power of polynomial. Default: 1.0.
cycle(bool, optional): If set true, decay the learning rate every decay_steps. Default: False.
Contributor

The explanation of cycle is wrong; see the description in PolynomialDecay.
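The effect of cycle that the reviewer refers to can be sketched as a standalone function. The parameter names follow the docstring above; the formula is the usual polynomial decay rule and is assumed, not confirmed by this thread, to match the class under review.

```python
import math


def polynomial_decay(epoch, learning_rate, decay_steps,
                     end_lr=0.0001, power=1.0, cycle=False):
    """Polynomial decay from learning_rate down to end_lr."""
    if cycle:
        # Stretch the horizon to the next multiple of decay_steps, so the
        # rate jumps back up each time a multiple of decay_steps is passed.
        horizon = decay_steps * max(1, math.ceil(epoch / decay_steps))
    else:
        # Clamp the epoch so the rate stays at end_lr after decay_steps.
        epoch = min(epoch, decay_steps)
        horizon = decay_steps
    return (learning_rate - end_lr) * (1 - epoch / horizon) ** power + end_lr


monotone = [polynomial_decay(e, 0.1, decay_steps=10) for e in (0, 10, 11, 20)]
cyclic = [polynomial_decay(e, 0.1, decay_steps=10, cycle=True) for e in (0, 10, 11, 20)]
```

With cycle=False the rate reaches end_lr at decay_steps and stays there; with cycle=True it reaches end_lr at each multiple of decay_steps and rises again right after.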


class LinearLrWarmup(_LRScheduler):
"""

Contributor

This is missing an introduction to this learning rate schedule; the previous API had an explanation.

Contributor Author

Done

@zhwesky2010 zhwesky2010 changed the title [2.0API] Reconstruct all API related to lr scheduler, unify dygraph and static [2.0API] Reconstruct all API related to LR Scheduler, unify dygraph and static Aug 23, 2020
Contributor

@wawltor wawltor left a comment

LGTM

paddle.disable_static()
x = np.random.uniform(-1, 1, [10, 10]).astype("float32")
linear = paddle.nn.Linear(10, 10)
scheduler = paddle.optimizer.NoamLR(d_model=0.01, warmup_steps=100, verbose=True)
Contributor

Distinguish it from Optimizer:
paddle.optimizer.lr_scheduler.NoamLR

Contributor Author

Ok

out = linear(x)
loss = paddle.reduce_mean(out)
out.backward()
sgd.minimize(loss)
Contributor

The original usage still works, but in dygraph mode the new usage is recommended:
sgd.step()
sgd.clear_grad()
Although minimize has the same name in static graph mode and dygraph mode, the two differ considerably:

  1. In static graph mode minimize is called only once; in dygraph mode it is called repeatedly.
  2. In static graph mode the loss argument must be passed in; in dygraph mode it is not needed.

That is why a step function was added for dygraph mode.

Contributor Author

At present, sgd and most other optimizers do not support step yet.

x = paddle.to_tensor(x)
out = linear(x)
loss = paddle.reduce_mean(out)
out.backward()
Contributor

loss.backward()

Contributor Author

OK

x = np.random.uniform(-1, 1, [10, 10]).astype("float32")
linear = paddle.nn.Linear(10, 10)
scheduler = paddle.optimizer.NoamLR(d_model=0.01, warmup_steps=100, verbose=True)
sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameter_list=linear.parameters())
Contributor

Optimizers use the new parameter name:
parameter_list -> parameters
#26288

Contributor Author

The documentation will be updated in one pass in the next PR.

main_prog = paddle.static.Program()
start_prog = paddle.static.Program()
with paddle.static.program_guard(main_prog, start_prog):
x = paddle.static.data(name='x', shape=[-1, 4, 5])
Contributor

shape=[None, 4, 5]

Contributor Author

Ok

scheduler = paddle.optimizer.NoamLR(d_model=0.01, warmup_steps=100, verbose=True)
sgd = paddle.optimizer.SGD(learning_rate=scheduler)
sgd.minimize(loss)
lr_var = sgd._global_learning_rate()
Contributor

Why does this need to call an internal function?

Contributor Author

Done, removed.

'x': np.random.randn(3, 4, 5).astype('float32'),
'y': np.random.randn(3, 4, 5).astype('float32')
},
fetch_list=lr_var.name)
Contributor

Why fetch lr_var here? The returned out does not appear to be used.

Contributor Author

Done, removed.

self._parameter_list = list(
parameter_list) if parameter_list is not None else None
self._name = name
if framework.in_dygraph_mode():
if not isinstance(learning_rate, float) and \
not isinstance(learning_rate, LearningRateDecay):
if not isinstance(learning_rate,
Contributor

Why is the change made in paddle.fluid.optimizer.py rather than in paddle.optimizer.optimizer.py?
Code written for version 1.8 will behave differently when run.

Contributor Author

The new optimizer module does not yet support most optimizers. I have notified the colleagues migrating the optimizers to move the fluid optimizer behavior into paddle optimizer.

This is a backward-compatible upgrade: behavior in 1.8 does not change, but the new logic is supported.

@zhwesky2010
Contributor Author

The documentation changes will be fixed together in the next PR.

Contributor

@XiaoguangHu01 XiaoguangHu01 left a comment

Merge this first; update the example code in the next PR.

Contributor

@jzhang533 jzhang533 left a comment

lgtm
will have followup pr.

@zhwesky2010 zhwesky2010 merged commit 407de03 into PaddlePaddle:develop Aug 24, 2020

Args:
learning_rate (float): The initial learning rate. It is a python float number.
gamma (float, optional): The Ratio that the learning rate will be reduced. ``new_lr = origin_lr * gamma`` .
Contributor

Judging by init, this is a required parameter, isn't it?

gamma (float, optional): The Ratio that the learning rate will be reduced. ``new_lr = origin_lr * gamma`` .
It should be less than 1.0. Default: 0.1.
last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool): If ``True``, prints a message to stdout for each update. Default: ``False`` .
Contributor

optional

Contributor Author

Done


Args:
learning_rate (float): The initial learning rate. It is a python float number.
gamma (float, optional): The Ratio that the learning rate will be reduced. ``new_lr = origin_lr * gamma`` .
Contributor

Is gamma optional?

Contributor Author

Yes.

gamma (float, optional): The Ratio that the learning rate will be reduced. ``new_lr = origin_lr * gamma`` .
It should be less than 1.0. Default: 0.1.
last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool): If ``True``, prints a message to stdout for each update. Default: ``False`` .
Contributor

Same as above: optional is missing.

learning_rate (float): The initial learning rate. It is a python float number.
lr_lambda (function): A function which computes a factor by ``epoch`` , and then multiply the initial learning rate by this factor.
last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool): If ``True``, prints a message to stdout for each update. Default: ``False`` .
Contributor

Same as above: optional is missing.

Contributor Author

All done

warmup_steps(int): The number of warmup steps. A super parameter. It is a python int number
learning_rate (float): The initial learning rate. It is a python float number. Default: 1.0.
last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool): If ``True``, prints a message to stdout for each update. Default: ``False`` .
Contributor

Same: optional.

values(list): A list of learning rate values that will be picked during different epoch boundaries.
The type of element in the list is python float.
last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool): If ``True``, prints a message to stdout for each update. Default: ``False`` .
Contributor

Same: optional.

cycle(bool, optional): Whether the learning rate rises again. If True, then the learning rate will rise when it decrease
to ``end_lr`` . If False, the learning rate is monotone decreasing. Default: False.
last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool): If ``True``, prints a message to stdout for each update. Default: ``False`` .
Contributor

Same: optional.

Contributor Author

All done

change of ``loss`` is ``threshold`` . Default: ``'rel'`` .
cooldown (int, optional): The number of epochs to wait before resuming normal operation. Default: 0.
min_lr (float, optional): The lower bound of the learning rate after reduction. Default: 0.
epsilon (float, optional): Minimal decay applied to lr. If the difference between new and old lr is smaller than eps, the update is
Contributor

smaller than epsilon
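How min_lr and epsilon interact when a reduction is triggered can be sketched as follows. The names min_lr and epsilon come from the docstring above; the function name and the factor argument are hypothetical, and the rule for an update smaller than epsilon (skip it) is an assumption based on how such schedulers commonly behave.

```python
def reduce_lr(current_lr, factor=0.1, min_lr=0.0, epsilon=1e-8):
    """Return the rate after one reduction, or the old rate if the change is negligible."""
    # Shrink the rate but never go below the lower bound.
    new_lr = max(current_lr * factor, min_lr)
    # Apply the update only if it changes the rate by more than epsilon.
    if current_lr - new_lr > epsilon:
        return new_lr
    return current_lr


normal = reduce_lr(0.1)                    # large enough change: rate is reduced
tiny = reduce_lr(1e-9)                     # change below epsilon: rate is kept
bounded = reduce_lr(0.001, min_lr=0.001)   # already at the lower bound: rate is kept
```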

gamma (float, optional): The Ratio that the learning rate will be reduced. ``new_lr = origin_lr * gamma`` .
It should be less than 1.0. Default: 0.1.
last_epoch (int, optional): The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool): If ``True``, prints a message to stdout for each update. Default: ``False`` .
Contributor

Same as above: optional.

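@jzhang533
Contributor

Every lr scheduler has a verbose parameter; it doesn't seem very necessary, does it?

@zhwesky2010
Contributor Author

> Every lr scheduler has a verbose parameter; it doesn't seem very necessary, does it?

This feature seems fairly practical.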

6 participants