# paddle.optimizer.lr.OneCycleLR Design Document
| API Name | paddle.optimizer.lr.OneCycleLR |
| --- | --- |
| Author | Asthestarsfalll |
| Submission Date | 2022-03-12 |
| Version | V1.1 |
| Target Paddle Version | develop |
| File Name | 20210312_api_design_for_one_cycle_lr.md |
OneCycleLR was first proposed in *Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates*. The schedule runs a single cycle (one cycle) over the whole training process: the learning rate rises from a small initial value to a maximum, then decays to a value below the initial learning rate. A distinguishing feature is that the learning rate is adjusted after every batch.
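To make the shape of the schedule concrete, here is a minimal sketch of the two-phase policy with linear annealing (all names and defaults below are illustrative only, not the proposed implementation):

```python
# A minimal sketch of a two-phase 1cycle schedule with linear annealing.
# Names and defaults are illustrative only.
def one_cycle_lr(step, total_steps, max_lr,
                 phase_pct=0.3, divide_factor=25.0, end_lr=1e-4):
    initial_lr = max_lr / divide_factor
    peak_step = int(total_steps * phase_pct)
    if step <= peak_step:
        # Warm-up phase: initial_lr -> max_lr
        return initial_lr + (max_lr - initial_lr) * step / peak_step
    # Annealing phase: max_lr -> end_lr (well below initial_lr)
    pct = (step - peak_step) / (total_steps - peak_step)
    return max_lr - (max_lr - end_lr) * pct

lrs = [one_cycle_lr(s, total_steps=100, max_lr=0.1) for s in range(101)]
```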
This proposal adds the OneCycleLR learning rate scheduler to the Paddle framework, with the call path `paddle.optimizer.lr.OneCycleLR`.
The goal is for Paddle to support the OneCycleLR scheduler; Paddle currently lacks an implementation of this functionality.
PyTorch provides the API `torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, total_steps=None, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=25.0, final_div_factor=10000.0, three_phase=False, last_epoch=-1, verbose=False)`, documented as follows:
Sets the learning rate of each parameter group according to the 1cycle learning rate policy. The 1cycle policy anneals the learning rate from an initial learning rate to some maximum learning rate and then from that maximum learning rate to some minimum learning rate much lower than the initial learning rate. This policy was initially described in the paper Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates.
The 1cycle learning rate policy changes the learning rate after every batch. `step` should be called after a batch has been used for training.
This scheduler is not chainable.
Note also that the total number of steps in the cycle can be determined in one of two ways (listed in order of precedence):

1. A value for `total_steps` is explicitly provided.
2. A number of epochs (`epochs`) and a number of steps per epoch (`steps_per_epoch`) are provided. In this case, the number of total steps is inferred by `total_steps = epochs * steps_per_epoch`.

You must either provide a value for `total_steps` or provide a value for both `epochs` and `steps_per_epoch`.
The default behaviour of this scheduler follows the fastai implementation of 1cycle, which claims that “unpublished work has shown even better results by using only two phases”. To mimic the behaviour of the original paper instead, set three_phase=True.
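The precedence rule above amounts to the following check (a sketch; `resolve_total_steps` is a hypothetical helper, not part of either API):

```python
# Hypothetical helper illustrating the precedence rule quoted above.
def resolve_total_steps(total_steps=None, epochs=None, steps_per_epoch=None):
    if total_steps is not None:
        return total_steps                   # an explicit value wins
    if epochs is not None and steps_per_epoch is not None:
        return epochs * steps_per_epoch      # otherwise inferred
    raise ValueError(
        "You must define either total_steps OR (epochs AND steps_per_epoch)")
```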
In terms of implementation, PyTorch implements this scheduler directly in Python (see the PyTorch source). The core code is as follows:
```python
def get_lr(self):
    if not self._get_lr_called_within_step:
        warnings.warn("To get the last learning rate computed by the scheduler, "
                      "please use `get_last_lr()`.", UserWarning)

    lrs = []
    step_num = self.last_epoch

    if step_num > self.total_steps:
        raise ValueError("Tried to step {} times. The specified number of total steps is {}"
                         .format(step_num + 1, self.total_steps))

    for group in self.optimizer.param_groups:
        start_step = 0
        for i, phase in enumerate(self._schedule_phases):
            end_step = phase['end_step']
            if step_num <= end_step or i == len(self._schedule_phases) - 1:
                pct = (step_num - start_step) / (end_step - start_step)
                computed_lr = self.anneal_func(group[phase['start_lr']], group[phase['end_lr']], pct)
                if self.cycle_momentum:
                    computed_momentum = self.anneal_func(group[phase['start_momentum']], group[phase['end_momentum']], pct)
                break
            start_step = phase['end_step']

        lrs.append(computed_lr)
        if self.cycle_momentum:
            if self.use_beta1:
                _, beta2 = group['betas']
                group['betas'] = (computed_momentum, beta2)
            else:
                group['momentum'] = computed_momentum

    return lrs
```
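For reference, a typical usage sketch of the PyTorch scheduler (toy model and training loop, for illustration only):

```python
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, epochs=2, steps_per_epoch=5)

for epoch in range(2):
    for step in range(5):
        x = torch.randn(4, 10)
        loss = model(x).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # the 1cycle policy steps once per batch
```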
There is also a third-party implementation, likewise built on PyTorch, whose docstring reads:
Sets the learning rate of each parameter group by the one cycle learning rate policy
proposed in https://arxiv.org/pdf/1708.07120.pdf.
It is recommended that you set the max_lr to be the learning rate that achieves
the lowest loss in the learning rate range test, and set min_lr to be 1/10 th of max_lr.
So, the learning rate changes like min_lr -> max_lr -> min_lr -> final_lr,
where final_lr = min_lr * reduce_factor.
Note: Currently only supports one parameter group.
The core code:
```python
def step(self):
    """Conducts one step of learning rate and momentum update
    """
    current_step = self.last_step + 1
    self.last_step = current_step

    if current_step <= self.num_cycle_steps // 2:
        # Scale up phase
        scale = current_step / (self.num_cycle_steps // 2)
        lr = self.min_lr + (self.max_lr - self.min_lr) * scale
        momentum = self.max_momentum - (self.max_momentum - self.min_momentum) * scale
    elif current_step <= self.num_cycle_steps:
        # Scale down phase
        scale = (current_step - self.num_cycle_steps // 2) / (self.num_cycle_steps - self.num_cycle_steps // 2)
        lr = self.max_lr - (self.max_lr - self.min_lr) * scale
        momentum = self.min_momentum + (self.max_momentum - self.min_momentum) * scale
    elif current_step <= self.num_steps:
        # Annihilation phase: only change lr
        scale = (current_step - self.num_cycle_steps) / (self.num_steps - self.num_cycle_steps)
        lr = self.min_lr - (self.min_lr - self.final_lr) * scale
        momentum = None
    else:
        # Exceeded given num_steps: do nothing
        return

    self.optimizer.param_groups[0]['lr'] = lr
    if momentum:
        self.optimizer.param_groups[0]['momentum'] = momentum
```
Both implementations are built on PyTorch and follow essentially the same approach. The official PyTorch implementation is more concise and complete, so this design follows the PyTorch code.
The API is designed as `paddle.optimizer.lr.OneCycleLR(max_learning_rate, total_steps, divide_factor=25., end_learning_rate=0.0001, phase_pct=0.3, anneal_strategy='cos', three_phase=False, last_epoch=-1, verbose=False)`.

Compared with PyTorch, the parameters `div_factor` and `max_lr` are renamed to `divide_factor` and `max_learning_rate`, respectively.
Parameters:

- `max_learning_rate`: the maximum learning rate during training;
- `total_steps`: the total number of training steps;
- `divide_factor`: the initial learning rate `initial_learning_rate` is determined by `max_learning_rate / divide_factor`; defaults to 25;
- `end_learning_rate`: the minimum learning rate during training, which should be much smaller than the initial learning rate;
- `phase_pct`: the fraction of total training steps spent increasing the learning rate; defaults to 0.3;
- `anneal_strategy`: the learning rate annealing strategy, `'cos'` or `'linear'`; defaults to `'cos'`;
- `three_phase`: if `True`, a three-phase schedule is used, i.e. the learning rate first rises from `initial_learning_rate` to `max_learning_rate`, then falls back to `initial_learning_rate`, and finally decays to `end_learning_rate`; if `False`, a two-phase schedule is used, i.e. the learning rate rises from `initial_learning_rate` to `max_learning_rate` and then decays directly to `end_learning_rate`. Defaults to `False`;
- `last_epoch`: optional; the epoch number of the previous run, set when resuming training. Defaults to -1, meaning training starts from the initial learning rate;
- `verbose`: optional; if `True`, a message is printed to stdout on every update. Defaults to `False`.
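A usage sketch of the proposed API, following the conventions of Paddle's existing LR schedulers (the model and training loop below are illustrative):

```python
import paddle

linear = paddle.nn.Linear(10, 10)
scheduler = paddle.optimizer.lr.OneCycleLR(
    max_learning_rate=1.0, total_steps=100, phase_pct=0.3)
sgd = paddle.optimizer.SGD(learning_rate=scheduler,
                           parameters=linear.parameters())

for step in range(100):
    x = paddle.uniform([4, 10])
    loss = paddle.mean(linear(x))
    loss.backward()
    sgd.step()
    sgd.clear_grad()
    scheduler.step()  # adjust the learning rate once per batch
```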
The scheduler is implemented purely in Python; no new low-level OP needs to be designed.

Because Paddle's lr_scheduler is invoked quite differently from PyTorch's, the following changes are made:
- The momentum-related parameters are dropped, because momentum or beta cannot be modified through the `lr_scheduler`.
- The `max_learning_rate` parameter accepts only `float`, rather than `float or list` as PyTorch's `max_lr` does. When `max_lr` is a list, PyTorch assigns a different learning rate to each element of `optimizer.param_groups`; Paddle's `lr_scheduler` has no access to the `optimizer`, so users can implement this behavior externally themselves (see the sketch below).
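A hedged sketch of that external workaround: attach a separate scheduler and optimizer to each parameter group (the names below are illustrative):

```python
import paddle

backbone = paddle.nn.Linear(10, 10)
head = paddle.nn.Linear(10, 1)

# One scheduler/optimizer pair per parameter group emulates
# PyTorch's per-group max_lr behavior.
sched_backbone = paddle.optimizer.lr.OneCycleLR(
    max_learning_rate=0.01, total_steps=100)
sched_head = paddle.optimizer.lr.OneCycleLR(
    max_learning_rate=0.1, total_steps=100)
opt_backbone = paddle.optimizer.SGD(learning_rate=sched_backbone,
                                    parameters=backbone.parameters())
opt_head = paddle.optimizer.SGD(learning_rate=sched_head,
                                parameters=head.parameters())

# In the training loop, step every optimizer and scheduler:
#   opt_backbone.step(); opt_head.step()
#   sched_backbone.step(); sched_head.step()
```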
Otherwise, the implementation approach is essentially the same as PyTorch's.
The test cases to consider are as follows:

- numerical consistency between `paddle.optimizer.lr.OneCycleLR` and results computed with `numpy` (a reference sketch follows this list);
- error check: a proper error is raised when none of `total_steps`, `epochs`, and `steps_per_epoch` is specified, and when any of them is less than or equal to 0;
- error check: a proper error is raised when `anneal_strategy` is outside the allowed range;
- error check: a proper error is raised when `phase_pct` is not in [0, 1].
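For the numerical-consistency case, a `numpy` reference for the two-phase cos-annealed schedule might look like the sketch below (the exact boundary handling is an assumption and would be aligned with the final implementation):

```python
import numpy as np

def cos_anneal(start, end, pct):
    # Half-cosine interpolation: returns start at pct=0 and end at pct=1.
    return end + (start - end) / 2.0 * (np.cos(np.pi * pct) + 1.0)

def ref_one_cycle(step, total_steps, max_learning_rate,
                  divide_factor=25.0, end_learning_rate=1e-4, phase_pct=0.3):
    initial_lr = max_learning_rate / divide_factor
    peak_step = total_steps * phase_pct
    if step <= peak_step:
        return cos_anneal(initial_lr, max_learning_rate, step / peak_step)
    return cos_anneal(max_learning_rate, end_learning_rate,
                      (step - peak_step) / (total_steps - peak_step))

expected = [ref_one_cycle(s, 100, 0.1) for s in range(101)]
```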
The solution depends only on Python, so development can be completed within the current release cycle.
This is a standalone new API and has no impact on other modules.