matanster edited this page Mar 29, 2018 · 4 revisions

The default learning algorithm is a variant of online gradient descent. The main difference from vanilla online gradient descent is its fast and correct handling of large importance weights (see https://arxiv.org/abs/1011.1576). Various extensions, such as conjugate gradient (CG), mini-batch, and data-dependent learning rates, are included.
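To illustrate why large importance weights need special handling, here is a minimal sketch in Python. It is not the actual implementation; it only contrasts two update rules for a linear model with squared loss `0.5 * (p - y)^2`. The function names (`naive_update`, `invariant_update`) and all parameter values are hypothetical and chosen for illustration. The closed-form step in `invariant_update` is the squared-loss update given in the paper linked above, as best recalled here; treat it as an assumption and check it against the paper.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(ai * bi for ai, bi in zip(a, b))


def naive_update(w, x, y, h, eta):
    """Vanilla online gradient step: the gradient is simply multiplied by
    the importance weight h. A large h can overshoot the label."""
    p = dot(w, x)
    scale = eta * h * (p - y)
    return [wi - scale * xi for wi, xi in zip(w, x)]


def invariant_update(w, x, y, h, eta):
    """Importance-aware step for squared loss (assumed closed form):
    behaves like the limit of infinitely many tiny gradient steps whose
    weights sum to h, so the prediction approaches y but never crosses it."""
    xx = dot(x, x)
    if xx == 0.0:
        return list(w)  # an all-zero example carries no gradient signal
    p = dot(w, x)
    scale = (p - y) / xx * (1.0 - math.exp(-h * eta * xx))
    return [wi - scale * xi for wi, xi in zip(w, x)]


# Hypothetical example: one data point with a large importance weight.
w0 = [0.0, 0.0]
x = [1.0, 2.0]
y = 1.0
eta = 0.5
h = 10.0

naive_pred = dot(naive_update(w0, x, y, h, eta), x)
invariant_pred = dot(invariant_update(w0, x, y, h, eta), x)

# Invariance check: two updates with weight 1 versus one update with weight 2.
w_twice = invariant_update(invariant_update(w0, x, y, 1.0, eta), x, y, 1.0, eta)
w_once = invariant_update(w0, x, y, 2.0, eta)
```

In this sketch, the naive step pushes the prediction far past the label, while the importance-aware step moves it toward the label without overshooting. The last two lines check the property that gives the approach its name: presenting an example twice with weight 1 yields the same weights as presenting it once with weight 2.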
