FTML optimizer #48

Merged: 4 commits into mlpack:master on Dec 7, 2018
Conversation

@zoq (Member) commented Nov 8, 2018

Implementation of "Follow the Moving Leader in Deep Learning" by Shuai Zheng and James T. Kwok.
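For readers without the paper to hand, the per-iteration update that FTML performs is, as best I recall it from the paper (a paraphrase of Zheng and Kwok, not text taken from this patch):

    v_t      = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
    d_t      = \frac{1 - \beta_1^t}{\eta_t} \left( \sqrt{\frac{v_t}{1 - \beta_2^t}} + \epsilon \right)
    \sigma_t = d_t - \beta_1 d_{t-1}
    z_t      = \beta_1 z_{t-1} + (1 - \beta_1) g_t - \sigma_t \theta_{t-1}
    \theta_t = -z_t / d_t

where g_t is the mini-batch gradient, \eta_t the step size, and \beta_1, \beta_2, \epsilon the hyperparameters that appear in the constructor quoted later in this thread. The z_t accumulator is the "follow the leader" part: it plays roughly the role of the accumulated linearized losses in FTRL, but with exponentially decaying weights, which is where the connection to Adam comes from.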

@rcurtin (Member) left a comment:

Looks great to me. The paper was good; I had not previously known of the connection of Adam to FTL-type techniques.

I think we should add documentation before merging, but I am writing that in all the PRs, so maybe I sound like a broken record by now. :)

Review comment on the header documentation:

    * @endcode
    *
    * For FTML to work, a DecomposableFunctionType template parameter is
    * required. This class must implement the following function:
Comment from a member:

We have this documentation across a lot of optimizers, but the addition of function_types.md probably means we can reduce and centralize a lot of it. Do you think we could replace this with something like: "FTML requires a separable differentiable function to optimize (see <url>)"? The only problem is that it's not clear what URL to use there. Ideally we'd point the user at the website documentation, but that URL could change, so I am not sure what the best choice is.

Follow-up from a member:

Actually for this one I wonder if it would be best to open another issue instead of handling it in this PR.
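For reference, here is a minimal, self-contained sketch of what such a separable differentiable (decomposable) function looks like, together with hypothetical usage of the new optimizer. The member signatures follow the other SGD-style optimizers, and the FTML header path, namespace, and constructor argument order are guesses on my part rather than text from this patch:

    #include <armadillo>
    // Assumed header location and namespace for the new optimizer; adjust to
    // wherever the patch actually puts it.
    #include <mlpack/core/optimizers/ftml/ftml.hpp>

    // Sum of independent quadratics: f(x) = sum_i (x - i)^2, elementwise.
    class ExampleQuadratic
    {
     public:
      explicit ExampleQuadratic(const size_t n) : n(n) { }

      // Number of separable terms f_i.
      size_t NumFunctions() const { return n; }

      // Reorder the f_i; nothing to do here, the terms are exchangeable.
      void Shuffle() { }

      // Evaluate f_i + ... + f_{i + batchSize - 1} at the given coordinates.
      double Evaluate(const arma::mat& coordinates,
                      const size_t i,
                      const size_t batchSize) const
      {
        double objective = 0.0;
        for (size_t j = i; j < i + batchSize; ++j)
          objective += arma::accu(arma::square(coordinates - (double) j));
        return objective;
      }

      // Write the gradient of the same batch of terms into `gradient`.
      void Gradient(const arma::mat& coordinates,
                    const size_t i,
                    arma::mat& gradient,
                    const size_t batchSize) const
      {
        gradient.zeros(coordinates.n_rows, coordinates.n_cols);
        for (size_t j = i; j < i + batchSize; ++j)
          gradient += 2.0 * (coordinates - (double) j);
      }

     private:
      size_t n;
    };

    int main()
    {
      ExampleQuadratic f(1000);
      arma::mat coordinates(1, 1, arma::fill::zeros);

      // Step size 0.001 is a guess; the remaining values match the defaults
      // quoted from the constructor below (32, 0.9, 0.999, 1e-8).
      mlpack::optimization::FTML optimizer(0.001, 32, 0.9, 0.999, 1e-8);
      optimizer.Optimize(f, coordinates);
    }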

Review comment on the constructor parameters:

    const size_t batchSize = 32,
    const double beta1 = 0.9,
    const double beta2 = 0.999,
    const double eps = 1e-8,
Comment from a member:

As with Padam, do you think we should adjust this documentation to point out that it's there to avoid division by zero? (Or something like that.)

Also, for consistency, I guess we should call it eps here, since there is a method Epsilon().
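For example, the parameter documentation could say something like the following (suggested wording only, not text from this patch or from Padam):

    * @param eps Small constant added to the update's denominator in order to
    *     avoid division by zero.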

@rcurtin (Member) left a comment:

Looks great to me. Feel free to merge whenever you are happy with the code. If you want to add something to HISTORY.txt, go ahead; otherwise I will do it as part of the next release process.

@zoq merged commit 630803a into mlpack:master on Dec 7, 2018
@zoq (Member, Author) commented Dec 7, 2018

I'll update HISTORY.txt once the other PR is merged.
