Nested derivatives do not work for functions with arrays as arguments #148
Another related example, using
Actual code:
This code gives the following:
As a side note, the Hessian of the
Returns the correct result:
Some really good news: if I use 2 separate params, the Hessian works; the result is below. Next is to adjust this to work for arrays.
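A minimal sketch of the two-parameter pattern (the function and the evaluation point are my choice, not from the original comment), using Tracker's `nest` keyword for nested calls:

```julia
using Tracker

f(a, b) = a^2 * b

# first derivative w.r.t. a; analytically 2ab
dfda(a, b) = Tracker.gradient(f, a, b; nest = true)[1]

# differentiate ∂f/∂a w.r.t. both scalar params: one row of the Hessian
h = Tracker.gradient(dfda, 2.0, 3.0; nest = true)
# expected (6.0, 4.0): ∂²f/∂a² = 2b, ∂²f/∂a∂b = 2a
```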
Higher-order derivatives of functions with a single argument work correctly:
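For instance (a sketch with an assumed function, not the original snippet):

```julia
using Tracker

f(x) = x^3
df(x)  = Tracker.gradient(f, x; nest = true)[1]   # 3x^2
ddf(x) = Tracker.gradient(df, x; nest = true)[1]  # 6x

df(3.0)   # expect 27.0
ddf(3.0)  # expect 18.0
```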
I tried to replicate this approach with the following code, but it yields incorrect results:
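A sketch of the array version of this pattern (the function and values are my own, not the original code):

```julia
using Tracker

f(x) = x[1]^2 * x[2]
g(x) = Tracker.gradient(f, x; nest = true)[1]   # analytic: [2*x[1]*x[2], x[1]^2]

# one row of the Hessian; analytically [2*x[2], 2*x[1]] = [6.0, 4.0] at [2.0, 3.0]
hrow = Tracker.gradient(x -> g(x)[1], [2.0, 3.0]; nest = true)[1]
# per this issue, the value returned here is incorrect
```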
Getting closer. I have manually created the corresponding graph for the gradient and the Hessian. The rrules used are correct. There is something wrong with the tracker algorithm in the "nested" gradients; the extra graph to record the derivatives is perhaps not created correctly.
Gives the correct result:
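To illustrate the kind of check involved (my reconstruction with an assumed function, not the author's actual graph): chaining the rrule pullbacks by hand reproduces the gradient, including the accumulation step the tracker must perform.

```julia
using ChainRules: rrule

a, b = 3.0, 2.0              # f(a, b) = a*a*b

# forward pass, recording each pullback
y1, pb1 = rrule(*, a, a)     # y1 = a^2
y2, pb2 = rrule(*, y1, b)    # y2 = a^2 * b

# backward pass: call the pullbacks in reverse order
_, dy1, db = pb2(1.0)        # dy1 = b = 2.0, db = a^2 = 9.0
_, da1, da2 = pb1(dy1)       # both partials w.r.t. a
da = da1 + da2               # accumulate: 2ab = 12.0
```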
I have created a function to print the graph under a specific node. One can notice that Tracker does not record the methods performed, but the pullbacks of these methods. This is because we only need the pullbacks when we do back-propagation in the graph.
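A sketch of such a printer (my reconstruction, not the author's function; the field names follow Tracker's internals, where each tracked value carries a `Tracked` node whose `Call` stores the recorded pullback and the parent nodes):

```julia
using Tracker

# Recursively print the graph below a tracked value. Non-leaf nodes show
# t.f.func, which is the recorded pullback, not the original method.
print_graph(x, depth = 0) = print_graph(Tracker.tracker(x), depth)

function print_graph(t::Tracker.Tracked, depth = 0)
    println("  "^depth, t.isleaf ? "leaf" : string(t.f.func))
    t.isleaf && return
    for p in t.f.args
        p isa Tracker.Tracked && print_graph(p, depth + 1)
    end
end

y = Tracker.param([1.0, 2.0]) .* 3.0
print_graph(sum(y))
```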
Thanks for digging! As you can see this package doesn't get a lot of attention, but fixes are very welcome. I do not know what's going wrong in these cases; I never thought much about how this package handles second derivatives. There is a way to mark rules as only suitable for first derivatives.
This is the graph of the example which works correctly. Looks clean.
I have reviewed the back-propagation algorithm for both simple and nested gradients and everything seems correct. To summarize, 3 actions are needed to improve the robustness and ease of development of this package:
@ToucheSir you are right, we have to store the pullback, not the original function. Integrating ChainRules is pretty easy: https://github.com/MariusDrulea/Tracker.jl/blob/master/src/Tracker.jl#L91. Next is to:
The following first-order derivatives work via ChainRules; the printed result is shown in the comments below.
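A sketch of what such a snippet can look like (the function `cube` and the values are my own; this assumes the `@grad_from_chainrules` bridge available in recent Tracker versions):

```julia
using Tracker, ChainRulesCore
using Tracker: TrackedArray

cube(x) = x .^ 3
function ChainRulesCore.rrule(::typeof(cube), x)
    cube_pullback(Δ) = (NoTangent(), 3 .* x .^ 2 .* Δ)
    return cube(x), cube_pullback
end

# route the rrule into Tracker's tracking machinery
Tracker.@grad_from_chainrules cube(x::TrackedArray)

g = Tracker.gradient(x -> sum(cube(x)), [1.0, 2.0])[1]
println(g)   # [3.0, 12.0]
```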
Not that easy, son. It is easy only for the first-order derivatives. I do have to further track the operations performed by these first-order derivatives, such that the second-order derivatives can be called.
I'm currently stuck, as I don't know how to deal with the ChainRules rrule(s) here. The logic in Tracker is to define an untracked forward pass and a tracked pullback, and the AD engine will do the rest. What ChainRules offers is different: rrule computes the primal value and constructs the pullback together, during the forward pass.
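To make the mismatch concrete, a sketch of the two styles side by side (the `minus` example follows the pattern from the old Flux/Tracker custom-gradient docs; the names are illustrative):

```julia
using Tracker, ChainRulesCore
using Tracker: @grad, track, data, TrackedArray

minus(a, b) = a - b

# Tracker style: untracked forward pass; only the pullback is recorded,
# and it runs later, during back-propagation.
minus(a::TrackedArray, b::TrackedArray) = track(minus, a, b)
@grad function minus(a, b)
    return data(a) - data(b), Δ -> (Δ, -Δ)
end

# ChainRules style: rrule builds the primal and the pullback together,
# already in the forward pass.
function ChainRulesCore.rrule(::typeof(minus), a::AbstractArray, b::AbstractArray)
    minus_pullback(Δ) = (NoTangent(), Δ, -Δ)
    return a - b, minus_pullback
end
```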
@ToucheSir, @mcabbott Any idea? Or ping somebody who can help?
I'm not sure I understand, why shouldn't it be
The tracking of
The definition of
I have checked the Autograd.jl engine and also the HIPS/autograd engine. These engines call the differentiation rules (rrules) in the backward pass. So does Tracker.jl. The right thing to do here is to keep this behaviour. I think we can currently achieve the following:
Sample code for item 2:
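A toy sketch of this idea (hypothetical names, not the original item-2 code): pullbacks are recorded during the forward pass and invoked only when back-propagating.

```julia
using ChainRules: rrule

tape = Any[]                      # records pullbacks in execution order

function tracked_mul(a, b)
    y, pb = rrule(*, a, b)        # forward value + pullback
    push!(tape, pb)
    return y
end

y = tracked_mul(tracked_mul(2.0, 3.0), 4.0)   # (2*3)*4 = 24.0

# backward pass: call the recorded pullbacks in reverse order
_, Δinner, d4 = tape[2](1.0)      # Δinner = 4.0, d4 = 6.0
_, d2, d3 = tape[1](Δinner)       # d2 = 12.0, d3 = 8.0
```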
Status:
See the following code.
Pseudocode + explanations:
Actual code:
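A minimal sketch consistent with the numbers reported below, assuming f(x) = sum(abs2, x) / 2 (so the gradient is x and the Hessian is the identity) and x = [-2.0, -0.5]; the exact original snippet may differ:

```julia
using Tracker

f(x) = sum(abs2, x) / 2                          # ∇f(x) = x, Hessian = I
g(x) = Tracker.gradient(f, x; nest = true)[1]

x = [-2.0, -0.5]
gg = Tracker.gradient(x -> sum(g(x)), x; nest = true)[1]
# expected [1.0, 1.0] (row sums of the identity Hessian)
# observed [-2.0, -0.5], as reported below
```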
The value of `gg` is `[-2.0, -0.5]` instead of `[1, 1]`.