Covariance different from scipy curve_fit with the same data #70

Closed
kain88-de opened this issue Jun 14, 2018 · 8 comments

@kain88-de

I'm trying to fit an exponential decay of an autocorrelation function with LsqFit.jl. The optimized value I obtain is correct and matches what I get with scipy, but the reported covariance and associated standard error are unreasonably large with LsqFit.jl: scipy reports ~1e-5 while LsqFit gives ~1. I tend to believe scipy more, because visually my data fits a single exponential decay almost perfectly, with very small error. See the gist for the code.

I'm using the latest release of LsqFit (4ecb0ec). I'm not sure whether this is fixed on the current master branch, mostly because I'm new to Julia and don't know the best way to test against a checkout of master.
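For reference, a minimal, self-contained sketch of the kind of fit being compared (synthetic data; all names here are hypothetical stand-ins, and the real code and data are in the linked gist):

```julia
using LsqFit

# Single exponential decay, as in the gist.
model(t, p) = exp.(-t ./ p[1])

t = collect(range(0.0, 2.0, length=50))
y = model(t, [0.5]) .+ 0.005 .* randn(length(t))   # small, known noise

fit = curve_fit(model, t, y, [1.0])
fit.param             # fitted decay time, close to 0.5
estimate_covar(fit)   # parameter covariance; should be tiny for data this clean
standard_error(fit)   # likewise small, matching what scipy reports
```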

pkofod (Member) commented Jun 14, 2018

That is indeed a very large standard error for the given data. @iewaij will be working on improving some functionality in this package over the summer as part of Google Summer of Code. If there's a bug in the covariance code, that will be top priority.

Thanks for the bug report. We'll trace this 🐛 and squash it :)

iewaij (Contributor) commented Jun 14, 2018

Thanks for the bug report!

The bug seems to be related to the weight parameter. If you delete the weight parameter and run fit = curve_fit(func, t, log.(res_norm), [1.,]), the covariance decreases to 8.75554e-6.

Another related issue is #69. I'll have a look at this.

@kain88-de (Author)

Thanks for the quick response. The workaround of leaving out the weights parameter works for me too.

@rsturley

This issue is actually two issues:

  • The correct interpretation of the weighting vector. Right now, the weight vector as applied to the fits is 1/sigma, where sigma is the standard deviation. This is clear from the levenberg_marquardt code, where sum(abs2, f(p)) is what is minimized. Since f(p) = wt .* (model(xpts, p) - ydata) in the curve_fit routine with vector weights, the weight must be 1/sigma. The current code does the minimization correctly under this definition of wt.

  • The routine estimate_covar is calculating the covariance matrix incorrectly for the case of vector weights. The residual function f(p) in that case is already scaled by the weights, so its Jacobian already includes them, and applying the weights again in the covariance formula double-counts them. This means the correct formula should be covar = inv(J'*J) rather than covar = inv(J'*Diagonal(fit.wt)*J) as it is now (see the sketch after this list).
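A runnable sketch of the two formulas in question (hypothetical data; fit.jacobian here is the Jacobian of the already-weighted residuals):

```julia
using LsqFit, LinearAlgebra

model(x, p) = exp.(-x ./ p[1])
x = collect(range(0.0, 2.0, length=40))
σ = 0.01
y = model(x, [0.5]) .+ σ .* randn(length(x))
wt = fill(1 / σ, length(x))            # 1/sigma, per the convention above

fit = curve_fit(model, x, y, wt, [1.0])
J = fit.jacobian

covar_current  = inv(J' * Diagonal(fit.wt) * J)  # what estimate_covar does now
covar_proposed = inv(J' * J)                     # the corrected formula above
```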

This can be checked quite readily: do multiple fits to a curve with different random noise, of known magnitude, added each time. The standard deviation of the fit parameters across the fits should be approximately equal to the estimated standard error computed by the standard_error function. The attached file has results showing that the revised formula for covar suggested above is the correct one.

LsqFitCovariance.pdf
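A minimal sketch of this kind of Monte Carlo check (hypothetical names and values, not the attachment's actual setup):

```julia
using LsqFit, Statistics

model(x, p) = p[1] .* exp.(-x ./ p[2])
x = collect(range(0.0, 2.0, length=50))
p_true = [1.0, 0.5]
σ = 0.01
wt = fill(1 / σ, length(x))   # weights as 1/sigma, per the current convention

# Refit many independent noisy realizations with the same known noise level.
params = [curve_fit(model, x, model(x, p_true) .+ σ .* randn(length(x)),
                    wt, p_true).param for _ in 1:1000]

# Empirical spread of each parameter across fits; with a correct covariance
# formula this should roughly match standard_error(fit) from a single fit.
std(hcat(params...), dims=2)
```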

iewaij (Contributor) commented Aug 21, 2018

Hi @rsturley, thanks a lot for pointing out the issues and for the experiment.

I agree that the current implementation of the weights is wrong. For weighted least squares, the objective function should be sum(f(p).^2 .* wt); if we want to keep using sum(abs2, f(p)), then we should pass the square root of the weights, i.e. f(p) = sqrt.(wt) .* (model(xpts, p) - ydata). A possible fix is #72; there's more discussion in #69.
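A self-contained check of that equivalence (hypothetical data; model, xpts, ydata, wt stand in for the thread's variables):

```julia
model(xpts, p) = p[1] .* xpts
xpts  = [1.0, 2.0, 3.0]
ydata = [1.1, 1.9, 3.2]
wt    = [4.0, 1.0, 0.25]
p     = [1.0]

f = model(xpts, p) .- ydata   # unweighted residuals
g = sqrt.(wt) .* f            # residuals handed to the optimizer

sum(abs2, g) ≈ sum(f.^2 .* wt)   # true: the two objectives agree
```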

I think the current covariance matrix is still correct under the implementation above: with g(p) = sqrt.(wt) .* f(p), the Jacobian of g is sqrt.(wt) .* J, so inv(Jg' * Jg) (writing Jg for that Jacobian) equals inv(J' * Diagonal(wt) * J). There's documentation explaining the covariance computation using a linear approximation.

@yanyuechuixue

It seems this issue still exists?
I used Scipy, Matlab, and LsqFit with the same data.
Scipy and Matlab return the same covariance matrix, while LsqFit gives a different one.

pkofod (Member) commented Dec 18, 2018

> It seems this issue still exists?
> I used Scipy, Matlab, and LsqFit with the same data.
> Scipy and Matlab return the same covariance matrix, while LsqFit gives a different one.

Could you try LsqFit master? Also, if you could post the exact code for Scipy and Matlab, it would be easier to track down the problem.

pkofod (Member) commented Dec 20, 2018

Having played around with it a bit today, I'm quite certain that master LsqFit now has the correct behavior. The weights are either not present, present as a vector (in which case they're inverse variances), or present as a matrix (in which case it's an inverse variance-covariance matrix). This is also documented in the README. If you provide "stupid" weights, as in the thread starter's gist, you will still get the correct parameter values, but since the weights are not actually inverse variances, you will get "wrong" standard errors (not really wrong, though, because you claimed those were the inverse variances).
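A sketch of that convention (hypothetical data; σ is the known per-point noise standard deviation):

```julia
using LsqFit

model(x, p) = p[1] .* exp.(-x ./ p[2])
x = collect(range(0.0, 2.0, length=50))
σ = 0.01
y = model(x, [1.0, 0.5]) .+ σ .* randn(length(x))

wt  = fill(1 / σ^2, length(x))         # inverse variances, not 1/σ
fit = curve_fit(model, x, y, wt, [1.0, 1.0])

standard_error(fit)   # now on the scale of the actual noise in the data
```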
