Subtle solution difference upgrading from Julia v1.6.1 --> v1.7.1 causes my iterative solver to fail #133
Comments
Floating-point computations can depend on a variety of environmental factors like the compiler versions, math libraries, BLAS libraries, etc. I don't find the change in convergence behavior particularly surprising. It would be a good exercise to trace through the code in ECOS to see what causes the divergence, but my guess is that we won't find a bug here.
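As a minimal illustration of why the environment matters (a sketch with made-up numbers, not taken from the failing test): floating-point addition is not associative, so any change in the order of operations, for example from a different compiler or math library, can alter the last bits of a result.

```julia
# Floating-point addition is not associative: grouping the same three
# numbers differently changes the last bit of the sum.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

println(a == b)          # false
println(abs(a - b))      # on the order of eps(Float64)
println(isapprox(a, b))  # true
```

In an iterative algorithm, a difference of this size in one iteration can be amplified by the iterations that follow.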
Julia's BLAS changed between 1.6 and 1.7, but ECOS_jll has no external dependencies so I'm not sure that's the problem. There were also changes to the random number generation. Did you check that your Julia code is deterministic under Julia 1.6 and 1.7? The most likely culprit is that you aren't passing bit-for-bit identical models to ECOS between Julia 1.6 and 1.7.
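One way to check for bit-for-bit identical inputs is to hash the raw bytes of the problem data in each Julia version and compare the digests. The sketch below assumes the data is available as dense `Float64` arrays; `fingerprint` is a hypothetical helper, not part of ECOS.jl, and the arrays are placeholders. It uses the `SHA` standard library rather than `hash`, since `hash` values are not guaranteed to be stable across Julia versions.

```julia
using SHA  # Julia standard library

# Hypothetical helper: digest of the raw bytes of a Float64 array.
# Unlike ==, this distinguishes 0.0 from -0.0 and any last-bit difference.
fingerprint(x::Array{Float64}) =
    bytes2hex(sha256(collect(reinterpret(UInt8, vec(x)))))

# Placeholder data standing in for a cost vector built in two environments.
c_old = [1.0, 0.1 + 0.2, 0.0]
c_new = [1.0, 0.3, 0.0]

println(isapprox(c_old, c_new))                    # true: equal up to rounding
println(fingerprint(c_old) == fingerprint(c_new))  # false: not bit-for-bit equal
println(fingerprint([0.0]) == fingerprint([-0.0])) # false, although 0.0 == -0.0
```

Printing the digest of each array at every SCP iteration under both Julia versions would show the first iteration at which the inputs stop matching.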
@odow that's a good callout, maybe the inputs to ECOS are not exactly the same if my code produces slightly different outputs due to the BLAS change. Even if ECOS doesn't depend on it, my external code that wraps ECOS probably does. Where in Julia is BLAS used? Is there a list, or some other way to know, which functions call it?
BLAS will probably be used if you call any linear-algebra routines. There's no easy way to isolate whether and where it is called. I think you should focus on the underlying issue: your code should be robust to these differences. You should not expect to have identical performance when changing versions or machines.
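While individual BLAS calls are hard to trace, it is straightforward to record which BLAS backend a session uses and to compare results with a tolerance instead of exact equality. A rough sketch, with placeholder data and a hypothetical `blas_info` helper; `BLAS.get_config()` is available from Julia 1.7, and `BLAS.vendor()` is the older query:

```julia
using LinearAlgebra

# Hypothetical helper: describe the BLAS backend of the current session.
blas_info() = isdefined(BLAS, :get_config) ? string(BLAS.get_config()) : string(BLAS.vendor())
println("Julia ", VERSION, ", BLAS: ", blas_info())

# Placeholder data: a library matrix-vector product versus a plain loop.
A = [1.0 / (i + j) for i in 1:50, j in 1:50]
x = [sin(k) for k in 1:50]

y_lib = A * x
y_loop = [sum(A[i, j] * x[j] for j in 1:50) for i in 1:50]

println(y_lib == y_loop)          # may be true or false, depending on the backend
println(isapprox(y_lib, y_loop))  # agreement up to rounding
```

A test that asserts with `isapprox` (or `≈` with an explicit `rtol`) is less likely to break when the backend changes than one that relies on exact values.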
Closing because this doesn't seem like an issue with ECOS and there isn't anything actionable to do here. If you can come up with a reproducible example demonstrating an issue in ECOS, please re-open.
Hello team, thanks for developing ECOS.jl. I'm writing a new package for sequential convex programming called the SCP Toolbox. I ran into a very subtle issue in ECOS related to a Julia version upgrade from v1.6.1 to v1.7.1. Even though none of the installed package versions change, the behavior of ECOS changes very slightly. Because my package is iterative, it seems that a "numerically not-so-stable" unit test in my package fails simply due to the version upgrade.
First things first, I am on Ubuntu:
The Julia version where things work is v1.6.1, and where things break is v1.7.1. The unit test in question is:
https://github.com/dmalyuta/scp_traj_opt/blob/bugfix/ecos-numerical-error/test/runtests.jl#L78
In particular, the test that fails occurs here:
https://github.com/dmalyuta/scp_traj_opt/blob/bugfix/ecos-numerical-error/test/examples/rendezvous_3d/tests.jl#L215
You can run the code for yourself by downloading the repository and running
] test
in the Julia REPL for v1.6.1 and v1.7.1. I have also attached the stdout from testing both versions. You can see that the iterations follow each other very closely up until iteration 13 (of my SCP algorithm, that is, not of ECOS' interior-point method). At that point, ECOS under Julia v1.6.1 stops short with "Close to OPTIMAL" status, whereas under v1.7.1 it actually finds the OPTIMAL solution. This divergence in behavior unfortunately causes the v1.6.1 run to achieve OPTIMAL on iteration 14, while the v1.7.1 run stops short with "NUMERICAL PROBLEMS" on iteration 14.
I think that this is an interesting bug because the package versions remain the same for both runs; only the underlying Julia language is "newer". In optimization we obviously never want to see a situation where an upgraded environment suddenly changes convergence behavior.
If you need to know something else about this issue, please let me know.
stdout_julia_v161.txt
stdout_julia_v171.txt