Parca output differences between machines #1154
Comments
Narrowing this to `fsolve`: would it be straightforward to make a small test case (like the dot product test) for quicker hypothesis testing? We could test newer releases of SciPy and (if relevant) NumPy and OpenBLAS. FWIW, the latest SciPy embeds OpenBLAS 0.3.9 while the latest NumPy embeds OpenBLAS 0.3.17.

It looks like SciPy embeds the MINPACK Fortran source code, and those sources haven't changed in 3 years, so a difference could be due to the Fortran compiler and its compilation switches.

Are you running on the lab's older compute nodes rather than whatever CPUs are in the newest nodes?

A possible workaround could be to run in a container on Sherlock. Singularity on Sherlock's CentOS might actually be able to run a Docker image, also there's a tool …
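A minimal cross-machine probe along those lines might look like this (a sketch, not code from the repo; the hashing scheme is just one way to make bitwise differences obvious):

```python
# Hypothetical reproducibility probe: print exact bit patterns of a BLAS dot
# product and an fsolve result so the outputs can be diffed between the local
# machine and Sherlock. Any single-bit difference changes the digest.
import hashlib

import numpy as np
from scipy.optimize import fsolve

rng = np.random.RandomState(0)  # fixed seed so inputs match on every machine
a = rng.rand(1000)
b = rng.rand(1000)

dot = np.dot(a, b)  # exercises the OpenBLAS build

def equations(x):
    # A small nonlinear system; fsolve dispatches to compiled MINPACK code.
    return [x[0] ** 2 + x[1] - 11, x[0] + x[1] ** 2 - 7]

root = fsolve(equations, [1.0, 1.0])

print("dot ", dot.hex(), hashlib.md5(np.float64(dot).tobytes()).hexdigest())
print("root", root.tolist(), hashlib.md5(root.tobytes()).hexdigest())
print("numpy", np.__version__, "scipy", __import__("scipy").__version__)
```

Running that under a few SciPy/NumPy versions on both machines would separate an OpenBLAS difference (the dot hash diverges) from a MINPACK/Fortran-compiler difference (only the root hash diverges).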
Actually, I was incorrect and don't think …
This would be a great idea. We could save one of the cached KM files, which should have the inputs to `km_loss_function`, as an easy test. Not sure how easy it will be to reproduce on a smaller test case.
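As a sketch of that test (the file name, pickle layout, and import path are all hypothetical here):

```python
# Hypothetical regression test: replay cached km_loss_function inputs through
# fsolve and compare to a reference root captured on a known-good machine.
import pickle

import numpy as np
from scipy.optimize import fsolve

# Assumed cache layout: {"x0": ..., "args": (...), "expected_root": ...}
with open("cached_km_inputs.pkl", "rb") as f:
    cached = pickle.load(f)

from some_module import km_loss_function  # placeholder import path

root = fsolve(km_loss_function, cached["x0"], args=cached["args"])

# Exact equality is deliberate: the goal is to catch bitwise divergence
# between machines, not just large numerical error.
assert np.array_equal(root, cached["expected_root"]), "fsolve output differs"
```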
It was on the newest compute node. It looks like they took away the old ones, so now we only have the one new node. Maybe using JAX or another package for these functions and Jacobians would be better and consistent across environments?
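For illustration, here is what a JAX version of such a solve could look like (the loss function below is a stand-in, not the real `km_loss_function`):

```python
# Illustrative sketch: JAX derives the Jacobian automatically, and its results
# should be less sensitive to the system BLAS/Fortran toolchain than the
# compiled MINPACK path inside scipy.optimize.fsolve.
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # the parca needs double precision

def loss(x):
    # Stand-in for km_loss_function.
    return jnp.array([x[0] ** 2 + x[1] - 11.0, x[0] + x[1] ** 2 - 7.0])

jacobian = jax.jacfwd(loss)

# Plain Newton iteration as a stand-in for fsolve's MINPACK hybrid method.
x = jnp.array([1.0, 1.0])
for _ in range(20):
    x = x - jnp.linalg.solve(jacobian(x), loss(x))

print(x)  # converges to (3.0, 2.0) for this system
```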
... per code review feedback. And bump the validation data tasks up to priority 12 to support their downstream tasks, now that we know the added dependency links fixed the bug. Any ideas what could cause the exception `ModuleNotFoundError: No module named 'tmpqliqwytx.m60156961fb5c4d3d33cb0876d617bf81987f03cf4a6533e8b2ceef71f39f139c'`? There were warnings about .aesara cache files "gone from the file system". Bugs in Aesara's caching when run from multiple processes? This (in addition to Issue #1154) is more incentive to try using JAX instead or updating to a newer Aesara release. (I'll revert `ecoli-pull-request.sh` before merging.)
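If the cache corruption really is a multiprocess race, one mitigation to try (an assumption about the cause, not a confirmed fix) is giving each worker its own compile directory before Aesara is imported:

```python
# Hypothetical mitigation: point each worker process at a private Aesara
# compiledir so concurrent runs cannot race on a shared .aesara cache.
import os
import tempfile

os.environ["AESARA_FLAGS"] = "base_compiledir={}".format(
    tempfile.mkdtemp(prefix="aesara_")
)

import aesara  # noqa: E402 -- must be imported after the flag is set
```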
Parca output can vary between machines with our current environment and installation. I am able to run the parca without issue locally with 86cfdc5, but the PR builds failed (#1153). Comparing output between local and Sherlock parca runs shows that the first difference appears in the RNA degradation fitting.
Looking at the `stats_fit` difference, there is no difference in the values before `fsolve` (e.g. `LossKm`), so I would expect the difference comes from `fsolve` in SciPy. `fsolve` uses compiled MINPACK code, so my guess is that there is a difference between these installations on different machines.
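One quick check of that guess (a diagnostic sketch, not from the repo) is to dump the compiled-library details on each machine and diff the output:

```python
# Diagnostic: report the BLAS/LAPACK builds that numpy and scipy link against,
# so installation differences between machines are visible directly.
import numpy as np
import scipy

print("numpy", np.__version__)
print("scipy", scipy.__version__)
np.show_config()
scipy.show_config()
```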