Fast PC1 branch seems to go belly up for me #1320
Digging a bit deeper, listing
Edit: No idea what Clover is, mind you, or why it would suddenly try to access it.
Thanks, @karalabe. I think I know what is going on here, and it looks like it may be orthogonal to the feature this branch is meant to deliver. When we absorbed an unrelated fix (neptune 1.2.1), some previously unreleased code (neptune 1.2.0) came in along with it. I think we can address this particular issue by going back to the previously used version of neptune (1.1.1) and applying the needed fix on top of it to produce 1.1.2, leaving out the apparently problematic 1.2.0. @magik6k @cryptonemo I can prepare a suitable neptune 1.1.2 tomorrow, which may then require pinning that version in the API/FFI.
Towards a fix to the main issues noted here: #1322
@karalabe I have an idea. Can you please run
I think I was able to reproduce (in a different context) and fix the problem you're having: argumentcomputer/neptune#60. I can cut a new patch version, 1.2.2, but that will probably still require a new release of the API and/or FFI (cc: @cryptonemo) before the lotus test branch can use it.
I've just upgraded my GPU and updated the driver, so I'm not sure I can get you the old smi.
It's okay. I think there's a good chance the patch above will fix this problem (I'm not sure about what comes after it) once it makes its way into the branch you're using.
This was fixed on lotus master.
I.e. https://app.slack.com/client/TEHTVS1L6/C0179RNEMU4/thread/C0179RNEMU4-1603007653.135600
My hardware setup is: EPYC 7402P, 512GB RAM, PCIe x4 NVMe SSD, Nvidia 1070. Ubuntu 20.04
I've switched to `feat/proofs-5.2.3` in lotus to try out the faster PC1, but it failed in a lot of different ways for me on PC2. Originally I ran with `FIL_PROOFS_USE_MULTICORE_SDR=1`, but I saw super weird behavior, so I dropped this env var altogether to keep the change surface a bit smaller. My worker configs were:
These configs seem to be stable on the 0.10.2 release branch and never fail.
One new error that appeared immediately is:
`"msg":"Cannot get device list for platform: Clover!"`
The failure that choked all my PC2s was: