Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnknownOpcode(192) again #3192

Closed
2 tasks done
nazar-pc opened this issue Feb 2, 2024 · 23 comments · Fixed by #3777
Closed
2 tasks done

UnknownOpcode(192) again #3192

nazar-pc opened this issue Feb 2, 2024 · 23 comments · Fixed by #3777
Labels
I2-bug The node fails to follow expected behavior. I10-unconfirmed Issue might be valid, but it's not yet known.

Comments

@nazar-pc
Copy link
Contributor

nazar-pc commented Feb 2, 2024

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

Yes, that same famous issue again.

Using Substrate derived from 127b9be and nightly-2024-01-19 version of Rust I'm still getting "UnknownOpcode(192)" with modern version of LLVM.

As such I believe the issue is still very much present despite #1755 being resolved by building standard library from source.

I believe some long-term solution needs to be found for this issue at last.

cc @koute

Steps to reproduce

No response

@nazar-pc nazar-pc added I10-unconfirmed Issue might be valid, but it's not yet known. I2-bug The node fails to follow expected behavior. labels Feb 2, 2024
@bkchr
Copy link
Member

bkchr commented Feb 2, 2024

Are you sure that you use the latest wasm-builder? Because it should recompile std to prevent this, but maybe there is some new unknown op code.

@nazar-pc
Copy link
Contributor Author

nazar-pc commented Feb 2, 2024

It is the same opcode 192 from my understanding. And yes, I am sure wasm-builder from mentioned commit is what I use and it does contain mentioned fix with std recompilation.

@bkchr
Copy link
Member

bkchr commented Feb 2, 2024

Okay, thank you for confirming!

@s0me0ne-unkn0wn
Copy link
Contributor

@koute maybe it's time to consider enabling sign-ext... Again?
It's pretty much the de-facto standard throughout the Wasm ecosystem nowadays.
We could even do that deterministically, the same way it's implemented for bulk memory proposal now, although that would require more time to introduce a new executor parameter, make a release, wait for the nodes to update, etc.
I understand the whole Wasm thing is in a "just support it" state now, but that's something really annoying.

@koute
Copy link
Contributor

koute commented Feb 5, 2024

Yeah, this shouldn't happen anymore.

@nazar-pc Could you please post exact reproduction steps?

I believe some long-term solution needs to be found for this issue at last.

Yes; the long-term solution is most likely getting rid of WASM altogether. (:

@koute maybe it's time to consider enabling sign-ext... Again?

Well, we could do that; if you want to, be my guest! We'd need to:

  1. enable the signext feature in parity-wasm,
  2. to preserve current behavior add an explicit check when loading .wasm blobs that would reject blobs containing those instructions,
  3. expose a flag in the executor crate to allow it to be enabled,
  4. add an on-chain knob that could be atomically enabled and link it to the flag,
  5. write a Polkadot RFC and get it accepted (unless we make it parachain only - which technically doesn't need RFCs - but it probably doesn't make sense to make it parachain-only),
  6. release a new client version,
  7. wait half a year for people to upgrade,
  8. finally enable it.

But this isn't really a proper long-term fix per-se; it'd be medium-term at most. Sure, it will fix this particular case, but that's only until yet another new WASM feature is enabled by default in LLVM, and then we're going to have to do this song and dance once again.

@nazar-pc
Copy link
Contributor Author

nazar-pc commented Feb 5, 2024

I've just tried to compile subspace-runtime from https://github.com/subspace/subspace with LLVM 16/17 installed rather than LLVM 14/15 and got the issue.
Here is CI run that uses it as a dependency: https://github.com/nazar-pc/space-acres/actions/runs/7756289113/job/21153293189#step:16:1854

@koute
Copy link
Contributor

koute commented Feb 5, 2024

Hmm... compilation of subspace-runtime works for me (cloning subspace repo and running cargo check -p subspace-runtime), so I'm not exactly sure how to reproduce your issue. The "with LLVM 16/17 installed" part is already true - the nightly-2024-01-19 toolchain you have in your rust-toolchain.toml uses LLVM 17 by default.

Is there a way to reproduce this locally?

@nazar-pc
Copy link
Contributor Author

nazar-pc commented Feb 5, 2024

The "with LLVM 16/17 installed" part is already true - the nightly-2024-01-19 toolchain you have in your rust-toolchain.toml uses LLVM 17 by default.

This has nothing to do with Rust, using new Rust toolchain is fine, I was talking about LLVM version installed on the system. I typically use https://github.com/KyleMayes/install-llvm-action in CI to install LLVM 15 (specific version for Substrate-based runtime to compile successfully), upgrading that to LLVM 16 or 17 breaks things.

Now that I though about it some more I think it is related to the fact that we probably have some C dependencies in the stack that are compiled in WASM, not just Rust code.

@koute
Copy link
Contributor

koute commented Feb 6, 2024

Now that I though about it some more I think it is related to the fact that we probably have some C dependencies in the stack that are compiled in WASM, not just Rust code.

Yes, that does sound like what's most likely happening here, in which case you should be able to easily fix it by just telling clang to not emit non-MVP instructions.

@nazar-pc
Copy link
Contributor Author

nazar-pc commented Feb 6, 2024

I'm not calling Clang though. I call cargo build and it calls cargo build again to compile WASM and that in turn calls build script of some dependency that likely uses https://github.com/rust-lang/cc-rs to actually build stuff. Is there maybe some environment variables that substrate-wasm-builder should set for this to not happen?

@nazar-pc
Copy link
Contributor Author

Looks like rust-lang/cc-rs#268 will need to happen for C dependencies to account for target-cpu=mvp

@nazar-pc
Copy link
Contributor Author

Doing std::env::set_var("CFLAGS", "-mcpu=mvp"); before calling builder helped as a workaround for now.

@dnjscksdn98
Copy link

Doing std::env::set_var("CFLAGS", "-mcpu=mvp"); before calling builder helped as a workaround for now.

I've met the same issue while building a substrate node. So I tried the workaround you've mentioned, and yes, it did work for me also. Is the issue handled? Or should I just stay with the workaround for a while

@nazar-pc
Copy link
Contributor Author

Well, #3777 was supposed to close this, but it didn't, at least with C dependencies I still need std::env::set_var("CFLAGS", "-mcpu=mvp");.

@bkchr
Copy link
Member

bkchr commented Sep 30, 2024

Really? This pass should rewrite the op codes...

@nazar-pc
Copy link
Contributor Author

Well, I did quick check with cargo clippy and got this:

Failed to deserialize /home/nazar-pc/.cache/cargo/target/debug/wbuild/auto-id-domain-test-runtime/auto_id_domain_test_runtime.wasm: UnknownOpcode(192)

So yeah...

@s0me0ne-unkn0wn
Copy link
Contributor

/home/nazar-pc/.cache/cargo/target/debug/wbuild/auto-id-domain-test-runtime/auto_id_domain_test_runtime.wasm

It's obviously a cached one, did you take care to invalidate the cache?

@nazar-pc
Copy link
Contributor Author

It is certainly not cache, I just nuked target and it still happens

@bkchr
Copy link
Member

bkchr commented Sep 30, 2024

@nazar-pc you are 100% sure that you are using #3777?

@nazar-pc
Copy link
Contributor Author

I triple-checked and yes, I do. Our current downstream version is derived from stable2409 release, which certainly contains #3777.

@bkchr
Copy link
Member

bkchr commented Sep 30, 2024

How can I reproduce it?

@nazar-pc
Copy link
Contributor Author

Some runtimes seem to be fine, but not this one 🤔

@nazar-pc
Copy link
Contributor Author

@bkchr let me know if you had a chance to take a look (or even planning to)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I2-bug The node fails to follow expected behavior. I10-unconfirmed Issue might be valid, but it's not yet known.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants