Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/engine no std #595

Merged
merged 11 commits into from
Nov 22, 2022
Merged

Feature/engine no std #595

merged 11 commits into from
Nov 22, 2022

Conversation

iamyulong
Copy link
Member

@iamyulong iamyulong commented Nov 18, 2022

More and more non-std incompatible changes have been introduced into the code base.

To prevent further diverge, this PR restores no_std build and adds it to CI.

Copy link
Contributor

@dhedey dhedey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is ready for review yet, but it's not a draft PR, so as I noticed it, I've taken a look.

I approve of the want to restore a no-std build for the engine, but I worry about a few of the details here. Happy to chat if you'd like :)

.build();
pub fn new(_options: InstrumenterOptions) -> Self {
// TODO: limit size
let cache = RefCell::new(HashMap::new());
Copy link
Contributor

@dhedey dhedey Nov 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a definite reversion of behaviour. We want a safe and performant cache implementation here, for use in the node.

Can we instead just fall back to this basic version in the no-std case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better yet, what about creating our own ThreadsafeCache type, and swap out the implementation for either wrapping Moka or a Mutex<HashMap<K, Arc>> depending on if we use alloc or not?

Copy link
Member Author

@iamyulong iamyulong Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutex is not part of the no_std

Copy link
Member Author

@iamyulong iamyulong Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me try conditinal compiling first.


use crate::types::*;
use crate::wasm::{WasmMeteringConfig, WasmModule};

pub struct WasmInstrumenter {
cache: Cache<(Hash, Hash), Arc<Vec<u8>>>,
cache: RefCell<HashMap<(Hash, Hash), Arc<Vec<u8>>>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this will panic if multiple threads try to lock it at the same time - as can happen in the Node when we are accepting multiple mempool submissions at once.

Can we wrap it in a Mutex instead of a RefCell?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutex is not part of the no_std.

@@ -116,7 +116,8 @@ mod tests {
// RFC 2056
fn assert_all() {
assert_send::<ScryptoInterpreter<DefaultWasmEngine>>();
assert_sync::<ScryptoInterpreter<DefaultWasmEngine>>();
// TODO: make sure engine is multi-thread safe!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We create a new Radix engine for each transaction and single thread it - it's the ScryptoInterpreter caches we're sharing.

#FearlessConcurrency should mean this is fine? 👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the runtime shares pointer to the cache, they may run into illegal state due to memory volatility. Like I said, we need dedicated story to make sure multi-thread work rather pretend to.

Copy link
Contributor

@dhedey dhedey Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean I have already verified this - in the very least:

  • The moka cache claims to be thread safe
  • The multiple copies of the engine share the WasmModule which is just (a reference to) data, and from which they instantiate separate / independent WasmInstances

Separately, the only other interaction between engine threads is in the database layer - where RocksDB is threadsafe. (and we ensure there are no commits during execution of other transactions)

Is there anything I've missed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly the wasmer implementation which stores a pointer which could be volatible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that's the WasmerInstance which contains the pointer, not the WasmerModule. The WasmerInstance is not shared - it's created for each execution from the (shared) WasmerModule template.

Copy link
Contributor

@dhedey dhedey Nov 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed - Wasmer is written in Rust, and they have good understanding of Send/Sync (It appears even Instance implements send!), so we can trust that if Module is Sync then it is Sync. Indeed - I believe it's essentially a data object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't know what Rust is doing with the data under that pointer though since Rust doesn't know that it's a mutable object potentially being written to across multiple threads.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe you're referring to the pointer that we create ourselves. This pointer isn't in the object which is being shared. This pointer is in the WasmerInstance (not shared) not the WasmerModule (shared).

This pointer in WasmerInstance has no affect on whether WasmerModule is Sync.

On a related note, I'm quite surprised that we use runtime_ptr: Arc<Mutex<usize>>, and use usize, not a raw pointer type here, but anyway... I agree WasmerInstance possibly has an unsound impl of Sync (without looking into it further)... But feeling slightly like a broken record, we're not sharing WasmerInstance. And I've double-checked - the WasmerInstance pointer does not point into WasmerModule (which would be unsound for sure if it did). :)

@iamyulong iamyulong force-pushed the feature/engine-no-std branch from 84ef28b to 16dce1f Compare November 21, 2022 11:23
@iamyulong iamyulong force-pushed the feature/engine-no-std branch 2 times, most recently from d7b6cdc to 92966d0 Compare November 21, 2022 12:00
@iamyulong iamyulong force-pushed the feature/engine-no-std branch from 92966d0 to 1a7a7f9 Compare November 21, 2022 12:10
@iamyulong
Copy link
Member Author

It's now ready for another review. @dhedey

Copy link
Contributor

@dhedey dhedey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks! 👍

Happy for you to merge as-is or to extract out a separate Cache type.

EDIT: Looks like you still need to fix some failing tests: https://github.com/radixdlt/radixdlt-scrypto/actions/runs/3515933512/jobs/5891823809

indexmap = { version = "1.8.1" }
moka = { version = "0.9.4", features = ["sync"], default-features = false }
lru = { version = "0.8.1", default-features = false }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put lru behind optional too, and only enable it in the alloc case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, we don't gain much from this (tiny compile time of a small crate) by coupling lru and alloc. A slightly better approach is to make it a feature if we really care. But, will punt on this for now. :)

@@ -116,6 +116,8 @@ mod tests {
// RFC 2056
fn assert_all() {
assert_send::<ScryptoInterpreter<DefaultWasmEngine>>();
// TODO: make sure engine is indeed multi-thread safe!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say the engine, do you mean the DefaultWasmEngine? or the RadixEngine itself?

Copy link
Contributor

@dhedey dhedey Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my previous comment. Do you have specific worries or things you want to check? Else I think this code comment should probably be removed.

@iamyulong iamyulong merged commit 97a3a61 into develop Nov 22, 2022
@talekhinezh talekhinezh deleted the feature/engine-no-std branch February 21, 2023 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants