
interpreter pauses #85

Closed
Geal opened this issue Apr 18, 2018 · 23 comments

@Geal

Geal commented Apr 18, 2018

hello,

I'm currently developing serverless-wasm and I'm wondering how I could implement two features:

  • putting a hard limit on the run time for a function (real time, or number of opcodes)
  • supporting asynchronous networking in host functions. Something like, WASM side calls tcp_connect, host creates the connection, stops the interpreter right there, and resumes once the connection is established

From what I understand, I could proceed like this (for the async part):

  • have some host function return a custom trap to say "WouldBlock" and register that we're waiting for a specific socket
  • write another version of Interpreter::start_execution that would store the function context when receiving that trap from Interpreter::run_interpreter_loop instead of dropping it
  • once the code calling the interpreter knows the connection is established (or we got an error), modify the function stack to push the return value
  • resume executing from there

It looks like it would be doable, and could be used to implement breakpoints, too.
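The plan above could be sketched roughly like this (all names, such as `WouldBlock`, `SavedContext` and `ResumableInvocation`, are illustrative and not wasmi's actual API; the interpreter loop itself is elided):

```rust
/// Outcome of running a function, simplified to i32 values for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum RunState {
    /// The function ran to completion.
    Done(i32),
    /// A host function returned a "WouldBlock" trap; execution is suspended
    /// and the saved context is kept alive instead of being dropped.
    WouldBlock { socket: u64 },
}

/// Stand-in for the saved function context (value stack, call stack, ...).
struct SavedContext {
    value_stack: Vec<i32>,
}

/// What start_execution hands back instead of dropping the context.
struct ResumableInvocation {
    ctx: SavedContext,
    waiting_on: u64,
}

impl ResumableInvocation {
    /// Once the host knows the connection is established (or failed),
    /// push the return value of tcp_connect and continue executing.
    fn resume(mut self, host_result: i32) -> RunState {
        self.ctx.value_stack.push(host_result);
        // A real interpreter would re-enter run_interpreter_loop here;
        // the sketch just returns the pushed value as the final result.
        RunState::Done(*self.ctx.value_stack.last().unwrap())
    }
}

fn start_execution() -> Result<RunState, ResumableInvocation> {
    // Pretend the wasm code immediately called tcp_connect on socket 7
    // and the host function trapped with "WouldBlock".
    Err(ResumableInvocation {
        ctx: SavedContext { value_stack: vec![] },
        waiting_on: 7,
    })
}
```

The key point is that the suspended state is an ordinary value the embedder can store and resume later, which is also what makes breakpoints thinkable.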

The other feature, limiting the run time of a function, would probably be easier to implement, with the loop regularly checking that we do not exceed some limit passed as an argument.
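A minimal sketch of that "loop checks a limit" idea, with an opcode budget rather than wall time (all names are illustrative):

```rust
#[derive(Debug, PartialEq)]
enum Trap {
    /// Raised when the opcode budget runs out mid-execution.
    LimitExceeded,
}

/// Runs one placeholder "instruction" per byte of `program`, charging one
/// unit of budget each and trapping once the budget is exhausted.
fn run_with_limit(program: &[u8], mut budget: u64) -> Result<(), Trap> {
    for _opcode in program {
        if budget == 0 {
            return Err(Trap::LimitExceeded);
        }
        budget -= 1;
        // ... dispatch and execute the real instruction here ...
    }
    Ok(())
}
```

A real-time variant would compare `std::time::Instant::now()` against a deadline every N iterations instead of decrementing a counter on each one.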

Do you think those features would be useful in wasmi?
The serverless-wasm project is at a very early stage, and I admit the readme is not too serious, but I really want to get it to a state where it's nice to use, and I need at least the async networking feature to get there.

@pepyakin
Collaborator

This might be something that would be useful for wasmi, but before working on the details, let's discuss other, non-invasive, solutions.

So it sounds to me as if you're trying to achieve async calls with a sync interface. And I'm wondering: have you considered using an asynchronous interface? That is, tcp_connect returns a handle, the host creates the connection and calls back into some exported function in wasm to push an update for that handle.

As for limiting run time. In blockchains we have a similar problem: we need to limit execution of wasm code by the run time of an average machine that runs the node, and do it deterministically. To solve this problem we use a mechanism called gas metering. The idea is simple: every opcode has a cost. Before executing wasm code we instrument it with gas metering calls. At the beginning of each basic block a call to a special function (e.g. use_gas(i32)) is inserted. The parameter passed to that function is the sum of the costs of all following instructions in the basic block.
So I'm wondering, maybe you can also use instrumentation for that? Maybe you can even reuse our gas metering solution with gas cost = 1 for every instruction.
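The host side of that scheme can be sketched in a few lines: the injected `use_gas` call charges a whole basic block up front against a remaining budget (names are illustrative; this is not the actual wasmi or pwasm-utils API):

```rust
#[derive(Debug, PartialEq)]
struct OutOfGas;

/// Host-side state backing the injected use_gas(i32) calls.
struct GasMeter {
    remaining: u64,
}

impl GasMeter {
    /// Charge the summed cost of a basic block up front. With cost 1 per
    /// instruction, this degenerates into a plain instruction counter.
    fn use_gas(&mut self, block_cost: u64) -> Result<(), OutOfGas> {
        if self.remaining < block_cost {
            return Err(OutOfGas);
        }
        self.remaining -= block_cost;
        Ok(())
    }
}
```

Charging per basic block rather than per instruction keeps the overhead to one host call per block while staying deterministic.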

The reason I want to discuss non-invasive solutions is that I believe you don't want to build a production-grade product on wasmi since, well, wasmi is an interpreter (and will always be) and interpreters are usually slow. So even if we introduce these features into wasmi, I believe you won't see any such feature in other engines in the near future. So it might be a dead end.

@Geal
Author

Geal commented Apr 19, 2018

the use_gas function is a nice method, thanks!

For asynchronous calls, unfortunately, without a way to pause the interpreter to wait for the event (like a blocking epoll call), waiting for the event means either using a sleep or busy looping.

The speed of wasmi is not a big concern for me right now, as I might investigate JIT based solutions if it ever becomes a problem. I'm putting off that choice for now to take time to experiment, and wasmi is great for that: nice to use, and the code is easy to follow.

I understand the goal of non-invasive solutions; you probably don't want to move wasmi in a direction that does not fit your use case.
Looking at the code, I think there would be a way to allow experimentation without introducing new features:

With those changes, it would be possible for wasmi users to create a function context and a stack, and rewrite an Interpreter::run_interpreter_loop method to allow pauses, saving a context, etc. What do you think?

@pepyakin
Collaborator

For asynchronous calls, unfortunately, without a way to pause the interpreter to wait for the event (like a blocking epoll call), waiting for the event means either using a sleep or busy looping.

What about a JS-like approach? That is, where the wasm module doesn't have control over the main loop. The wasm module only gets control on some event, such as init, tcp_connected, etc.
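That event-driven model can be sketched like this: the host owns the loop, and the wasm module only exposes handlers that the host invokes on events (all names, including `handle_event`, are illustrative stand-ins for exported wasm functions):

```rust
/// Events the host can deliver to the module.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    Init,
    TcpConnected { handle: u32 },
}

/// Stand-in for the wasm module's exported state and handlers.
struct Module {
    connected: Vec<u32>,
}

impl Module {
    /// Corresponds to an exported handle_event() function in the wasm
    /// module; the module reacts and returns, never blocking.
    fn handle_event(&mut self, ev: Event) {
        match ev {
            Event::Init => {}
            Event::TcpConnected { handle } => self.connected.push(handle),
        }
    }
}

/// Host-side event loop: the interpreter is entered once per event, so no
/// pause/resume machinery is needed.
fn run_events(module: &mut Module, events: &[Event]) {
    for &ev in events {
        module.handle_event(ev);
    }
}
```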

I understand the goal of non invasive solutions, you probably don't want to move wasmi in a direction that does not follow your use case.

It's not like that : ) Indeed, these features might be useful for breakpoints and stacktraces.

I'm putting off that choice for now to take time to experiment, and wasmi is great for that: nice to use, and the code is easy to follow.

Yeah, I'm just worried about the case where you're done with experimenting, try to move to some other engine, and find out that your model isn't suitable for it.

What do you think?

This might work! But there is a problem: it would expose a bit of the inner workings of the interpreter. For example, right now every stack frame creates 2 (two!) vectors. The obvious optimization would be to flatten all of these into a single stack.

When I thought about stacktraces, one possible solution I was thinking of is the following:

Each execution requires some context. Let's call it Interpreter. This context contains the data for execution, in particular the execution stack. So in order to execute some function you create the Interpreter and then ask it to execute some particular function. If execution traps, then Interpreter returns control to the caller. At the same time, the execution context is left as-is for inspection by some means. So we can provide a mechanism that peeks into this context and generates symbolicated stacktraces.
It sounds like this mechanism could be easily adapted for resuming execution!
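The inspection half of that idea might look like the following sketch, where a symbol table supplied by the caller turns raw function indices into a readable trace (the `EvaluationContext`/`symbolicate` names are illustrative, not wasmi's API):

```rust
/// One entry of the saved call stack.
struct Frame {
    func_index: u32,
}

/// Execution context left intact after a trap, available for inspection.
struct EvaluationContext {
    call_stack: Vec<Frame>,
}

impl EvaluationContext {
    /// Map function indices to names via a caller-supplied symbol table,
    /// producing a human-readable stacktrace. Unknown indices fall back
    /// to a numeric placeholder.
    fn symbolicate(&self, symbols: &[(u32, &str)]) -> Vec<String> {
        self.call_stack
            .iter()
            .map(|frame| {
                symbols
                    .iter()
                    .find(|(idx, _)| *idx == frame.func_index)
                    .map(|(_, name)| name.to_string())
                    .unwrap_or_else(|| format!("<func {}>", frame.func_index))
            })
            .collect()
    }
}
```

A `resume()` method on the same context would be the pause/resume entry point; the context outliving the trap is what makes both features possible.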

@pepyakin
Collaborator

Also, there was another idea in a different context:

the most complex part of the interpreter is dispatching and executing instructions (i.e. run_instruction). So I also wondered whether it is possible to abstract this code into another crate to allow different implementations of the runners and linkers (for example, modules currently use Rc's, but it's perfectly OK to use the Store model, where all funcs, memories, tables, etc. reside in a single store, as described in the spec)

@Geal
Author

Geal commented Apr 19, 2018

What about a JS-like approach? That is, where the wasm module doesn't have control over the main loop. The wasm module only gets control on some event, such as init, tcp_connected, etc.

A callback approach could work. So the interpreter loop would end, but the code would have registered another internal function to be called later with a new interpreter, but the same environment. But that would mean that the host would be able to call arbitrary closures inside the wasm code?

Yeah, I'm just worried about the case where you're done with experimenting, try to move to some other engine, and find out that your model isn't suitable for it.

That would be on me then, and I'm willing to put in the work if I find a new model :)

At the same time, the execution context is left as-is for inspection by some means. So we can provide a mechanism that peeks into this context and generates symbolicated stacktraces.

That would definitely work, allowing pauses and stack traces while also hiding the innards behind a nicer API. Good idea!

Having the instruction execution in a separate crate would allow even more tooling, like state save/restore and step by step debugging.

@pepyakin
Collaborator

But that would mean that the host would be able to call arbitrary closures inside the wasm code?

Sorry, I think I don't follow. The host would only be able to call functions exported from the wasm module, like handle_event() and maybe init().

That would definitely work, allowing pauses and stack traces while also hiding the innards behind a nicer API. Good idea!

Yeah, but this, admittedly, requires more engineering work compared to your approach.

Having the instruction execution in a separate crate would allow even more tooling, like state save/restore and step by step debugging.

And that might require even more work... 🤔

@Geal
Author

Geal commented Apr 19, 2018

do you want me to send a PR to test my proposal?

@pepyakin
Collaborator

After thinking a bit, I became a little more skeptical about this approach.

You see, opening up these internals would likely impose some maintenance burden on wasmi. At the same time it could also make supporting this kind of thing painful, due to internal changes...

@Geal
Author

Geal commented Apr 19, 2018

So the middle ground would be having the interpreter expose some context when it returns? That still requires making the interpreter public.

@pepyakin
Collaborator

Another thought: I think your approach is OK if we make it clear that these APIs are experimental, e.g. by hiding them behind a feature flag. It's the simplest way, but it will still make clients suffer in case of big changes.

So the middle ground would be having the interpreter expose some context when it returns? That still requires making the interpreter public.

What context exactly?

@Geal
Author

Geal commented Apr 19, 2018

The context you mentioned in:

Each execution requires some context. Let's call it Interpreter. This context contains the data for execution, in particular the execution stack. So in order to execute some function you create the Interpreter and then ask it to execute some particular function. If execution traps, then Interpreter returns control to the caller. At the same time, the execution context is left as-is for inspection by some means. So we can provide a mechanism that peeks into this context and generates symbolicated stacktraces.

@pepyakin
Collaborator

So the middle ground would be having the interpreter expose some context when it returns? That still requires making the interpreter public.

Ah, my bad. I shouldn't have called it Interpreter, since we already have an Interpreter in runner.rs, and the thing I mean doesn't have to be exactly like Interpreter, since we don't have to publish all the state that Interpreter contains. We can publish only two functions, like resume() and symbolicate(&Symbols).

Let's call it EvaluationContext to be more in line with the spec. Sorry for the confusion.

But in general yes, that would be sorta middle ground. Although I'm also good with:

Another thought: I think your approach is OK if we make it clear that these APIs are experimental, e.g. by hiding them behind a feature flag. It's the simplest way, but it will still make clients suffer in case of big changes.

@Geal
Author

Geal commented Apr 19, 2018

Alright, I'll try the first method then.

@Geal
Author

Geal commented Apr 25, 2018

hello!
Checking in after having messed a bit with the code. You can see the differences here: master...Geal:master

The first commit ( f107c64 ) marks some parts as public and adds the invoke_context method (which can be immediately used in run_interpreter_loop).

The second commit exists because I'm now trying to store the interpreter that I paused, and the way the externals are used makes that a bit annoying.

We have the following:

```rust
pub struct Interpreter<'a, E: Externals + 'a> {
    externals: &'a mut E,
}
```

If I'm running the interpreter until the function ends, having it hold a mutable reference to the externals is fine, because we will drop the interpreter afterwards.
But if I want to make it resumable, I have to keep the externals somewhere while the interpreter still holds the mutable reference to them, so it ends up looking like a self-referential struct.
It might be a better idea to have the interpreter own the externals, and add a method that consumes it and gives back the externals.
But that would change the API in multiple places, like FuncInstance::invoke or ModuleInstance::invoke_export.

I'm not sure what would be the right solution here, or if all those changes are even desirable. I'm still testing things out, so let's not rush things :)
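The ownership alternative mentioned above can be sketched briefly: instead of the paused interpreter borrowing `&'a mut E` (which forces the embedder into a self-referential struct when storing it), it owns the externals, and a consuming method hands them back (names like `PausedInterpreter` and `into_externals` are hypothetical):

```rust
/// Stand-in for a user-defined Externals implementation.
struct Externals {
    calls: u32,
}

/// A paused interpreter that owns its externals, so it can be stored in a
/// plain struct with no borrowed lifetime.
struct PausedInterpreter<E> {
    externals: E,
}

impl<E> PausedInterpreter<E> {
    /// Consume the paused interpreter and give the externals back to the
    /// caller once execution is finished or abandoned.
    fn into_externals(self) -> E {
        self.externals
    }
}
```

The cost, as noted above, is that every entry point currently taking `&mut E` would have to change signature, which is the API churn being weighed here.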

@pepyakin
Collaborator

But that would change the API in multiple places, like FuncInstance::invoke or ModuleInstance::invoke_export.

Yeah and unfortunately that's quite undesirable.

But I think I didn't get it: why do you need to keep Interpreter instance around? Can't we create another one and then provide it with Externals?

Note that it is OK to dispose of the Externals and then create them again. In fact, that use case was kept in mind when this was designed.
For example, consider this sandboxing use-case: GuestExternals holds a mutable reference to a supervisor's Externals. That's OK, since GuestExternals is created for only a single call!
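That per-call pattern looks roughly like this (struct names mirror the ones mentioned above, but the shapes are illustrative):

```rust
/// Long-lived externals owned by the supervisor.
struct SupervisorExternals {
    log: Vec<&'static str>,
}

/// Short-lived externals handed to the guest; borrows the supervisor
/// mutably, but only for the duration of one call.
struct GuestExternals<'a> {
    supervisor: &'a mut SupervisorExternals,
}

impl<'a> GuestExternals<'a> {
    fn invoke(&mut self, name: &'static str) {
        self.supervisor.log.push(name);
    }
}

/// GuestExternals is created here and dropped at the end of the function,
/// so the mutable borrow never needs to be stored anywhere.
fn call_guest(supervisor: &mut SupervisorExternals, name: &'static str) {
    let mut guest = GuestExternals { supervisor };
    guest.invoke(name);
}
```

Because the borrow is scoped to each call, no self-referential storage is needed between calls.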

@Geal
Author

Geal commented Apr 26, 2018 via email

@eira-fransham
Contributor

eira-fransham commented Jun 19, 2018

A good pointer for how to design an interface like this is Linux's aio

@ghost

ghost commented Jun 21, 2018

I'm experimenting with using WASM as an embedded language in database systems. I was looking for the same features and I'm pleasantly surprised by the non-invasive solutions suggested. @pepyakin Can you point me somewhere I can have a look at the gas metering so I can learn how to use it?

@pepyakin
Collaborator

@portstrom Yeah, sure!

In order to inject gas metering into a wasm binary you need basically two things:

  • A module loaded by parity_wasm.
  • And a set of rules how to count gas.

Then you take the pwasm-utils crate and call its pwasm_utils::inject_gas_counter to do the injection. That's it : )

The example code can be found here.

However, I have some doubt that you actually need it, because gas metering might introduce unwanted complexity and overhead in order to provide its deterministic guarantees. Have you assessed using a plain timeout for this use-case?

@ghost

ghost commented Jun 21, 2018

@pepyakin Thank you! I had to get an old version of pwasm-utils so that it uses the same version of parity_wasm as wasmi, but after that it works great.

For my use case execution time is limited by real time, not by counting instructions. I did it so that after every one million gas, the host function checks the elapsed time. This doesn't work, however, when instructions execute in a long sequence without ever switching blocks, since calls to the gas function are only inserted at the beginning of each block. The theoretically correct solution would be to insert such calls throughout the block, each call reporting no more than one million gas, but in practice I think there may never be any code that has over a million gas in a single block, so I just made the host function automatically reject any block that adds more than a million gas.
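The scheme described above might be sketched as follows: the gas host function only consults the (comparatively expensive) clock once per million gas, and rejects any single block charging more than a million at once, since such a block could run arbitrarily long between checks (names are illustrative; the deadline is modelled as a boolean instead of a real `std::time::Instant` comparison to keep the sketch deterministic):

```rust
#[derive(Debug, PartialEq)]
enum GasError {
    /// A single basic block charged more than one check interval of gas.
    BlockTooLarge,
    /// The wall-clock deadline passed.
    TimedOut,
}

const CHECK_INTERVAL: u64 = 1_000_000;

struct TimeLimiter {
    gas_since_check: u64,
    /// Stand-in for a deadline check; a real host would compare
    /// Instant::now() against a stored deadline here.
    deadline_reached: bool,
}

impl TimeLimiter {
    fn use_gas(&mut self, block_gas: u64) -> Result<(), GasError> {
        // A block bigger than the interval could defer the clock check
        // indefinitely, so reject it outright.
        if block_gas > CHECK_INTERVAL {
            return Err(GasError::BlockTooLarge);
        }
        self.gas_since_check += block_gas;
        if self.gas_since_check >= CHECK_INTERVAL {
            self.gas_since_check = 0;
            if self.deadline_reached {
                return Err(GasError::TimedOut);
            }
        }
        Ok(())
    }
}
```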

@ghost

ghost commented Jun 21, 2018

@pepyakin I don't see a good way to do it with a timeout but without gas metering. I would have to run each module in a separate process and kill the process after the timeout. That's not what I want to do.

@ghost

ghost commented Jun 21, 2018

Gas metering apparently makes my function take almost three times as long to run (optimized build). It would probably be easy to optimize by having a special opcode for gas metering instead of calling a host function.

@pepyakin
Collaborator

pepyakin commented Jul 9, 2018

Yeah, this is a downside of the gas metering approach; it definitely has some overhead.

@Geal the interpreter pauses feature has landed in #110!

@pepyakin pepyakin closed this as completed Jul 9, 2018