Prepare Wasmi bytecode, translator and executor for tail-call based instruction dispatch #1101

Robbepop · 2024-07-03T18:41:19Z

This is a rather large working item that will span across multiple PRs.

Currently Wasmi uses an enum based Instruction to represent all of its Wasmi bytecode instructions.

Pros & Cons

The advantage of this approach is that it is a very simple and rusty representation, easy to work with and allows to avoid unsafe Rust as much as possible.
The downside of this approach is that it is extremely dependent on the Rust and LLVM compiler and optimization pipeline to produce great machine code for the executor in the end as past experiences and experiments showed time and time on. The underlying issue is that this representation is too abstract and does not allow for the fine grained control over the layout and dispatch mechanism.

Major Flaw

Furthermore the enum based Instruction representation with its associated data fields cannot be easily combined with explicit tail calls once they are available in Rust (if ever).
The makepad-stitch Wasm interpreter has clearly showed that tail call based instruction dispatch is clearly superior to the enum based approach that Wasmi is using, especially on Apple Silicon and modern CPUs.

Threaded Code

The dispatching technique that we want to use is called threaded code dispatch and is made possible by explicit tail calls amongst others. Fundamentally there are two different kinds of threaded code:

Direct threading: This stores pointers to instruction handlers directly in the bytecode and simply invokes them once the instruction pointer points to them. This is fast and has the least amount of indirections during execution but not very space efficient because a pointer on a 64-bit machine takes up a whopping 8 bytes.
Indirect Threading: This stores simple C-like enum instruction op-codes instead of instruction handler function pointers directly in the bytecode and thus requires an additional indirection to resolve from the C-like enum variant to the associated instruction handler during execution. Some papers suggest that the overhead compared to direct threaded code is insignificant but we should probably test this ourselves. This form is easier to debug and simpler to deal with since the op-codes are still simple integers and not raw pointers.

Due to the above research related question of direct or indirect threaded code it is important to implement the new architecture to easily exchange the used technique in the end.

In case we want to support both styles of threaded code it should be easily possible to design the architecture so that conditional compilation could enable to disable either dispatching mode via a new Wasmi crate feature.

Link to GitHub Gist where I tried out some potential structure on the compiler explorer website to see what assembly is generated: https://gist.github.com/Robbepop/0b254c86dccabc48ae0eb41e89567ea0

Fallback & Performance

The new Wasmi bytecode can be dispatched using the match+loop approach used so far as a fall back when explicit tail calls are not available for an architecture or until they are not available in Rust at all. The execution performance difference should ideally be insignificant from the performance of today's Wasmi bytecode executor.

Tasks

Redefine Wasmi bytecode in terms of a new InstructionEncoder that provides a large API surface to encode all the required Wasmi bytecodes. We should stay as close as possible to the current set while also take into account for the new possibilities of the more relaxed bytecode representation. The new bytecode representation is more relaxed because it will no longer be organized on an 8 byte "instruction word" buffer but on a raw byte buffer. This means the encoding of an instruction is less restricted with respect to the amount of bytes it may use. We want this InstructionEncoder to be usable from both the Wasmi translator as well as the Wasmi translation test suite.
Rewrite the Wasmi translator in terms of the new InstructionEncoder.
Adjust all the translation tests to use the new InstructionEncoder and the new Wasmi translator.
Implement an InstructionDecoder that can be used to decode instructions encoded by the InstructionEncoder. This InstructionDecoder shall offer an efficient API for the Wasmi bytecode executor.
Implement the new Wasmi bytecode executor based on the new InstructionDecoder.

The text was updated successfully, but these errors were encountered:

Robbepop · 2024-09-18T10:55:37Z

With #1152 merged we now have a useful baseline Wasmi bytecode representation with which we could (in theory) test this out once explicit tail calls have been merged into Rust proper.

Robbepop · 2024-09-30T17:21:56Z

This has effectively been implemented in #1152.
With the new Wasmi bytecode and the macro sourced definitions we will be able to more or less simply move to tail call based dispatch once that is available.

Robbepop added the research Research intense work item. label Jul 3, 2024

Robbepop mentioned this issue Aug 2, 2024

WIP: new Wasmi bytecode #1148

Closed

22 tasks

Robbepop closed this as completed Sep 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prepare Wasmi bytecode, translator and executor for tail-call based instruction dispatch #1101

Prepare Wasmi bytecode, translator and executor for tail-call based instruction dispatch #1101

Robbepop commented Jul 3, 2024 •

edited

Loading

Robbepop commented Sep 18, 2024

Robbepop commented Sep 30, 2024

Prepare Wasmi bytecode, translator and executor for tail-call based instruction dispatch #1101

Prepare Wasmi bytecode, translator and executor for tail-call based instruction dispatch #1101

Comments

Robbepop commented Jul 3, 2024 • edited Loading

Pros & Cons

Major Flaw

Threaded Code

Fallback & Performance

Tasks

Robbepop commented Sep 18, 2024

Robbepop commented Sep 30, 2024

Robbepop commented Jul 3, 2024 •

edited

Loading