Add Webassembly backend #2566

bjorn3 · 2021-01-09T14:19:34Z

Feature

Add a Cranelift backend that compiles to Webassembly.

Benefit

This would allow a compiler to run in the browser and compile code to run in the browser.

Implementation

No plan yet

Alternatives

Not adding a Webassembly backend.

bjorn3 · 2021-01-09T14:22:20Z

One of the possible usecases would be to compile rustc to webassembly and then let it produce webassembly modules using cg_clif.

abbec · 2021-04-10T18:50:57Z

This would be awesome, another use case might be implementing custom compiler frontends for custom languages that could then compile to wasm!

birbe · 2021-08-05T04:57:06Z

I'm working a Java Virtual Machine implementation in Rust and one of my plans is to write a WASM backend along with a general JIT implementation using Cranelift, and if everything were unified to just be Cranelift that would be great. I might give creating a WASM backend a shot, would benefit my project as well as Cranelift

teymour-aldridge · 2021-12-30T16:46:25Z

I would be interested in implementing this!

I had a look, and I'm not sure where the correct place to implement this would be. My thought was in cranelift_codegen::isa, however the MachBackend trait seems to reply upon the existence of registers in the target architecture (which makes sense for other architectures, but WebAssembly is sort-of a stack machine). Would it make sense to emulate registers by using WebAssembly locals, or is that a bad idea?

bjorn3 · 2021-12-30T17:25:57Z

Wasm is a fundamentally different architecture from all other supported architectures in that it is a stack machine rather than a register machine, it uses structured control flow rather than arbitrary jumps, allows more than one memory, ... For this reason I think you should avoid using the existing backend infrastructure. One possible implementation method would be to have an implementation of the Module trait from cranelift_module in a separate crate and then write the lowering from clif ir to wasm bytecode in this crate.

cfallin · 2021-12-30T20:40:28Z

Indeed, this would best be structured as a separate framework that consumes Cranelift's IR, I think.

There are lots of resources you could look at for the various parts; the Stackifier algorithm to start with, to recover structured control flow, and then some scheme to map operators back to stack code using locals where necessary (a sort of combined regalloc and instruction-scheduling problem: reordering operators can let you use the stack directly).

We'll need to decide what we do about runtime-specific implementations of Wasm memories, tables, globals, and imports as well (EDIT: when using the cranelift-wasm frontend). Traditionally cranelift-wasm has these bits filled in by the "environment", and our embedding (Wasmtime, Lucet, SpiderMonkey, something else) fills in some IR that uses e.g. machine-native pointers to implement Wasm memory/table accesses. If we compile back to Wasm, should a table just become a table, and a memory become a memory? Probably, but this will require some careful thought in the frontend/environment too. Possibly we add new IR operators (primitives) and just lower table.get 1-to-1 to this primitive, then back to table.get in the backend.

FWIW, I have a little side-project (waffle) that is attempting to be a Wasm-to-CFG-of-SSA-to-Wasm framework, which might be useful to look at to see what it takes to compile SSA IR with blockparams back to Wasm. (It kind of works: it roundtrips a hello-world, but has some bugs beyond that. Side-project, no guarantees, etc!)

birbe · 2021-12-30T22:13:58Z

There are lots of resources you could look at for the various parts; the Stackifier algorithm to start with,

https://docs.rs/relooper/latest/relooper/ might be a good way to easily implement this (I believe it actually implements the stackifier algorithm even though it's called relooper)

teymour-aldridge · 2022-01-01T15:34:22Z

Thank you for the advice!

There are lots of resources you could look at for the various parts; the Stackifier algorithm to start with, to recover structured control flow, and then some scheme to map operators back to stack code using locals where necessary (a sort of combined regalloc and instruction-scheduling problem: reordering operators can let you use the stack directly).

I'm not sure how regalloc could be used to translate operators to stack code. Would you mind elaborating (a link to something explaining the idea would be great if you have one).

My thoughts were that this should be done by recursively inserting WebAssembly instructions to compute Values.

So something like:

function instruction_code()
  for each operand
    instruction_code(operand)
  code to push result onto stack (apply operator and then push)

cfallin · 2022-01-01T19:18:17Z

That scheme is a good start, and works for tree-shaped code, but not DAG-shaped code: as soon as an operator (SSA value) is used more than once, you either need to recompute it or store it in a local and then reload it at each use-site.

The use of locals to store SSA values is the register allocation problem -- with a range of solutions, from a "one local (register) per SSA value" approach up to more sophisticated approaches that reuse locals. Given Wasm engines' limits on locals, and given the way that baseline Wasm compilers allocate separate stack slots for each local (leading to huge stackframes if we are not frugal with locals), we probably want to do something better than one-local-per-value.

The reason I described this as a "combined regalloc / instruction scheduling problem" is that the two are intertwined. Once we've decided to use locals to carry values, we could do the simplest thing and emit local.get for every operand, then the operator, then local.set. But that's also inefficient; we want to use the stack when we can. So the order of operators matters. Your scheme (recursive value generation) is actually doing implicit code-motion in order to take advantage of stack dataflow (by sinking operators to use-sites), which is great, and what we want. But we have to be careful about it for two reasons:

Some operators have side-effects, so we have to be careful about moving them. We can slide them over pure operators but not other side-effecting operators. (Trapping is a side-effect too, so this is not just for loads/stores/calls.)
When a value is used more than once, we need to choose a location to compute it that dominates all of its use-sites. We can't, for example, just generate it onto the stack on demand the first time it's used then save it in a local for later times -- consider when two branches of an if-else use a value computed before the control-flow split.

In the most general form of the problem, we could also consider rematerialization: some operations (such as, e.g., i32.const 0) are trivial to compute and so it would probably not be a good idea to compute this once at the top of the function (GVN will often dedup constants like this) and store it in a local thereafter. Rematerialization can be considered part of the regalloc problem (because a regalloc might not want to spill and remat is then an option) and some regalloc implementations can do this; one can also do it in a more ad-hoc way in the code one generates prior to allocating registers.

A scheme that might work is to "treeify" the operands before generating code. Basically each operand of an operator could then be:

SingleUse(Value): We are the only use of the value, and the operator that produces it is pure, so it will be generated in-place on the stack (your scheme);
NormalUse(Value): We are one of several uses of the value; it should be generated into a local at its original location in the program, then loaded here; or
Rematerialize(Value): We replicate the operator here. Used for pure ops like iconst only.

So one would generate ops in original program location if used by a NormalUse elsewhere, and otherwise only at the use-site. This would result in code that then refers to values, which we can subsequently do register allocation over to assign to locals.

There's not really a good document on this that I know of; it's just applications of general concepts (regalloc, remat, instruction scheduling) to a specific and somewhat odd target. We're inventing as we go here. That said it'd probably be good to understand how e.g. LLVM's Wasm backend does this; I haven't really looked at the dataflow side (just the stackifier).

Anyway hopefully this helps!

teymour-aldridge · 2022-01-02T21:18:24Z

Thanks so much!

teymour-aldridge · 2022-01-06T14:51:52Z

I have started to implement something in: https://github.com/teymour-aldridge/cranelift_codegen_wasm

Unfortunately there are quite a few bugs and many things (e.g. data objects, instructions) are not yet supported.

cfallin · 2022-12-14T03:37:07Z

Since I chanced upon this issue again, I'll leave a note that my waffle crate now has a working Wasm compilation pipeline from an SSA-based IR that is very similar to Cranelift's IR. It could serve either as the model for, or as a codegen library for, a WebAssembly backend for Cranelift. It currently requires reducible control flow (will panic when given an irreducible CFG) but this is "just" an implementation question, as the API is based on a standard CFG. The crate has been fuzzed with roundtrip + differential execution and is in a fairly robust state (can roundtrip complex Wasms like SpiderMonkey).

The main challenge that remains is in the semantic gap between Wasm and CLIF: lowering loses information about tables, heaps, globals and the like, and we would need to "lift" CLIF's raw memory accesses back to Wasm-level concepts. That's maybe not so bad though: one could use a "null" environment when lowering the Wasm, leaving heap accesses to memory zero as-is (address computation is the identity function), and then compile accesses to raw memory in CLIF back to accesses to a single Wasm memory. Function pointers (taking the address of symbols) will require some care. Also, CLIF is not exactly 1-to-1 with Wasm opcodes, especially the SIMD ones. But, doable. This is probably a 1-2 month project for someone at this point.

cfallin · 2022-12-14T03:42:00Z

(I'll note too that some of my thoughts above are specific to a "Wasm roundtripping" context, which probably is actually the least interesting application; serving as a backend for something that starts with a native-code model, like cg_clif, is more interesting, and for that the mappings are a little simpler. One raw address space goes to one Wasm memory, and all function pointers are indices into one table, with entries populated as we observe function-address operators. I guess there is just the question of "relocations" for external symbol references.)

bjorn3 · 2022-12-14T12:07:06Z

I just took a look at waffle, but it seems basically everything necessary to build a module is private. Would it make sense to move the wasm frontend to a separate crate? This way everything the wasm frontend and other frontends need is forced to be public.

cfallin · 2022-12-14T16:08:19Z

I just took a look at waffle, but it seems basically everything necessary to build a module is private. Would it make sense to move the wasm frontend to a separate crate? This way everything the wasm frontend and other frontends need is forced to be public.

Yes, for sure; the crate's API polish hasn't gotten much attention, it was just codeveloped for a single initial purpose (roundtripping with modifications). Happy to discuss further there and/or take PRs from folks :-)

madsmtm · 2023-09-05T04:02:20Z

A WebAssembly backend would likely be very useful for projects like Ruffle as well

skyne98 · 2024-04-04T05:15:18Z

Hi, thanks for the amazing work! Any plans for this? Really excited for the feature, will allow me to drop wasm-opt as a dependency in lots of my projects 👀 (+ working with cranelift is generally very pleasant!).

@bjorn3

cfallin · 2024-04-04T14:49:30Z

@skyne98 we don't have anyone looking at this right now, and as it's a large engineering project, it's unlikely to be prioritized by the folks who work on Cranelift full-time. That doesn't mean we wouldn't like to have it, just that there are other things that are more important, and time is limited.

As always with an open-source project, I'd be happy to review a PR from anyone who wants this and builds it!

skyne98 · 2024-04-04T14:56:03Z

Of course, thanks for the reply, much appreciated! It seems like a project that requires quite a bit of expertise, based on your comments, so I hope someone can pick it up one day! (Or I get confident enough with it to help out 😅).

One quick question, if you don't mind: does running WASM in embedded wasmtime (with optimizations turned on & cranelift) already fully utilizes cranelift's optimization capability? And, also, is there potentially a way I get yonk the optimized IR out of the pipeline to print it and check it?

cfallin · 2024-04-04T15:56:21Z

Wasmtime does turn on Cranelift's optimizations by default, yes (this can be controlled with -O opt-level=N, 0 for off and 2 for all on. You can get unoptimized IR with wasmtime compile --emit-clif; optimized IR could be printed with a similar option, but you'd need to add similar code around here (and handle the cached case somehow?) using logic similar to this. Or, as a hack, println!() it directly at that point.

skyne98 · 2024-04-15T13:54:41Z

Would have been nice to be able to pass optional callbacks to hook into different parts of the pipeline 👀

bnjbvr added the cranelift Issues related to the Cranelift code generator label Jan 27, 2021

bjorn3 mentioned this issue Sep 26, 2021

Compilation to WASM? rust-lang/miri#722

Open

teymour-aldridge mentioned this issue Jan 4, 2022

Add an ISA builder for Wasm #3640

Closed

bjorn3 mentioned this issue Jan 16, 2022

Building for target different than current OS one rust-lang/rustc_codegen_cranelift#1217

Closed

cfallin mentioned this issue Apr 13, 2022

Question: Cranelift Self Compilation #4020

Closed

cfallin added cranelift:new-target Issues requesting Cranelift support for new targets. enhancement labels May 4, 2022

bjorn3 mentioned this issue Jul 31, 2022

Question: Build compiler to wasm thepowersgang/mrustc#282

Closed

cfallin mentioned this issue Dec 14, 2022

Is there any tool that could transform cranelift IR back to Wasm binaries? #5436

Closed

MartinKavik mentioned this issue Apr 23, 2023

Speed up compilation MoonZoon/MoonZoon#170

Closed

bjorn3 mentioned this issue Jun 21, 2023

rustc.html example not working bjorn3/browser_wasi_shim#30

Closed

MartinKavik mentioned this issue Feb 28, 2024

Optimize compilation MoonZoon/MoonZoon#94

Closed

martinjrobins mentioned this issue Nov 29, 2024

wasm compilation martinjrobins/diffsol#107

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Webassembly backend #2566

Add Webassembly backend #2566

bjorn3 commented Jan 9, 2021

bjorn3 commented Jan 9, 2021

abbec commented Apr 10, 2021

birbe commented Aug 5, 2021

teymour-aldridge commented Dec 30, 2021 •

edited

Loading

bjorn3 commented Dec 30, 2021

cfallin commented Dec 30, 2021 •

edited

Loading

birbe commented Dec 30, 2021

teymour-aldridge commented Jan 1, 2022 •

edited

Loading

cfallin commented Jan 1, 2022 •

edited

Loading

teymour-aldridge commented Jan 2, 2022

teymour-aldridge commented Jan 6, 2022

cfallin commented Dec 14, 2022

cfallin commented Dec 14, 2022

bjorn3 commented Dec 14, 2022 •

edited

Loading

cfallin commented Dec 14, 2022

madsmtm commented Sep 5, 2023

skyne98 commented Apr 4, 2024

cfallin commented Apr 4, 2024

skyne98 commented Apr 4, 2024

cfallin commented Apr 4, 2024 •

edited

Loading

skyne98 commented Apr 15, 2024

Add Webassembly backend #2566

Add Webassembly backend #2566

Comments

bjorn3 commented Jan 9, 2021

Feature

Benefit

Implementation

Alternatives

bjorn3 commented Jan 9, 2021

abbec commented Apr 10, 2021

birbe commented Aug 5, 2021

teymour-aldridge commented Dec 30, 2021 • edited Loading

bjorn3 commented Dec 30, 2021

cfallin commented Dec 30, 2021 • edited Loading

birbe commented Dec 30, 2021

teymour-aldridge commented Jan 1, 2022 • edited Loading

cfallin commented Jan 1, 2022 • edited Loading

teymour-aldridge commented Jan 2, 2022

teymour-aldridge commented Jan 6, 2022

cfallin commented Dec 14, 2022

cfallin commented Dec 14, 2022

bjorn3 commented Dec 14, 2022 • edited Loading

cfallin commented Dec 14, 2022

madsmtm commented Sep 5, 2023

skyne98 commented Apr 4, 2024

cfallin commented Apr 4, 2024

skyne98 commented Apr 4, 2024

cfallin commented Apr 4, 2024 • edited Loading

skyne98 commented Apr 15, 2024

teymour-aldridge commented Dec 30, 2021 •

edited

Loading

cfallin commented Dec 30, 2021 •

edited

Loading

teymour-aldridge commented Jan 1, 2022 •

edited

Loading

cfallin commented Jan 1, 2022 •

edited

Loading

bjorn3 commented Dec 14, 2022 •

edited

Loading

cfallin commented Apr 4, 2024 •

edited

Loading