wasm2c's wasm-rt.h documentation #1949

Closed
douglas-raillard-arm opened this issue Jul 20, 2022 · 5 comments · Fixed by #1960

@douglas-raillard-arm

wasm-rt.h is not fully documented on the following points, which is an issue when evaluating wasm2c for embedded targets (or any project that cannot rely on wasm-rt-impl.c):

  • What headers do the generated code and wasm-rt.h depend on? E.g. wasm-rt.h currently depends on #include <setjmp.h>. While this header is standard, embedded targets such as Linux kernel modules do not provide it.

  • What exactly will the generated code require from wasm-rt.h? The doc lists some of it, but the actual header declares extra functions such as wasm_rt_set_unwind_target().

  • Backward compatibility promise (or lack thereof) for all of the above. If wasm2c-generated code starts depending on new headers/functions tomorrow, that can break a whole workflow without hope of repair. In the Linux kernel module case, anything that would require an arch-specific implementation (e.g. setjmp.h) is out of scope for most projects, as the maintenance burden would be way too high.

Just to be clear, I'm not saying that wasm2c should promise any of the above if it's not possible to (e.g. future evolutions of wasm/wasm2c capabilities) but having a statement either way would help people evaluating the use of wasm2c for a project.

@sbc100
Member

sbc100 commented Jul 20, 2022

Sounds reasonable that we should at least document this.

For (1): Perhaps we should phrase it as "these are the compilers and environments where wasm2c is known to build/run". Currently these would be macOS, Linux and Windows, building with clang, gcc or MSVC. If you target any other OS/environment or use any other compiler, you may face porting efforts. Obviously if you want to add another environment "the linux kernel" that would be great!

For (2): I guess this is just a matter of fleshing out the current README

For (3): I think all we can/should do is build and test on our supported targets/environments. If you are using an unsupported environment, it would be up to you to deal with any changes introduced by new wasm2c versions (or pin an older version until the new version is ported). As we make changes to wasm2c we would work to ensure that the new code works on all the known/supported targets, but I don't think we can make guarantees about targets we don't know about.

@douglas-raillard-arm
Author

douglas-raillard-arm commented Jul 20, 2022

Thanks for the quick answer :)

Obviously if you want to add another environment "the linux kernel" that would be great!

I would be quite happy to do that provided that my experiment succeeds and that I can actually use wasm2c for what I need. The actual use case is:

  • run wasm in an out-of-tree kernel module using wasm2c
  • the wasm is coming from Rust toolchain
  • the wasm code is purely "computational", i.e. it accesses little to no kernel API
  • the rest of the module is "normal" and is responsible for calling the wasm entry point, i.e. it's hand-written C with the classic Kbuild setup (e.g. https://tldp.org/LDP/lkmpg/2.6/html/x121.html)

The reason I was investigating wasm2c rather than just making rustc output a static archive and linking it into the module is to be able to build the module without a Rust compiler. The module is compiled "dynamically" by a Python environment, and I'd like to avoid breaking users' workflow by requiring a recent Rust toolchain:
https://lisa-linux-integrated-system-analysis.readthedocs.io/en/master/setup.html#building-a-module

The challenges are:

  • This is a freestanding environment, i.e. one cannot expect the standard library to be present.
  • In practice, parts of the standard library are usable, but:
    • headers will have a different name: <linux/string.h> instead of <string.h>
    • not all functions are available
    • some features are available but with a different name and different signature: malloc() => kmalloc(), strtoul() => kstrtoul()
  • Some things can be done but require precautions: kernel_fpu_begin()/kernel_fpu_end() around floating point operations. We can probably leave that to the calling C code.
  • Some things cannot be done: setjmp()/longjmp() are not available, and it's not reasonable to implement them due to the number of supported architectures, and probably other issues as well.
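The renamed and re-signatured functions listed above (malloc() => kmalloc(), strtoul() => kstrtoul()) could, for example, be hidden behind a small helper header that code calls instead of libc. A minimal sketch: the rt_alloc()/rt_free()/rt_parse_ulong() names are hypothetical and not part of wasm2c; the kernel branch uses the real kmalloc()/kfree()/kstrtoul() APIs and is only compiled when __KERNEL__ is defined, while the userspace branch roughly mirrors kstrtoul()'s return convention.

```c
/* Hypothetical helper layer: code calls rt_* instead of libc directly. */
#ifdef __KERNEL__
#include <linux/slab.h>   /* kmalloc(), kfree() */
#include <linux/kernel.h> /* kstrtoul() */

static inline void *rt_alloc(size_t size) { return kmalloc(size, GFP_KERNEL); }
static inline void rt_free(void *p) { kfree(p); }
/* 0 on success, negative errno on failure (kstrtoul convention). */
static inline int rt_parse_ulong(const char *s, unsigned long *out) {
    return kstrtoul(s, 10, out);
}
#else
#include <errno.h>
#include <stdlib.h>

static inline void *rt_alloc(size_t size) { return malloc(size); }
static inline void rt_free(void *p) { free(p); }
/* Approximates kstrtoul(): 0 on success, -EINVAL on malformed input,
 * -ERANGE on overflow. */
static inline int rt_parse_ulong(const char *s, unsigned long *out) {
    char *end;
    errno = 0;
    unsigned long v = strtoul(s, &end, 10);
    if (errno != 0)
        return -errno;
    if (end == s || *end != '\0')
        return -EINVAL;
    *out = v;
    return 0;
}
#endif
```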

Most of these problems can be solved if wasm2c's generated code calls/includes a user-implementable helper instead of libc functions directly, but the last one seems like a blocker. Either:

  • there is a way to make the rust toolchain guarantee to not emit exception code (unlikely)
  • wasm2c gives up setjmp()/longjmp() and uses another scheme. One common way to emulate exceptions is code like this:
// bar() returns type T, foo() returns type T2 (int is a placeholder for both)
#include <stdbool.h>

typedef int T;
typedef int T2;

struct bar_res {
    bool excep;
    T value;
};

struct bar_res bar(void) {
    if (1)
        // "raise" an exception
        return (struct bar_res){.excep = true};
    else
        // return a value in the normal path
        return (struct bar_res){.value = 42};
}

struct foo_res {
    bool excep;
    T2 value;
};

struct foo_res foo(void) {
    struct bar_res x = bar();
    if (x.excep)
        // "re-raise" the exception, with the right type
        return (struct foo_res){.excep = true};

    T _x = x.value;

    // proceed as usual, then return normally
    return (struct foo_res){.value = _x};
}

Since C does not have generics or templates, this requires generating a "struct res" type for each combination of return and exception types. An extra annoyance is that C does not have any usable zero-sized type that could eliminate the struct overhead when a function cannot raise an exception. Either we use a small type like char, or we skip the struct altogether, which gives a different "calling convention".
The advantage is that, unlike with setjmp()/longjmp(), local variables do not need to be made volatile to behave as expected.

A simpler alternative, if there is no threading or coroutines, is to have a global "exception" variable and return plain values. The check can then be elided for functions known not to raise.
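A minimal sketch of that global-flag variant, assuming single-threaded execution; g_exception, div_checked() and mean() are hypothetical names that only illustrate the pattern. A function known not to raise would simply return its value with no flag check at its call sites.

```c
#include <stdbool.h>
#include <stdint.h>

/* Single global "exception pending" flag: only valid without threads or
 * coroutines, since every call shares it. */
static bool g_exception = false;

/* May "raise": sets the flag and returns a dummy value. */
static int32_t div_checked(int32_t a, int32_t b) {
    if (b == 0 || (a == INT32_MIN && b == -1)) {
        g_exception = true;
        return 0;
    }
    return a / b;
}

/* Caller checks the flag after each call that may raise and "re-raises" by
 * returning early; the flag stays set for its own caller to see. */
static int32_t mean(int32_t sum, int32_t count) {
    int32_t m = div_checked(sum, count);
    if (g_exception)
        return 0;
    return m;
}
```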

When it comes to CI, building a hello-world kernel module for the booted kernel is pretty easy on most distros, so that should not add much burden. We just need to turn enough warnings into errors to ensure we don't call any function without a prototype. Otherwise the module will link successfully but fail to load because of missing symbols (which we probably won't test, as it requires root and is likely forbidden on container-based CIs).

EDIT: fix code formatting

@keithw
Member

keithw commented Jul 20, 2022

My understanding is that the runtime files included in the wasm2c directory (https://github.com/WebAssembly/wabt/tree/main/wasm2c), i.e. wasm-rt.h, wasm-rt-impl.h, and wasm-rt-impl.c, are meant to be taken as an example runtime that demonstrates how to host the wasm2c output on a basically-POSIX or Win32 user environment.

E.g., wasm2c's generated output itself doesn't depend on setjmp/longjmp/malloc/free or #include <setjmp.h> or <string.h>; it's only the example runtime that uses those symbols (setjmp/longjmp are used for trapping, and recently also for exceptions). If you want to use the generated output in a freestanding environment, you'll probably have to provide a different runtime, but I don't think it needs to diverge dramatically from the example. Especially if you don't care about supporting Wasm exceptions and have your own preferred way to allocate memory and deal with a Wasm trap.
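To make the trapping point concrete, here is roughly how a hosted runtime can implement a non-returning trap with setjmp()/longjmp(). This is a simplified sketch, not the actual wasm-rt-impl.c code; rt_trap(), rt_call_guarded(), guest_div(), g_trap_target and the trap codes are hypothetical names. A freestanding runtime would need to replace exactly this piece with some other way of never returning to the trapping code.

```c
#include <setjmp.h>
#include <stdint.h>

typedef enum { TRAP_NONE = 0, TRAP_OOB, TRAP_DIV_BY_ZERO } trap_code;

static jmp_buf g_trap_target;

/* Called by generated code on a trap; never returns to its caller. */
static _Noreturn void rt_trap(trap_code code) {
    longjmp(g_trap_target, (int)code);
}

/* Stand-in for a function emitted by wasm2c (overflow check omitted). */
static int32_t guest_div(int32_t a, int32_t b) {
    if (b == 0)
        rt_trap(TRAP_DIV_BY_ZERO);
    return a / b;
}

/* Embedder entry point: returns TRAP_NONE and writes *out, or returns the
 * trap code if the guest trapped. No local is modified between setjmp()
 * and longjmp(), so nothing here needs to be volatile. */
static trap_code rt_call_guarded(int32_t a, int32_t b, int32_t *out) {
    switch (setjmp(g_trap_target)) {
    case 0:
        break;                   /* first return: go run the guest code */
    case TRAP_OOB:
        return TRAP_OOB;         /* resumed here by rt_trap(TRAP_OOB) */
    default:
        return TRAP_DIV_BY_ZERO; /* only other code this sketch raises */
    }
    *out = guest_div(a, b);
    return TRAP_NONE;
}
```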

I agree we should keep the README up-to-date about what the wasm2c-generated output requires from the runtime, which I agree is nonobvious today (e.g. if exceptions are enabled, right now the wasm2c output expects the runtime to define something called a "jmp_buf") but I think what we want to be documenting here, and the spirit of the existing README, is the interface between the wasm2c output and the runtime, not the interface between the included runtime (wasm-rt.h, wasm-rt-impl, etc.) and the OS.

@douglas-raillard-arm
Author

So after a bit more investigation:

  • I am now relatively confident rustc will not start using exceptions without my consent. A comment on rust-lang/rust#58874 ("Drop::drop() not called on a panic! when compiling to WASM") suggests that exceptions will one day be used for panic=unwind behavior, but Rust allows panic=abort (the current wasm default), which, as the name suggests, simply traps. I don't think panic=abort on wasm will start using wasm exceptions.
    If it did start to use exceptions, it would be problematic though, as there is simply no pure C way to provide the required feature other than the standard setjmp()/longjmp().

  • wasm2c-generated sources still include standard headers such as <math.h> and <string.h>, but I guess I can sed my way to rewrite them, provided the functions they are expected to provide are available.

  • I stumbled upon this thread: "Add some optional defines to disable wasm2c trap checks" #1432. For my use case it would obviously make sense, since the wasm code is actually more trusted than the embedder's code (trusted source + comes from Rust => 99% of the memory accesses have already been proven valid by the compiler). But:

    • I recognize the value of having wasm2c be as compliant with the spec as possible rather than handwaving "details" away
    • It turns out the implementer should be able to elide the checks (both memory and call stack depth) if they want to, purely based on the TRAP() implementation, keeping the interface clean and free of "facilities to do the wrong thing for the wrong reasons".
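As an illustration of eliding checks purely through the trap definition, a runtime can define its trap hook so that the compiler may assume traps never happen. A sketch, assuming GCC or Clang (__builtin_unreachable() is a compiler builtin, not standard C) and a hypothetical MY_TRAP() macro, memory_t type and load_u32() helper; wasm2c's actual macro names and check layout differ between versions. With ELIDE_TRAPS defined, an out-of-bounds access becomes undefined behavior, so this is only defensible when the guest is trusted.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifdef ELIDE_TRAPS
/* Tell the compiler the trap path is impossible, so the check guarding it
 * can be optimized away. */
#define MY_TRAP() __builtin_unreachable()
#else
/* Spec-compliant behavior: stop execution on a trap. */
#define MY_TRAP() abort()
#endif

typedef struct {
    uint8_t *data;
    uint64_t size;
} memory_t;

/* Bounds-checked 32-bit load in the style of generated code. */
static uint32_t load_u32(const memory_t *mem, uint64_t addr) {
    if (addr > mem->size || mem->size - addr < sizeof(uint32_t))
        MY_TRAP();
    uint32_t v;
    memcpy(&v, mem->data + addr, sizeof v);
    return v;
}
```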

@sbc100
Member

sbc100 commented Jul 21, 2022

  • wasm2c-generated sources still include standard headers such as <math.h> and <string.h>, but I guess I can sed my way to rewrite them, provided the functions they are expected to provide are available.

Once you get it working, you could wrap those includes in #ifdef __KERNEL__ or some such and provide an alternative. I doubt we depend on many symbols from those headers.
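A sketch of that suggestion, assuming the kernel build defines __KERNEL__ (Kbuild does) and that only memcpy()/memset() are needed from <string.h>; <math.h> is not covered here. init_segment() is a hypothetical helper that mimics copying a data segment into linear memory, not wasm2c output.

```c
/* Pick the header that provides memcpy()/memset() and the integer types. */
#ifdef __KERNEL__
#include <linux/string.h>
#include <linux/types.h>
#else
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#endif

/* Zero the whole memory, then copy a segment at offset `off`.
 * The caller guarantees off + len <= mem_size. */
static void init_segment(uint8_t *mem, size_t mem_size, size_t off,
                         const uint8_t *seg, size_t len) {
    memset(mem, 0, mem_size);
    memcpy(mem + off, seg, len);
}
```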
