Pass a start-of-usable-memory to the module. #334
For some runtimes there is the option of placing the linear memory at absolute zero in the address space. This can help on ARM in particular, as it frees up more addressing modes, and it also frees up the register that holds the linear-memory base on ARM and x64 (tested in Odin). Some systems already have the bottom pages reserved for security purposes, so the low addresses could not be used with good performance there (though they could perhaps be emulated in a signal handler). If a start-of-usable-memory value could be passed to a module, the module could avoid allocating in this low region.
This might also be useful together with an option to protect the low pages (which was noted as a developer option elsewhere). Even when the linear memory is not placed at absolute zero, with this developer option enabled the start-of-usable-memory parameter could advise the application to avoid the low protected pages. Since the page size varies between systems, the start should probably be a parameter rather than a simple flag indicating that the low page is protected; a parameter is more general.
Comments
Zero-based heaps are definitely an interesting optimization, but probably not something we want to provision for in the MVP. It's also possible that, with the multi-process future feature (where the basic idea is letting wasm ask for a fresh process, avoiding all the sync-call semantics that make this hard in general), we wouldn't even need to reserve the low pages, so no explicit spec provision would be necessary. So perhaps you could file a PR to add a bullet to FutureFeatures.md#multiprocess-support pointing out that it enables this optimization more cleanly. Also, with separate processes, you'd avoid the "there can be only one" pigeonhole problem.
I'm not sure I understand what the concrete proposal is. NaCl's sandbox uses this trick extensively, but I'd like to understand what you're proposing that wasm do concretely.
The proposal is that the runtime define a start-of-usable-memory value and pass this to the wasm code. If possible this would be a compile-time constant, in case the code wants a fixed low layout, but it might be fine as a runtime constant too. The wasm would ideally avoid using memory before this start index; this might just be some default code in emscripten. That's it: no memory protection for the low pages is defined here, etc. What this would do is give a future runtime the option of using these low pages, and a few uses have been mentioned: zero-based linear memory on a system that protects the low pages; developer support for detecting zero-page accesses in wasm code; and faster bounds-check code paths (I think Odin on x86 does this now, taking a slow path on low indexes). It is just planning ahead so that deployed code has some flexibility that is anticipated to be needed.
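To make this concrete, here is a minimal sketch of the module side, assuming a hypothetical `__start_of_usable_memory` constant supplied by the runtime (the name and delivery mechanism are purely illustrative, not part of the proposal text):

```c
/* Sketch only: assumes the runtime supplies a hypothetical
 * __start_of_usable_memory value, e.g. as an immutable imported
 * global lowered to this extern constant. */
#include <stdint.h>

extern const uint32_t __start_of_usable_memory; /* hypothetical import */

static uint32_t heap_base; /* first linear-memory offset the module uses */

void init_heap(void) {
    /* Round the runtime-supplied floor up to a 16-byte boundary and
     * never allocate below it; everything under this offset is left
     * to the runtime (zero-based mapping, trap pages, and so on). */
    heap_base = (__start_of_usable_memory + 15u) & ~15u;
}
```

In emscripten terms, this is roughly the "default code" mentioned above: the allocator's initial break would start at the supplied value instead of a hard-coded constant.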
@lukewagner Even running in a separate process does not always allow use of the low pages. For example, ARM Linux kernels have a hard-coded protected low page, and the only way to explore this on ARM is to use a hacked kernel. Similarly, x86/x64 Linux kernels have a system-wide restriction that can be changed for testing, but it might not be practical to demand that it be changed just for wasm. If memory were not linear and were allocated from a larger process address space then I see this would not be an issue, but that seemed a long way off, and it seems that wasm will want to support in-process runtimes well going forward too?
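The x86/x64 restriction referred to here is presumably Linux's `vm.mmap_min_addr` sysctl; a host runtime can probe it before attempting a zero-based mapping. A Linux-specific sketch:

```c
/* Sketch: read Linux's vm.mmap_min_addr limit, which forbids user
 * mappings below the returned address. A value of 0 means low
 * mappings are allowed; many distros default to 65536. */
#include <stdio.h>

long read_mmap_min_addr(void) {
    long min_addr = -1;
    FILE *f = fopen("/proc/sys/vm/mmap_min_addr", "r");
    if (f) {
        if (fscanf(f, "%ld", &min_addr) != 1)
            min_addr = -1;
        fclose(f);
    }
    return min_addr;
}
```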
What I proposed in #306 would also provide this functionality. It proposes that the WASM instance go through the runtime to allocate pages from its address space. In that case, you could make the runtime allocate only from pages above a certain address.
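A sketch of how such runtime-mediated allocation could enforce a floor, with all names invented for illustration (this is not API text from #306):

```c
/* Sketch: a runtime-side page allocator that never hands out
 * linear-memory pages below a configurable floor. */
#include <stdint.h>

#define WASM_PAGE_SIZE 65536u

static uint32_t alloc_floor; /* first allocatable linear-memory offset */
static uint32_t next_free;   /* simple bump allocator, for illustration */

void runtime_set_alloc_floor(uint32_t floor_pages) {
    alloc_floor = floor_pages * WASM_PAGE_SIZE;
    if (next_free < alloc_floor)
        next_free = alloc_floor;
}

/* Returns the linear-memory offset of the new pages, or UINT32_MAX
 * if the 32-bit index space is exhausted. */
uint32_t wasm_alloc_pages(uint32_t count) {
    uint64_t end = (uint64_t)next_free + (uint64_t)count * WASM_PAGE_SIZE;
    if (end > UINT32_MAX)
        return UINT32_MAX;
    uint32_t result = next_free;
    next_free = (uint32_t)end;
    return result;
}
```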
@AndrewScheidecker Yes, requesting use of the linear memory at a page level would also support this. I currently only see a need to restrict use of the start pages, so that is all that is being requested.
@JSStats I see, makes sense. In that case, I think that this feature should be opt-in and low addresses should fault. Otherwise, we'll have to simulate the low pages with signal handlers, which is both unnecessary complexity and a hidden performance cliff for users. With opt-in + faulting, though, this gives applications the desired low-memory fault (and perhaps we can primarily pitch the feature as such; the zero-based optimization is just a perk on some platforms). While promising, I think this feature doesn't belong in the MVP; now I'm thinking it belongs in the finer-grained-control-over-memory bucket.
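To illustrate the emulation cost being avoided: without opt-in faulting, an implementation mapping memory at zero on a kernel that protects the low pages would need something like this POSIX sketch (illustrative only, and deliberately incomplete):

```c
/* Sketch of the signal-handler emulation an implementation would
 * otherwise need: catch SIGSEGV on kernel-protected low pages and
 * emulate the access, which is the hidden performance cliff noted
 * above. A real handler must decode the faulting instruction from
 * the context and emulate it against a shadow of the low pages. */
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>

static void low_page_handler(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    if ((uintptr_t)info->si_addr < 65536) {
        /* Placeholder: emulate the load/store here, then resume.
         * Each emulated access costs a kernel round trip plus
         * instruction decoding, versus one machine instruction. */
        return;
    }
    abort(); /* genuine out-of-bounds access */
}

void install_low_page_emulation(void) {
    struct sigaction sa = {0};
    sa.sa_sigaction = low_page_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);
}
```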
IIUC your proposal is akin to feature detection, in that a developer can query for this value. I see the rationale, but I'm wondering what happens if a developer addresses below this value? Our orientation so far had been to let the user protect the lower pages as they wished, because a wasm module sees "addresses" (really: heap offsets) that aren't actually the process's virtual addresses. Your proposal assumes that we allow zero-mapping (i.e. wasm heap offset == virtual address), and different kernels forbid us from controlling the lower pages entirely. I'd like a proposal on this topic to also address security concerns: this gives knowledge to an attacker that they wouldn't otherwise have. It's workable (NaCl does it) but needs to be designed properly.
@jfbastien Yes, runtime feature detection would work fine for this. There is no proposal to do anything different if the wasm code accesses memory below this value; it's just to try to prepare deployed code to be able to support this restriction in the future. This is not intended to replace mprotect support; rather, it works around system or implementation issues with accessing the low pages, where even if the low pages are not user-protected, access to them might take slow paths. Is the 'security concern' that an attacker would have a good clue that the linear memory is located at zero in the address space, and could depend on this in conjunction with other security issues? If people want to bundle this in with runtime-enforced protection of the low pages then that would be fine with me too; an attacker would then not be able to distinguish whether the linear memory is at zero in the address space.
@jfbastien Actually, this might also want to be a compile-time constant so that the offset can be baked into compiled code.
Correct, it gives an attacker a lot of information that can be combined with ASLR leaks / non-PIC code / fixed-mapped code (e.g. kernel helpers). It's not insurmountable, but it's something that a design has to work through.
Some preliminary performance results, using WAVM, LLVM 3.8, and the zlib benchmark. I used the pointer-masking variant of the benchmark, as it is significantly faster and probably has more opportunity to exploit the buffer being at zero. Runtime with the buffer at zero, relative to the baseline: x86: 97.5%. So this might offer a useful performance improvement for ARM. Devices using ARM CPUs are expected to be most in need of performance and efficiency, and may well want to run a demanding wasm app 'all-in' and be able to allocate the buffer at zero. Note that the lower pages are not used: the emscripten GLOBAL_BASE option is used to move the data segment higher, and this benchmark ran fine without access to the low pages, on an unmodified kernel.
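For context, here is a sketch of the two bounds-safety strategies being compared: an explicit check versus pointer masking, which forces the index into a power-of-two heap with a single AND. The heap size here is illustrative:

```c
/* Sketch: explicit bounds check vs. pointer masking. Masking needs a
 * power-of-two heap size; invalid indexes wrap into the heap rather
 * than trapping, which is cheaper but gives weaker diagnostics. */
#include <stdint.h>

#define HEAP_SIZE 0x10000000u /* illustrative 256 MiB */
extern uint8_t heap[HEAP_SIZE];

/* Explicit bounds check: a compare and branch on every access. */
uint8_t load_checked(uint32_t i) {
    if (i >= HEAP_SIZE)
        return 0; /* out-of-bounds policy is illustrative */
    return heap[i];
}

/* Pointer masking: one AND, no branch, easily hoisted out of loops. */
uint8_t load_masked(uint32_t i) {
    return heap[i & (HEAP_SIZE - 1u)];
}
```

The data-segment move mentioned above would have been done with something like `emcc -s GLOBAL_BASE=1048576 ...` (the value is illustrative).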
@JSStats, can you elaborate on the experiment and the results? Do we know where the improvement on ARM is coming from (use of faster addressing modes?), and why IA is not benefiting as much?
@nmostafa Here's the ARM code for the hottest loop. You can see that the limited addressing modes are used better without having to add in the memory base on each access.

With memory at absolute zero: (assembly listing omitted)

Without the buffer at zero: (assembly listing omitted)

LLVM does not always generate great code for asm.js-style patterns, so the above might not be optimal, but both are faster than Odin or V8 by a good margin. Note there are no bounds checks in the above code; it used pointer masking, and the masking is hoisted somewhat, see the 'bfc rx, #28, #4' instructions. IA has more consistent addressing modes, so it has less to gain here. If you want an option to run wasm faster 'all-in' on ARM devices then vote for this feature :)
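The effect can be illustrated at the C level: with a nonzero base, every access needs an add and keeps the base register live; with the heap at absolute zero, the masked index is the address, so ARM's register-offset addressing modes apply directly. The mask value is illustrative:

```c
/* Illustration of why a zero-based heap helps ARMv7 codegen. */
#include <stdint.h>

#define MASK 0x0FFFFFFFu
extern uint8_t *heap_base; /* runtime-chosen linear-memory base */

/* Nonzero base: base + masked index, one extra add per access and a
 * register permanently tied up holding heap_base. */
uint8_t load_with_base(uint32_t i) {
    return heap_base[i & MASK];
}

/* Zero base: the masked index is the address; no add, no base register. */
uint8_t load_zero_based(uint32_t i) {
    return *(uint8_t *)(uintptr_t)(i & MASK);
}
```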
I see ~20% code reduction, which I assume is the cause of the speedup (lower ICache and/or ITLB misses?), unless there is some ARMv7 uArch detail that favors simpler addressing forms. Are the IA numbers for Core or Atom? Atom has a smaller ICache, and if we are reducing register pressure and saving some register spills/fills, you might see higher gains.
@nmostafa The ARMv7 has fixed-size instructions and limited addressing modes. Taking another look at the x86 code shows a potential improvement: if the WAVM implicit masking is disabled, then runtime improves to 92% with the memory at absolute zero. This might be realistic for wasm32 code running in a 64-bit runtime that uses memory protection for access safety, reserving a large amount of VM so that any base+index*scale+offset is known to be safe. Btw, the performance is then better than with explicit masking for safety. It's not clear whether this memory-protection scheme will scale to wasm64 code, though. The LLVM x64 code generation is currently too poor to draw any conclusions; it looks like it's having problems casting the i32 index to the i64 values used in the addressing mode, and emits separate 32-bit lea operations to compute the index. I expect the wasm memory-access opcode offset to help a little with this issue. Btw, this problem is a challenge for V8 at present too.
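A sketch of that VM-reservation scheme on a 64-bit POSIX host, with illustrative sizes: reserve the whole 4 GiB wasm32 index space plus a guard region with PROT_NONE, commit pages on demand, and let any stray base+index*scale+offset fault inside the reservation instead of being range-checked:

```c
/* Sketch: reserve inaccessible address space covering every address a
 * 32-bit index can reach, so explicit bounds checks can be replaced by
 * hardware faults. The guard size is illustrative; it must cover the
 * largest index*scale+offset the compiled code can form. */
#include <stddef.h>
#include <sys/mman.h>

#define GiB (1ull << 30)

void *reserve_wasm32_memory(void) {
    size_t reserve = (size_t)(4 * GiB + 4 * GiB); /* index space + guard */
    void *base = mmap(NULL, reserve, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return base == MAP_FAILED ? NULL : base; /* commit with mprotect later */
}
```

This also shows why the scheme is harder for wasm64: a 64-bit index can reach the entire address space, so no finite reservation covers it.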
Suggest allowing the wasm app to declare that it is not using the low pages, rather than being passed a start-of-usable-memory. This strategy seems a win in many ways, and is amenable to being declared in an externally defined section if there is no consensus within wasm.
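If it did end up as an externally defined section, the declaration could be as small as a named custom section whose mere presence means "this module does not touch the low pages". A sketch of emitting one, where only the custom-section framing (id 0, LEB128 lengths) follows the binary format and the section name is invented:

```c
/* Sketch: append a wasm custom section (id 0) with a hypothetical
 * name "no_low_pages" and an empty payload. */
#include <stdio.h>
#include <string.h>

static void write_leb128_u32(FILE *out, unsigned v) {
    do {
        unsigned char byte = v & 0x7f;
        v >>= 7;
        if (v)
            byte |= 0x80;
        fputc(byte, out);
    } while (v);
}

void append_no_low_pages_section(FILE *out) {
    const char *name = "no_low_pages"; /* hypothetical section name */
    unsigned name_len = (unsigned)strlen(name);
    fputc(0, out);                       /* section id 0 = custom */
    write_leb128_u32(out, name_len + 1); /* payload: 1-byte name length + name */
    write_leb128_u32(out, name_len);
    fwrite(name, 1, name_len, out);
}
```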