Fault on heap accesses around 0 #204

jfbastien · 2015-06-19T09:29:32Z

@titzer suggested that "a module may optionally define that out-of-bounds includes small indices close to 0". I like the idea, and I think we should figure out the details of how this works. C/C++ certainly care about this features, but other languages may not so it may be interesting to let the .wasm file control this.

Should WebAssembly implementations be allowed to do something, and be required to document what they do?
Should it be a property in the .wasm file?
Something we specify through ELF?
Should it be done manually through mprotect, by the developer? Note that toolchains could do this automatically when generating _start.

In all cases, should we mandate that implementations provide information on smallest page size, so that this can be efficiently implemented through memory protection?

The text was updated successfully, but these errors were encountered:

lukewagner · 2015-06-19T14:26:23Z

This is something C++ devs (myself included) really want. I agree that many uses may not want this so I've been assuming it would be a module-level opt-in flag.

I think the ideal is we make this part of the semantics, but currently I don't see a strategy that has full performance if the engine isn't able to use memory-protect+signal-handling tricks to guard the low memory region. (The best I can think of requires an extra sub before the bounds check.) Assuming full performance w/o signal handlers is a tier-1 use case, then I think this feature belongs better as a developer tool. It would still be nice to have the module flag, though, so that a browser's devtools could have a persistent "fault on low-memory access" checkbox that a developer would leave on and so they would always be catching these bugs, not just when they chose to open the Web console.

jfbastien · 2015-06-19T14:35:19Z

I like the mprotect approach :-)

lukewagner · 2015-06-19T14:54:30Z

Do you mean in the engine impl or exposing mprotect as a primitive to wasm? If the latter and the engine didn't have signal handler support, it seems like we'd need the engine to maintain a shadow memory (1 bit or byte per page) that was probed before every access.

jfbastien · 2015-06-19T14:58:24Z

I mean 4. that I wrote above. I don't think we need shadow memory for it: we can limit what parameters we allow to mprotect for now, and define accesses to non read or write memory to trap. When we add signal handling we can do more, in this case raise the segfault.

lukewagner · 2015-06-19T15:04:28Z

@jfbastien Right, but how would that be implemented in the engine without either signal-handler support or some shadow memory bookkeeping?

That being said, most platforms do seem to have pretty good signal support; I'm mostly recalling a concern of @pizlonator with making JSC embeddable. Filip: have you considered catching faults at the Mach exception layer? That's what we had to do to catch SIGSEGVs before breakpad killed the process; specifically, listening to the thread-level exception ports lets you get first dibs before all the Unix signals run and my impression from the general lack of documentation is that few devs use Mach exceptions directly.

jfbastien · 2015-06-19T15:08:17Z

Hmm, the polyfill may be an issue!

I also hadn't thought about engines which can't just use signals... That'll bite them once wasm does support signals, so maybe it's fine to address this issue now?

titzer · 2015-06-19T19:00:00Z

On Fri, Jun 19, 2015 at 2:29 AM, JF Bastien [email protected]
wrote:

@titzer https://github.com/titzer suggested that "a module may
optionally define that out-of-bounds includes small indices close to 0".
I like the idea, and I think we should figure out the details of how this
works. C/C++ certainly care about this features, but other languages may
not so it may be interesting to let the .wasm file control this.

To be fair, I just transferred a sentence from below to this paragraph for
readability.

Should WebAssembly implementations be allowed to do something, and
be required to document what they do?

Should it be a property in the .wasm file?

Something we specify through ELF?

Should it be done manually through mprotect, by the developer? Note
that toolchains could do this automatically when generating _start.

Another possibility: what if we just check the base address for 0 (null)
first?

We could combine all memory accesses into one kind of addressing mode that
takes 3 integer parameters like so:

load/store base, offset, disp

Where "base" is an int32/int64 expression, "offset" is a int32 literal, and
"disp" is an int32/int64 expression. The effective address would be
calculated as "intW_add(base, intW_add(offset, disp))", while the bounds
check would first check base != 0 and then "integer_add(intW_add(base,
disp), offset) < heap_size", with the latter potentially achieving the
bounds check hoisting Luke wanted.

I didn't think through the details of whether the two checks could be
combined into one machine instruction, but clearly the "null check" of
"base != 0" will often be loop invariant and thus hoistable.

This requires that all heap accesses have three operands, which is OK if we
have macro encoding for common cases of 0 offset or displacement.

In all cases, should we mandate that implementations provide information
on smallest page size, so that this can be efficiently implemented through
memory protection?

—
Reply to this email directly or view it on GitHub
#204.

pizlonator · 2015-06-19T19:18:02Z

On Jun 19, 2015, at 8:04 AM, Luke Wagner [email protected] wrote:

@jfbastien https://github.com/jfbastien Right, but how would that be implemented without either signal-handler support or some shadow memory bookkeeping?

That being said, most platforms do seem to have pretty good signal support; I'm mostly recalling a concern of @pizlonator https://github.com/pizlonator with making JSC embeddable. Filip: have you considered catching faults at the Mach exception layer?

Yes. But I also care about this not being too awful in embedding scenarios on Linux and Windows, since WebKit and JSC do run on those.

It’s also a philosophical concern. I prefer for webasm to be specified in such a way that you can create a performant implementation without relying on the OS.

That's what we had to do http://hg.mozilla.org/mozilla-central/file/2694ff2ace6a/js/src/asmjs/AsmJSSignalHandlers.cpp#l1040 to catch SIGSEGVs before breakpad killed the process; specifically, listening to the thread-level exception ports lets you get first dibs before all the Unix signals run and my impression from the general lack of documentation is that few devs use Mach exceptions directly.

Yeah, the way to program with Mach is to try to find sample code and then just sort of do what that does. ;-) It’s true that this is what we would do, if webasm required us to do it.

-Filip

pizlonator · 2015-06-19T19:23:22Z

On Jun 19, 2015, at 7:26 AM, Luke Wagner [email protected] wrote:

This is something C++ devs (myself included) really want. I agree that many uses may not want this so I've been assuming it would be a module-level opt-in flag.

I think the ideal is we make this part of the semantics, but currently I don't see a strategy that has full performance if the engine isn't able to use memory-protect+signal-handling tricks to guard the low memory region. (The best I can think of requires an extra sub before the bounds check.)

Right. And, interestingly, if you have to do:

if (unsigned(ptr - C1) >= C2 - C1)

Then the performance ends up being roughly the same as doing:

if (ptr < C1 || ptr >= C2)

Because the subtract is one cycle, but so is a second compare/branch that the CPU can fuse. Either way, adding one cycle is probably not great.

Assuming full performance w/o signal handlers is a tier-1 use case, then I think this feature belongs better as a developer tool.

I agree.
It would still be nice to have the module flag, though, so that a browser's devtools could have a persistent "fault on low-memory access" checkbox that a developer would leave on and so they would always be catching these bugs, not just when they chose to open the Web console.

—
Reply to this email directly or view it on GitHub #204 (comment).

pizlonator · 2015-06-19T19:27:32Z

On Jun 19, 2015, at 12:00 PM, titzer [email protected] wrote:

On Fri, Jun 19, 2015 at 2:29 AM, JF Bastien [email protected]
wrote:

@titzer https://github.com/titzer suggested that "a module may
optionally define that out-of-bounds includes small indices close to 0".
I like the idea, and I think we should figure out the details of how this
works. C/C++ certainly care about this features, but other languages may
not so it may be interesting to let the .wasm file control this.

To be fair, I just transferred a sentence from below to this paragraph for
readability.

Should WebAssembly implementations be allowed to do something, and
be required to document what they do?

Should it be a property in the .wasm file?

Something we specify through ELF?

Should it be done manually through mprotect, by the developer? Note
that toolchains could do this automatically when generating _start.

Another possibility: what if we just check the base address for 0 (null)
first?

We could combine all memory accesses into one kind of addressing mode that
takes 3 integer parameters like so:

load/store base, offset, disp

Where "base" is an int32/int64 expression, "offset" is a int32 literal, and
"disp" is an int32/int64 expression. The effective address would be
calculated as "intW_add(base, intW_add(offset, disp))", while the bounds
check would first check base != 0 and then "integer_add(intW_add(base,
disp), offset) < heap_size", with the latter potentially achieving the
bounds check hoisting Luke wanted.

I didn't think through the details of whether the two checks could be
combined into one machine instruction, but clearly the "null check" of
"base != 0" will often be loop invariant and thus hoistable.

That’s true, but this significantly complicates things just for faulting on zero. Faulting on zero feels too much like a developer feature.

This requires that all heap accesses have three operands, which is OK if we
have macro encoding for common cases of 0 offset or displacement.

But then we risk some wasm generators abusing the 0-offset encoding and using normal math to compute offsets.

In all cases, should we mandate that implementations provide information
on smallest page size, so that this can be efficiently implemented through
memory protection?

—
Reply to this email directly or view it on GitHub
#204.

—
Reply to this email directly or view it on GitHub #204 (comment).

sunfishcode · 2015-10-22T21:05:18Z

Since we're planning on having mprotect, that functionality should suffice for this feature.

jfbastien · 2015-10-22T21:09:32Z

So the consensus is that MVP will not fault on nullptr deref?

ghost · 2015-10-22T21:12:20Z

The difference between this issue and mprotect might be that this issue deals with a restriction enforced by the runtime, not protection setup by the wasm code itself. Operation systems often impose similar restrictions on applications. This issue might also help support placing the linear memory at absolute zero in the address space which has been shown to have significant performance benefits.

ghost · 2015-10-22T21:34:44Z

I'd like to see the runtime being able to pass a start_of_usable_memory and for any access below this to optionally fault on MVP. If only to future proof MVP targeted applications. I believe this would address this issue too.

sunfishcode · 2015-10-22T21:36:17Z

@jfbastien Once we have mprotect, that'll be able to do exactly what we want here, so it's desirable to avoid adding a separate mechanism that will then become obsolete.

@JSStats - Moving linear memory to absolute zero in the virtual address space can be discussed in #334.

sunfishcode · 2024-12-06T15:56:00Z

This feature is now being worked on in the memory-control proposal.

jfbastien added the enhancement label Jun 19, 2015

jfbastien added this to the MVP milestone Jun 19, 2015

jfbastien mentioned this issue Jun 19, 2015

Update AstSemantics.md #200

Merged

jfbastien referenced this issue Jun 22, 2015

Update AstSemantics.md

2702652

sunfishcode modified the milestones: Future Features, MVP Oct 22, 2015

lukewagner mentioned this issue Oct 28, 2015

Move low-memory faulting to FutureFeatures.md #433

Merged

sunfishcode removed the enhancement label Jul 12, 2016

puzzleddev mentioned this issue Mar 7, 2020

allowzero pointers don't work on wasm32-freestanding in release mode. ziglang/zig#4668

Closed

sunfishcode added the linear memory label Feb 24, 2024

sunfishcode removed this from the Future Features milestone Dec 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fault on heap accesses around 0 #204

Fault on heap accesses around 0 #204

jfbastien commented Jun 19, 2015

lukewagner commented Jun 19, 2015

jfbastien commented Jun 19, 2015

lukewagner commented Jun 19, 2015

jfbastien commented Jun 19, 2015

lukewagner commented Jun 19, 2015

jfbastien commented Jun 19, 2015

titzer commented Jun 19, 2015

pizlonator commented Jun 19, 2015

pizlonator commented Jun 19, 2015

pizlonator commented Jun 19, 2015

sunfishcode commented Oct 22, 2015

jfbastien commented Oct 22, 2015

ghost commented Oct 22, 2015

ghost commented Oct 22, 2015

sunfishcode commented Oct 22, 2015

sunfishcode commented Dec 6, 2024

Fault on heap accesses around 0 #204

Fault on heap accesses around 0 #204

Comments

jfbastien commented Jun 19, 2015

lukewagner commented Jun 19, 2015

jfbastien commented Jun 19, 2015

lukewagner commented Jun 19, 2015

jfbastien commented Jun 19, 2015

lukewagner commented Jun 19, 2015

jfbastien commented Jun 19, 2015

titzer commented Jun 19, 2015

pizlonator commented Jun 19, 2015

pizlonator commented Jun 19, 2015

pizlonator commented Jun 19, 2015

sunfishcode commented Oct 22, 2015

jfbastien commented Oct 22, 2015

ghost commented Oct 22, 2015

ghost commented Oct 22, 2015

sunfishcode commented Oct 22, 2015

sunfishcode commented Dec 6, 2024