Alon Zakai edited this page Apr 8, 2014 · 35 revisions

Debugging

For a quick overview of debugging Emscripten-generated code, see these slides.

Limitation or Bug?

Emscripten can compile almost all, but not all, C/C++ code. Some limitations exist; see CodeGuidelinesAndLimitations.

Aside from these limitations, it's possible you ran into a bug in Emscripten, such as:

  • Missing functionality. For example, a library function which hasn't yet been implemented in Emscripten. Possible solutions here are to implement the function (see library.js) or to compile it from C++.
  • An actual mistake in Emscripten. Please report it!

Optimizations

Try building without optimizations (no -O1 etc.). If that changes the behavior, try disabling only the LLVM optimizations with --llvm-opts 0 (see emcc --help).

Build with EMCC_DEBUG=1 to get intermediate files for the compiler's various stages (output to /tmp/emscripten_temp).

Useful Compilation Settings

Some useful settings appear in src/settings.js. Change the settings there and then recompile the code for them to take effect. When code isn't running properly, compile with ASSERTIONS first and, if that doesn't clear things up, with SAFE_HEAP. ASSERTIONS adds various runtime checks; SAFE_HEAP adds further (slow) checks on memory accesses, catching problems such as dereferencing 0 and misaligned accesses.

Function Pointer Issues

If you get an abort() from a function pointer call (nullFunc, b0, b1 or similar, possibly with an error message saying "incorrect function pointer"), a function pointer was called that is not valid for its type.

It is undefined behavior to cast a function pointer to another type and call it through that type (e.g., casting away the last parameter), but this does happen in real-world code and is one possible cause of this error. In optimized Emscripten output, each function pointer type has its own table of entries, so you must call with the correct type to get the right behavior.

Another possible cause is a dereference of 0, such as calling a method on a NULL pointer. The underlying bug can have any cause, but it shows up as a function pointer error: unlike in native builds, simply reading or writing through a NULL pointer will work, and only function pointer calls always fail when NULL.
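As a rough illustration of the two causes above, here is a minimal C sketch. All names in it (unary_fn, add_pair, add_pair_with_zero, call_unary, check_function_pointer_calls) are hypothetical and are not part of Emscripten. It shows the problematic cast only as a comment, then one possible way to avoid it: give the callback an adapter with exactly the expected signature, and guard against NULL before the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: the caller expects a function taking one int. */
typedef int (*unary_fn)(int);

/* Hypothetical function with a different signature (two ints). */
int add_pair(int a, int b) {
    return a + b;
}

/*
 * Problematic pattern, left as a comment because it is undefined behavior:
 *
 *     unary_fn f = (unary_fn)add_pair;  // casts away the last parameter
 *     f(1);                             // call through the wrong type
 */

/* Adapter whose signature matches unary_fn exactly. */
int add_pair_with_zero(int a) {
    return add_pair(a, 0);
}

/* Calls f only when it is non-NULL; otherwise returns the fallback value. */
int call_unary(unary_fn f, int arg, int fallback) {
    if (f == NULL) {
        return fallback;
    }
    return f(arg);
}

/* Self-check showing the intended use of the adapter and the NULL guard. */
void check_function_pointer_calls(void) {
    assert(call_unary(add_pair_with_zero, 41, -1) == 41);
    assert(call_unary(NULL, 41, -1) == -1);
}
```

Calling check_function_pointer_calls() from your own main() exercises both paths. Because the adapter keeps every call going through a pointer of the type it was declared with, it should not depend on how function pointer tables are laid out.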

Use -s ASSERTIONS=2 to get useful information about the function pointer being called and its type. It also helps to look at the stack trace (you may want to disable asm.js optimizations in Firefox to get the best trace information) to see where in your code the error happens, and then work out which function should be called but isn't.

SAFE_HEAP is also useful when debugging issues like this.

Inspecting the Generated Code

See the slides linked above for the -g options.

It can also be useful to skip the JS optimizer, which leaves inline source code hints in the output. You can try something like

./emcc -O2 --js-opts 0 -g4 tests/hello_world_loop.cpp

This applies the LLVM optimizations and basic JS optimizations but not the JS optimizer, so debug info is retained, giving output like

function _main() {
 var label = 0;
 var $puts=_puts(((8)|0)); //@line 4 "tests/hello_world.c"
 return 1; //@line 5 "tests/hello_world.c"
}

Debugging Emscripten Issues

If you think you may have hit an Emscripten codegen bug, there are a few tools to help you.

The AutoDebugger

The 'nuclear option' when debugging is to use the autodebugger tool. The autodebugger will rewrite the LLVM bitcode so it prints out each store to memory. You can then run the exact same LLVM bitcode in the LLVM interpreter (lli) and JavaScript, and compare the output (diff is useful if the output is large). For how to use the autodebugger tool, see the autodebug test.

The autodebugger can potentially find any problem in the generated code, so it is strictly more powerful than the CHECK_* settings and SAFE_HEAP. However, it has some limitations:

  • The autodebugger generates a lot of output. Using diff can be very helpful here.
  • The autodebugger doesn't print out pointer values, just simple numerical values. The reason is that pointer values change from run to run, so you can't compare them. This has two consequences: problems that only show up in pointer values may be missed, and a pointer that was converted to an integer and stored will be shown, in which case it should be ignored. (You can change this behavior; look in tools/autodebugger.py.)

One use of the autodebugger is to quickly emit lots of logging output. You can then take a look and see if something weird pops up. Another use is for regressions, see below.

AutoDebugger Regression Workflow

Fixing regressions is pretty easy with the autodebugger, using the following workflow:

  • Compile the code using EMCC_AUTODEBUG=1 in the environment.
  • Compile the code again using EMCC_AUTODEBUG=1 in the environment, but with a different emcc setting etc., so that you now have one build before the regression and one after.
  • Run both versions, saving their output, then do a diff and investigate that. Any difference is likely the bug (other false positives could be things like the time, if something like clock() is called, which differs slightly between runs).

(You can also make the second build a native one using the llvm nativizer tool mentioned above - run it on the autodebugged .ll file, which EMCC_DEBUG=1 will emit in /tmp/emscripten_temp. This helps find bugs in general and not just regressions, but has the same issues with the nativizer tool mentioned earlier.)
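The comparison step in the workflow above is normally done with diff. As an illustration of what you are looking for, here is a small C sketch that reports the first line at which two saved outputs diverge. The function name first_diff_line is hypothetical, and the sketch assumes both outputs have already been read into NUL-terminated strings.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the 1-based line number where the two texts first differ,
   or 0 if they are identical. */
size_t first_diff_line(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    size_t line = 1;
    while (*a != '\0' && *b != '\0') {
        if (*a != *b) {
            return line;
        }
        if (*a == '\n') {
            line++;
        }
        a++;
        b++;
    }
    /* One text may be a prefix of the other. */
    return (*a == *b) ? 0 : line;
}
```

The first divergence is usually the most interesting one, since later differences are often just consequences of it.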

Debug Info

It can be very useful to compile the C/C++ files with the -g flag to get debugging info: Emscripten will add the source file and line number to each line in the generated code. Note, however, that attempting to interpret code compiled with -g using lli may cause crashes, so you may need to build once without -g for lli and then build again with -g. Alternatively, use tools/exec_llvm.py in Emscripten, which runs lli after cleaning out debug info.

Additional Tips

You can also do something similar to what the autodebugger does, manually - modify the original source code with some printf()s, then compile and run that, to investigate issues.
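A minimal sketch of this kind of manual instrumentation in C follows. The helper format_trace, the macro TRACE_INT, and the example function sum_to are all hypothetical names chosen for illustration. The macro prints the source file, line, expression text, and value, so the output of two builds can be compared line by line.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "[trace] label = value" into buf.
   Returns what snprintf returns (the length the full text needs). */
int format_trace(char *buf, size_t size, const char *label, int value) {
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "[trace] %s = %d", label, value);
}

/* Hypothetical macro: prints file, line, the expression text and its value.
   The expression is evaluated exactly once. */
#define TRACE_INT(expr)                                               \
    do {                                                              \
        char trace_line_[128];                                        \
        format_trace(trace_line_, sizeof trace_line_, #expr, (expr)); \
        printf("%s:%d %s\n", __FILE__, __LINE__, trace_line_);        \
    } while (0)

/* Example function under investigation: sums 1..n, tracing each step. */
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
        TRACE_INT(total);
    }
    return total;
}
```

Printing only plain numerical values, and not pointers, keeps the output comparable between runs, for the same reason the autodebugger skips pointer values.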

Another useful tip: if you have a good idea of which line in the generated .js is problematic, you can add print(new Error().stack) there to get a stack trace. There is also stackTrace(), which emits a stack trace and also tries to demangle C++ function names.

Additional Help

Of course, you can also ask the Emscripten devs for help. :) See links to IRC and the Google Group on the main project page.