Alon Zakai edited this page Aug 14, 2013 · 35 revisions

Debugging

If you compile code and things don't go right, you may have run into a bug in Emscripten, or a limitation of it. This page gives some ideas of how to figure out what is going wrong.

Compilation Failures

If the code doesn't compile at all, the compiler should leave some information in the .js file it creates. To see which line of the .ll file caused the problem, add frameworkLines to DEBUG_TAGS_SHOWING in settings.js. Each .ll line is then printed as the compiler starts to process it.

It is also useful to add framework to DEBUG_TAGS_SHOWING in settings.js. This tells you exactly which actor was running during the crash, so you can tell which part of the .ll line was responsible. Search for that actor in the .js files in src/.

Also worth mentioning is that compiling with SpiderMonkey gives slightly more useful crash information than V8.

From here on out, we assume the problem is with running the code, not compiling it.

First Things First

It's a good idea to compare the emscriptened code with how that code behaves when compiled normally. More specifically, compare the following:

  • The emscriptened code.
  • The same source code, compiled into a binary directly, using gcc or clang.
  • The same .bc or .ll file that was emscriptened, run in the LLVM interpreter (lli). .ll files should first be assembled into bitcode using llvm-as.

If the first gives different results than the other two, you are hitting either a limitation of Emscripten or a bug. If instead the last two don't agree with each other, something may be wrong with the generation of the .ll, or there may be a bug in LLVM.

Assuming the last two agree, and differ from the first, then you can proceed to the next step.
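
The decision rule above can be written out as a small sketch. The function name, its arguments (the captured output of each run, as strings) and the returned labels are all hypothetical, chosen only for illustration:

```javascript
// Hypothetical helper: classify the three-way comparison described above.
// Each argument is the captured output (a string) of one way of running the program.
function classifyMismatch(emscriptenOut, nativeOut, lliOut) {
  if (nativeOut !== lliOut) {
    // The two reference runs disagree: suspect the .ll generation or LLVM itself.
    return 'll-generation-or-llvm';
  }
  if (emscriptenOut !== nativeOut) {
    // The references agree with each other but not with the compiled JS.
    return 'emscripten-limitation-or-bug';
  }
  return 'all-agree';
}
```

Only the 'emscripten-limitation-or-bug' case leads on to the next step.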

Limitation or Bug?

Emscripten can compile almost all, but not all, C/C++ code out there. Some limitations exist; see CodeGuidelinesAndLimitations.

Aside from these limitations, it's possible you ran into a bug in Emscripten, such as:

  • Missing functionality. For example, a library function which hasn't yet been implemented in Emscripten. Possible solutions here are to implement the function (see library.js) or to compile it from C++.
  • An actual mistake in Emscripten. Please report it!

Optimizations

Try building without optimizations (no -O1 etc.). If that changes the behavior, you can try disabling LLVM optimizations specifically using --llvm-opts; see emcc --help.

Build with EMCC_DEBUG=2 to get intermediate files for the JS optimization phases (written to /tmp/emscripten_temp).

Useful Compilation Settings

As already mentioned, some useful settings appear in src/settings.js. Change the settings there and recompile the code for them to take effect. When code isn't running properly, compile with ASSERTIONS first and, if that doesn't clear things up, with SAFE_HEAP. ASSERTIONS adds various runtime checks; SAFE_HEAP adds further (slow) checks on memory accesses, catching problems such as dereferencing 0 and misaligned accesses. Additional settings are:

  • EXCEPTION_DEBUG - Will print out exceptions as they occur. This is useful because if the compiled code catches exceptions, it may catch the wrong ones, and/or not give enough details about those exceptions. It's a good idea to enable this if you have any suspicions about something not running properly.
  • LABEL_DEBUG - Will print out each function and each label in each function, as we enter them. This is extremely useful if the generated code enters an infinite loop that it shouldn't: Run it until it hits the loop, then you can see exactly where that is.
  • CORRUPTION_CHECK - Allocates extra memory in each malloc and fills it with canary values. Those values are checked later to see whether the heap was corrupted. This can help find bugs in cases where the code appears to be behaving incorrectly. See src/corruptionCheck.js for more details.
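
To illustrate the canary idea behind CORRUPTION_CHECK, here is a minimal toy sketch. It is not the code in src/corruptionCheck.js: the bump allocator, the guard size, the canary byte and all names are assumptions made for the example.

```javascript
// Illustrative only: a toy heap with canary-guarded allocations.
const CANARY = 0xA5;               // arbitrary guard byte (assumption)
const GUARD = 4;                   // guard bytes on each side of a block (assumption)
const heap = new Uint8Array(1024); // stand-in for the program's memory
let nextFree = 0;                  // bump-allocator cursor
const live = [];                   // {ptr, size} records, used by the later check

function guardedMalloc(size) {
  const start = nextFree;
  const end = start + size + 2 * GUARD;
  if (end > heap.length) throw new Error('toy heap exhausted');
  nextFree = end;
  // Fill the guard regions just before and just after the user-visible block.
  heap.fill(CANARY, start, start + GUARD);
  heap.fill(CANARY, start + GUARD + size, end);
  const ptr = start + GUARD;
  live.push({ ptr: ptr, size: size });
  return ptr;
}

// Returns the allocations whose guard bytes have been overwritten.
function findCorrupted() {
  return live.filter(function (a) {
    for (let i = 0; i < GUARD; i++) {
      if (heap[a.ptr - 1 - i] !== CANARY) return true;      // write before the block
      if (heap[a.ptr + a.size + i] !== CANARY) return true; // write past the block
    }
    return false;
  });
}
```

Writing a single byte past the end of a block returned by guardedMalloc overwrites a guard byte, so that block shows up in findCorrupted() even though the program may otherwise appear to run normally.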

Inspecting the Generated Code

If you compile with -g the code should be fairly debuggable: Function names are preserved (except for C++ name mangling, use c++filt for those) and to some degree variable names as well. It should be possible to look at the generated code and original source and match them up more or less. This can sometimes be very useful for debugging.

-g is the same as -g3. See emcc --help for more details on the various debugging levels. Another useful one is -g4 which will preserve LLVM debug info, showing filenames and line numbers inline in the generated code (so you can match it up to the original C/C++), and also create a SourceMaps file so that you can see the original sources in the browser debugger.

JS optimizations can strip out debug info or make it less readable. You can try something like

./emcc -O2 --js-opts 0 -g4 tests/hello_world_loop.cpp

This applies LLVM optimizations and basic JS optimizations but not the JS optimizer, so debug info is retained, giving

function _main() {
 var label = 0;
 var $puts=_puts(((8)|0)); //@line 4 "tests/hello_world.c"
 return 1; //@line 5 "tests/hello_world.c"
}
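
When the generated file is large, the inline annotations can be collected mechanically. The following is a minimal sketch that assumes the annotations look exactly like the //@line N "file" comments in the sample above; real output may differ, and the function name is hypothetical.

```javascript
// Collect `//@line N "file"` annotations from generated JS source text.
// Assumes the annotation format shown in the sample output above.
function extractLineInfo(generatedJs) {
  const pattern = /\/\/@line (\d+) "([^"]+)"/;
  const result = [];
  generatedJs.split('\n').forEach(function (text, index) {
    const match = pattern.exec(text);
    if (match) {
      result.push({
        jsLine: index + 1,            // 1-based line in the generated file
        sourceLine: Number(match[1]), // line in the original C/C++ file
        sourceFile: match[2]
      });
    }
  });
  return result;
}
```

Each entry pairs a line of the generated code with the source file and line it came from, which makes it easier to jump from a suspicious JS line to the original C/C++.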

The AutoDebugger

The 'nuclear option' when debugging is to use the autodebugger tool. The autodebugger will rewrite the LLVM bitcode so it prints out each store to memory. You can then run the exact same LLVM bitcode in the LLVM interpreter (lli) and JavaScript, and compare the output (diff is useful if the output is large). For how to use the autodebugger tool, see the autodebug test.

The autodebugger can potentially find any problem in the generated code, so it is strictly more powerful than the CHECK_* settings and SAFE_HEAP. However, it has some limitations:

  • The autodebugger generates a lot of output. Using diff can be very helpful here.
  • The autodebugger doesn't print out pointer values, just simple numerical values. The reason is that pointer values change from run to run, so you can't compare them. This has two consequences: problems that only show up in pointer values may be missed, and a pointer that was converted into an integer and then stored will still be printed, in which case the difference should be ignored.

AutoDebugger Regression Workflow

Fixing regressions is pretty easy with the autodebugger, using the following workflow:

  • Compile the code with EMCC_DEBUG=1 set in the environment, and get the .ll file.
  • Run the autodebugger on that .ll file, giving you a new .ll file.
  • Compile that .ll file using two versions of emcc, one from before the regression and one from after it, with EMCC_LEAVE_INPUTS_RAW=1 set in the environment. This makes emcc just compile the .ll directly, without linking in dlmalloc, optimizing, etc., so the two emcc versions will process the exact same code.
  • Run both versions, saving their output, then do a diff and investigate it. Any difference is likely the bug. False positives are possible, for example the time, if something like clock() is called, since it differs slightly between runs.
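
For very long logs, it can help to find just the first point where the two outputs diverge, since later differences are often only consequences of it. A minimal sketch, assuming each run's output has been captured as a string (the function name is hypothetical, and this does not replace a proper diff):

```javascript
// Find the first line at which two captured outputs differ.
// Returns null when the outputs are identical.
function firstDivergence(beforeOutput, afterOutput) {
  const a = beforeOutput.split('\n');
  const b = afterOutput.split('\n');
  const limit = Math.max(a.length, b.length);
  for (let i = 0; i < limit; i++) {
    if (a[i] !== b[i]) {
      // A side that ran out of lines shows up as undefined.
      return { line: i + 1, before: a[i], after: b[i] };
    }
  }
  return null;
}
```

The returned line number is 1-based, so it can be used directly to look up the surrounding context in either saved output.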

Debug Info

It can be very useful to compile the C/C++ files with the -g flag to get debugging info: Emscripten will add the source file and line number to each line in the generated code. Note, however, that attempting to interpret code compiled with -g using lli may cause crashes. So you may need to build once without -g for lli, then build again with -g. Alternatively, use tools/exec_llvm.py in Emscripten, which will run lli after cleaning out debug info.

Additional Tips

You can also do something similar to what the autodebugger does, manually - modify the original source code with some printf()s, then compile and run that, to investigate issues.

Another useful tip: if you have a good idea of which line in the generated .js is problematic, you can add print(new Error().stack) there to get a stack trace.
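
The stack-trace tip can be wrapped in a small helper so the same call works both in a JS shell and in an environment with a console. This is a sketch only: the helper name is hypothetical, and the stack property of Error objects is non-standard (though widely supported), so its format varies between engines.

```javascript
// Log a stack trace from the current point of execution and return it.
function logStack(tag) {
  const stack = String(new Error(tag).stack); // format differs between engines
  if (typeof console !== 'undefined') {
    console.log(tag + '\n' + stack);
  } else {
    print(tag + '\n' + stack); // JS shells that provide print() but no console
  }
  return stack;
}
```

A call such as logStack('before store') can then be pasted next to the suspect line in the generated .js, with the tag making it easy to tell several traces apart.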

Additional Help

Of course, you can also ask the Emscripten devs for help. :) See links to IRC and the Google Group on the main project page.