Alon Zakai edited this page Feb 18, 2014 · 35 revisions

Debugging

If you compile code and things don't go right, you may have run into a bug in Emscripten, or a limitation of it. This page gives some ideas for figuring out what is going wrong.

First Things First

It's a good idea to compare how the code behaves when compiled with Emscripten and when compiled normally. More specifically, compare the output of the following:

  • The emscripten-generated JS code.
  • The same source code, compiled into a native binary directly, using gcc or clang.
  • Optionally, you can run the same .bc or .ll file that Emscripten compiled in the LLVM interpreter (lli). This does not always work, and is best attempted only if you know low-level LLVM and C details well. If you want to try it, use tools/llvm-nativizier.py.

If the first gives different results from the other two, then you are hitting either a limitation of Emscripten or a bug. If instead the last two disagree with each other, then something may be wrong with the generation of the .ll, or there may be a bug in LLVM.

If the last two agree but differ from the first, proceed to the next step.
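Comparisons like these work best when the program's output is deterministic. One way to inspect intermediate state without printing addresses (which differ between native and Emscripten builds) is to print a checksum of important buffers at a few key points. The helper below is a minimal sketch of that idea; debug_checksum and debug_report are hypothetical names, and FNV-1a is simply one convenient hash, not something Emscripten provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a hash: cheap and deterministic, and it gives the same result
   on every platform because it uses only unsigned 32-bit arithmetic. */
uint32_t debug_checksum(const unsigned char *data, size_t len) {
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Print a labeled checksum of a buffer. Only the length and hash are printed,
   never the buffer's address: addresses differ between the native and the
   Emscripten build and would make every such line mismatch in a diff. */
void debug_report(FILE *out, const char *label, const void *buf, size_t len) {
    fprintf(out, "%s: len=%zu checksum=%08x\n",
            label, len, (unsigned)debug_checksum(buf, len));
}
```

For example, calling debug_report(stdout, "after decode", pixels, npixels) (with your own buffer in place of the hypothetical pixels) in both builds and diffing the output shows the first point at which the data diverges.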

Limitation or Bug?

Emscripten can compile almost, but not all, C/C++ code. Some limitations exist; see CodeGuidelinesAndLimitations.

Aside from these limitations, it's possible you ran into a bug in Emscripten, such as:

  • Missing functionality. For example, a library function that hasn't yet been implemented in Emscripten. Possible solutions are to implement the function in JavaScript (see library.js) or to compile an implementation of it from C/C++.
  • An actual mistake in Emscripten. Please report it!
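As a sketch of the second option, suppose (hypothetically) that your code calls strnlen and the Emscripten library did not provide it. A small C implementation compiled along with your project fills the gap. It is given its own name, compat_strnlen (also hypothetical), so it cannot collide with a real strnlen when the same code is built natively.

```c
#include <stddef.h>

/* Hypothetical fallback for a C library function the runtime lacks.
   Returns the length of s, but never reads more than maxlen bytes. */
size_t compat_strnlen(const char *s, size_t maxlen) {
    size_t n = 0;
    /* Stop at the terminator or after maxlen bytes, whichever comes first. */
    while (n < maxlen && s[n] != '\0') {
        n++;
    }
    return n;
}
```

Call sites (or a #define used only in the Emscripten build) can then use compat_strnlen until the library gains the real function.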

Optimizations

Try building without optimizations (no -O1 etc.). If that changes the behavior, you can disable only the LLVM optimizations using --llvm-opts 0 to see whether they are responsible (see emcc --help).

Build with EMCC_DEBUG=2 in the environment to get intermediate files for the JS optimization phases (written to /tmp/emscripten_temp).

Useful Compilation Settings

As already mentioned, some useful settings appear in src/settings.js. Change the settings there and then recompile the code to have them take effect (settings can also be passed on the command line with -s, as in -s ASSERTIONS=2). When code isn't running properly, compile with ASSERTIONS first, and if that doesn't clear things up, with SAFE_HEAP. ASSERTIONS adds various runtime checks, and SAFE_HEAP adds even more (slow) memory access checks, catching problems such as dereferencing 0 and misaligned memory accesses. Additional settings that might help are:

  • ASSERTIONS also has additional levels. Try -s ASSERTIONS=2 for even more costly runtime checks.
  • EXCEPTION_DEBUG - Will print out exceptions as they occur. This is useful because if the compiled code catches exceptions, it may catch the wrong ones, and/or not give enough details about those exceptions. It's a good idea to enable this if you have any suspicions about something not running properly.
  • LABEL_DEBUG - Will print out each function and each label in each function, as we enter them. This is extremely useful if the generated code enters an infinite loop that it shouldn't: Run it until it hits the loop, then you can see exactly where that is.
  • CORRUPTION_CHECK - Allocates extra memory in each malloc and fills it with canary values. Later it checks those values to see whether the heap was corrupted. This can help find bugs such as writes past the end of an allocated buffer. See src/corruptionCheck.js for more details.
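To illustrate the canary idea behind CORRUPTION_CHECK, here is a minimal hand-written sketch in C. It is not Emscripten's implementation: the function names, the layout, and the canary value are all hypothetical, and unlike CORRUPTION_CHECK, which checks automatically, this version only detects corruption when guarded_check is called.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: [header: size][front canary][user data][back canary] */
#define HEADER_BYTES 16    /* holds the size */
#define CANARY_BYTES 16    /* HEADER_BYTES + CANARY_BYTES is a multiple of 16,
                              so user data keeps malloc's alignment */
#define CANARY_VALUE 0xAB

void *guarded_malloc(size_t size) {
    unsigned char *base = malloc(HEADER_BYTES + 2 * CANARY_BYTES + size);
    if (base == NULL) return NULL;
    unsigned char *user = base + HEADER_BYTES + CANARY_BYTES;
    memcpy(base, &size, sizeof size);                         /* remember size */
    memset(user - CANARY_BYTES, CANARY_VALUE, CANARY_BYTES);  /* front guard */
    memset(user + size, CANARY_VALUE, CANARY_BYTES);          /* back guard */
    return user;
}

/* Returns 1 if both guards still hold the canary value, 0 if either was
   overwritten (a buffer underrun or overrun). */
int guarded_check(const void *ptr) {
    const unsigned char *user = ptr;
    const unsigned char *front = user - CANARY_BYTES;
    size_t size;
    memcpy(&size, front - HEADER_BYTES, sizeof size);
    for (size_t i = 0; i < CANARY_BYTES; i++) {
        if (front[i] != CANARY_VALUE) return 0;       /* underrun */
        if (user[size + i] != CANARY_VALUE) return 0; /* overrun */
    }
    return 1;
}

void guarded_free(void *ptr) {
    if (ptr != NULL) {
        free((unsigned char *)ptr - CANARY_BYTES - HEADER_BYTES);
    }
}
```

Writing even one byte past the requested size lands in the back guard, so a later guarded_check on that pointer returns 0.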

Inspecting the Generated Code

If you compile with -g, the code should be fairly debuggable: function names are preserved (C++ names stay mangled; use c++filt to demangle them), and to some degree variable names as well. It should be possible to look at the generated code and the original source and roughly match them up, which can sometimes be very useful for debugging.

-g is the same as -g3. See emcc --help for more details on the various debugging levels. Another useful one is -g4, which preserves LLVM debug info, showing filenames and line numbers inline in the generated code (so you can match it up to the original C/C++), and also creates a source map so that you can see the original sources in the browser debugger.

JS optimizations can strip out debug info or make it less readable. You can try something like

./emcc -O2 --js-opts 0 -g4 tests/hello_world.c

This applies the LLVM optimizations and basic JS optimizations, but not the JS optimizer, so the debug info is retained, giving

function _main() {
 var label = 0;
 var $puts=_puts(((8)|0)); //@line 4 "tests/hello_world.c"
 return 1; //@line 5 "tests/hello_world.c"
}

The AutoDebugger

The 'nuclear option' when debugging is to use the autodebugger tool. The autodebugger will rewrite the LLVM bitcode so it prints out each store to memory. You can then run the exact same LLVM bitcode in the LLVM interpreter (lli) and JavaScript, and compare the output (diff is useful if the output is large). For how to use the autodebugger tool, see the autodebug test.
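Conceptually, the instrumentation resembles wrapping every store in a print. The autodebugger does this automatically on the LLVM bitcode; the sketch below imitates the idea by hand at the C source level, so you can see what kind of output to expect. The functions and the STORE_INT/STORE_DOUBLE macros are hypothetical, and the output format is not the autodebugger's.

```c
#include <stdio.h>

/* Echo a stored integer value with a label, then return it unchanged so the
   call can sit on the right-hand side of an assignment. */
long long log_store_int(const char *where, long long value) {
    printf("store %s = %lld\n", where, value);
    return value;
}

/* Same for floating-point stores; %.17g prints enough significant digits to
   distinguish any two distinct doubles. */
double log_store_double(const char *where, double value) {
    printf("store %s = %.17g\n", where, value);
    return value;
}

/* Write STORE_INT(x, expr); instead of x = expr;
   #lhs turns the target expression into its source text for the label.
   There is intentionally no pointer variant: addresses change between runs
   and builds, which is also why the autodebugger does not print them. */
#define STORE_INT(lhs, expr)    ((lhs) = log_store_int(#lhs, (expr)))
#define STORE_DOUBLE(lhs, expr) ((lhs) = log_store_double(#lhs, (expr)))
```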

The autodebugger can potentially find any problem in the generated code, so it is strictly more powerful than the CHECK_* settings and SAFE_HEAP. However, it has some limitations:

  • The autodebugger generates a lot of output. Using diff can be very helpful here.
  • The autodebugger doesn't print out pointer values, just simple numerical values, because pointer values change from run to run and so can't be compared. This has two downsides: it may miss real problems, and a pointer that was converted into an integer and stored will still be printed, even though its value should be ignored. (You can modify this behavior in tools/autodebugger.py.)

One use of the autodebugger is to quickly emit lots of logging output. You can then take a look and see if something weird pops up. Another use is for regressions, see below.

AutoDebugger Regression Workflow

Tracking down regressions is fairly easy with the autodebugger, using the following workflow:

  • Compile the code using EMCC_AUTODEBUG=1 in the environment.
  • Compile the code again using EMCC_AUTODEBUG=1 in the environment, but with a different emcc setting, etc., so that you now have one build from before the regression and one from after.
  • Run both versions, saving their output, then diff the outputs and investigate. Any difference is likely the bug, although there can be false positives, such as timing values if the code calls something like clock(), which differs slightly between runs.

(You can also make the second build a native one using the LLVM nativizer tool mentioned above: run it on the autodebugged .ll file, which EMCC_DEBUG=1 will emit in /tmp/emscripten_temp. This helps find bugs in general, not just regressions, but is subject to the same caveats as the nativizer described earlier.)
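With very large autodebugger logs, the first differing line is often the most informative, since later differences may just be consequences of it. diff shows this as well; as a minimal sketch, the hypothetical helper below reports only the 1-based line number at which two saved outputs first diverge.

```c
#include <stdio.h>
#include <string.h>

/* Returns the 1-based number of the first line where the two logs differ,
   or 0 if they are identical. Lines longer than the buffer are compared in
   chunks; this works because fgets splits identical input into identical
   chunks, so chunks only differ once the contents differ. */
long first_difference(FILE *a, FILE *b) {
    char la[4096], lb[4096];
    long line = 1;
    for (;;) {
        char *ra = fgets(la, sizeof la, a);
        char *rb = fgets(lb, sizeof lb, b);
        if (ra == NULL && rb == NULL) return 0;  /* both ended together */
        if (ra == NULL || rb == NULL || strcmp(la, lb) != 0)
            return line;                         /* mismatch or early end */
        if (strchr(la, '\n') != NULL) line++;    /* count completed lines */
    }
}
```

A small main() that fopen()s the two saved outputs and prints the returned number is enough to use it from the command line.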

Debug Info

It can be very useful to compile the C/C++ files with the -g flag to get debugging info: Emscripten will add the source file and line number to each line in the generated code. Note, however, that attempting to interpret code compiled with -g using lli may cause crashes, so you may need to build once without -g for lli, then build again with -g. Alternatively, use tools/exec_llvm.py in Emscripten, which runs lli after cleaning out debug info.

Additional Tips

You can also manually do something similar to what the autodebugger does: modify the original source code with some printf()s, then compile and run that to investigate issues.
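A small helper macro makes this kind of manual instrumentation quicker. The hypothetical TRACE macro below prints the file, line, and function along with a printf-style message; calling it at function entry gives something similar to what LABEL_DEBUG reports automatically.

```c
#include <stdio.h>

/* Hypothetical TRACE macro: prints "file:line function: message".
   Output goes to stderr so it stays separate from the program's stdout. */
#define TRACE(...)                                                   \
    do {                                                             \
        fprintf(stderr, "%s:%d %s: ", __FILE__, __LINE__, __func__); \
        fprintf(stderr, __VA_ARGS__);                                \
        fputc('\n', stderr);                                         \
    } while (0)

/* Example of use: trace function entry and each loop iteration. */
int sum_to(int n) {
    TRACE("enter n=%d", n);
    int total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
        TRACE("i=%d total=%d", i, total);
    }
    return total;
}
```

Building the same instrumented source natively and with Emscripten and diffing the two traces combines this with the comparison from First Things First, as long as both builds are given the same source path (since __FILE__ appears in every line).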

Another useful tip: if you have a good idea of which line in the generated .js is problematic, you can add print(new Error().stack) there to get a stack trace. There is also stackTrace(), which emits a stack trace and also tries to demangle C++ function names.

Additional Help

Of course, you can also ask the Emscripten devs for help. :) See links to IRC and the Google Group on the main project page.