Alon Zakai edited this page Dec 24, 2013 · 46 revisions

Building Projects

The Tutorial shows how emcc, the drop-in replacement for gcc, compiles single files into JavaScript. Building large projects with Emscripten is similarly simple: you use emcc instead of gcc in your makefiles. This can usually be done by setting CC to emcc or by passing a flag to configure, but it can be even easier than that. For example, if you normally build with

   ./configure
   make

then the process with Emscripten looks like

   emconfigure ./configure
   emmake make
   emcc [-Ox] project.bc -o project.js

where project.bc is the linked bitcode generated by make. Replace it with the name your project actually generates (the output bitcode may have the suffix .o or .so, depending on the details of the build system).

  • The first change is to run emconfigure with the normal configure command as an argument. emconfigure runs configure but tells it to use emcc instead of gcc, among a few other useful things (for details, see the docs inside emcc). Similarly, emmake sets some helpful environment variables. If you use configure, cmake or similar, you typically don't need emmake, because all the information is in the configure-generated files. Otherwise, emmake sets the default compiler environment variables to point to Emscripten.
  • The second change is to add a command, run once the project is built, that converts the compiled project into JavaScript: emcc is run on the compiled project bitcode and told to generate JavaScript output (the Optimizations section below explains why [-Ox] appears there). This additional command is necessary for two main reasons. First, emcc does not automatically generate JavaScript during linking when it is called from the makefile. If it did, many projects would generate a lot of JavaScript in intermediate steps, which is unnecessary and inefficient to link. Second, various options and optimizations must apply to the entire program being compiled. We cannot compile file A with options X and file B with options Y and link them into one program, because they can literally have different memory structures (for example, different typed array modes), so all these options and optimizations are applied in the final conversion from bitcode to JavaScript. So, when called from the makefile, emcc generates bitcode, and a single line, as shown above, then converts the bitcode into JavaScript.
  • In other words, a conventional native code build system will generate native code in object files as an intermediate form, while building with Emscripten uses LLVM bitcode as an intermediate form.
  • In general you don't need to care about this, except for needing one extra line for the last transformation to JavaScript. However, one potentially confusing situation can occur with optimization; see the Optimizations section below.
  • Note that the output of the build system can be a static library (.a), shared library (.so) or just object files (.o or .bc). In all of these cases using emcc in the build system will cause these files to contain LLVM bitcode, even though the suffix looks the same as if gcc ran. emcc can then be used to compile the .a, .so, .o or .bc file into JavaScript.
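
One way to confirm that a build output really contains LLVM bitcode, whatever its suffix, is to look at its first bytes. The sketch below is not part of Emscripten. It assumes the raw (unwrapped) bitcode format, which begins with the two characters BC, and the function names are made up for illustration. Archives (.a) begin with an ar header instead, so the check applies to their members rather than to the archive itself.

```shell
# Succeeds if the file starts with the raw LLVM bitcode magic bytes "BC".
# (Hypothetical helper; assumes the unwrapped bitcode format.)
is_llvm_bitcode() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "BC" ]
}

# Prints one line per file saying whether it looks like LLVM bitcode.
classify_outputs() {
    for f in "$@"; do
        if is_llvm_bitcode "$f"; then
            echo "$f: LLVM bitcode"
        else
            echo "$f: not raw LLVM bitcode"
        fi
    done
}
```

For example, running classify_outputs on the .o and .so files left behind by emmake make should report bitcode for each of them, while the same files from a gcc build should not.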

Optimizations

As mentioned above, the intermediate format is LLVM bitcode: object files contain that and not JavaScript. We do optimize like a normal compiler does, however. Each source file is optimized by LLVM as it is compiled into an object file, and we perform additional optimizations that make sense only for JavaScript when converting object files into JavaScript.

As a consequence, you should provide the same optimization flags both when compiling source to object and object to JavaScript (or HTML). For example

# bad!
emcc -O2 a.cpp -o a.bc
emcc -O2 b.cpp -o b.bc
emcc a.bc b.bc -o project.js

will not generate the best results, because no JS-level optimizations were performed. Likewise,

# also bad!
emcc a.cpp -o a.bc
emcc b.cpp -o b.bc
emcc -O2 a.bc b.bc -o project.js

will also not be optimal, because no LLVM optimizations are done. The proper way would be to do

# good!
emcc -O2 a.cpp -o a.bc
emcc -O2 b.cpp -o b.bc
emcc -O2 a.bc b.bc -o project.js

By passing the same optimization flags during all stages, code will be properly optimized. Note that this goes not just for -O2 and -O1 but also for things like -s OPTION=VALUE which can affect optimization. Again, just pass the same flags during all compilation stages.

  • You can control whether LLVM optimizations are run using --llvm-opts N where N is in 0-3. Sending -O2 --llvm-opts 0 to emcc during all compilation stages will disable LLVM optimizations but utilize JS optimizations. This can be useful when debugging a build failure.
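
A simple way to keep the flags identical across stages is to define them once and reuse them. This is a minimal sketch: OPT_FLAGS and build_project are names invented here, and the file names follow the example above.

```shell
# Flags shared by every stage. OPT_FLAGS is a name chosen for this sketch;
# anything that affects optimization (such as -s OPTION=VALUE) belongs here too.
OPT_FLAGS="-O2"

build_project() {
    # Source to bitcode: LLVM optimizations run here.
    # $OPT_FLAGS is left unquoted on purpose so several flags split into words.
    emcc $OPT_FLAGS a.cpp -o a.bc || return 1
    emcc $OPT_FLAGS b.cpp -o b.bc || return 1
    # Bitcode to JavaScript: JS-level optimizations run here, with the same flags.
    emcc $OPT_FLAGS a.bc b.bc -o project.js
}
```

Calling build_project then runs all three commands with the same optimization level, and changing OPT_FLAGS in one place changes every stage.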

Notes

  • It is better to generate .so files and not .a. Archives (.a) have some odd behaviors when linked with other files: the linker tries to be 'clever' and discards code it thinks is not needed. Shared libraries (.so) are simpler, and we eliminate unneeded code later anyhow, so they are recommended. This is generally a simple change in your project's build system.
  • Make sure to use bitcode-aware llvm-ar instead of ar. ar may discard code.
  • If you get multiply defined symbol errors, try --remove-duplicates in emcc. This tries to emulate ld's permissive behavior that llvm-link lacks.

Manually Using emcc

As a drop-in replacement for gcc, emcc can be used in all the normal ways you would expect:

    emcc src.cpp
    # Generates a.out.js from C++. Can also take as input .ll (LLVM assembly) or .bc (LLVM bitcode)

    emcc src.cpp -c
    # Generates src.o containing LLVM bitcode.

    emcc src.cpp -o result.js
    # Generates result.js containing JavaScript.

    emcc src.cpp -o result.bc
    # Generates result.bc containing LLVM bitcode (the suffix matters).

    emcc src1.cpp src2.cpp
    # Generates a.out.js from two C++ sources.

    emcc src1.cpp src2.cpp -c
    # Generates src1.o and src2.o, containing LLVM bitcode

    emcc src1.o src2.o
    # Combine two LLVM bitcode files into a.out.js

    emcc src1.o src2.o -o combined.o
    # Combine two LLVM bitcode files into another LLVM bitcode file

For more on emcc's capabilities, run emcc --help (it can also optimize, change parameters that control how Emscripten generates code, generate HTML instead of JavaScript, etc.).

System Libraries

An sdl-config replacement is present in system/bin. Pointing configure scripts to system or system/bin should get them to use SDL properly.

Using Libraries

If your project needs a standard system library such as zlib or glib, and Emscripten has no built-in support for it, you need to build it and link it in manually. Built-in support exists for libc, libc++ and SDL. For those you do not even need to add -lSDL or similar, because they work out of the box.

  • To build them, use their normal build systems with emcc. Build them into bitcode, not JavaScript. This is the easier route: run make using emcc as described above, and do not add the extra step that generates JavaScript from the bitcode.
  • In your main project, as mentioned earlier in this document you need to add a command to go from bitcode to JavaScript. You should tell that command to also link in the library you built into bitcode. For example, if you built libstuff.bc, and your final build command was emcc project.bc -o final.html, then you should write emcc project.bc libstuff.bc -o final.html. (Alternatively, you could use llvm-link to link the library with your other bitcode, etc.)
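
The two steps above can be sketched as a single function. The directory names libstuff/ and project/ are hypothetical, as is the assumption that each build leaves libstuff.bc and project.bc in its own directory; substitute whatever your build systems actually emit.

```shell
# Hypothetical layout: ./libstuff and ./project each hold a configure-based tree.
build_with_library() {
    # Build the library to bitcode only; no JavaScript is generated here.
    (cd libstuff && emconfigure ./configure && emmake make) || return 1
    # Build the main project to bitcode in the same way.
    (cd project && emconfigure ./configure && emmake make) || return 1
    # Final step: convert to HTML and link the library bitcode in.
    emcc project/project.bc libstuff/libstuff.bc -o final.html
}
```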

Issues

Build System Self-Execution

Some large projects, as part of their build procedure, generate executables and run them in order to generate input for later parts of the build system (for example, a parser may be built and then run on a grammar, which generates C/C++ code that implements that grammar). This is a problem when cross-compiling, including with Emscripten, since you cannot directly run the code you are generating.

The simplest solution is usually to build the project twice: once natively, and once to JavaScript. When the JavaScript build fails because it cannot run a generated executable, copy that executable from the native build and continue building normally. This works for Python, for example (for more details, see tests/python/readme.txt).
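
As a rough sketch of that workflow, assume two copies of the same source tree in the hypothetical directories native-build/ and js-build/, and a helper executable whose path inside the tree is passed as an argument. None of these names come from Emscripten, and a real project may need the copy repeated for several helpers.

```shell
# Usage: build_twice path/to/helper   (path is relative to the tree root)
build_twice() {
    helper=$1

    # 1. Native build: produces a helper that can run on the build machine.
    (cd native-build && ./configure && make) || return 1

    # 2. Emscripten build: expected to stop when it tries to run the helper.
    (cd js-build && emconfigure ./configure) || return 1
    (cd js-build && emmake make) && return 0

    # 3. Copy the native helper over the one that could not run, then resume.
    cp "native-build/$helper" "js-build/$helper" || return 1
    (cd js-build && emmake make)
}
```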

Another possible solution that makes sense in some cases is to modify the build scripts so that they build the generated executable natively. For example, this can be done by specifying two compilers in the build scripts, emcc and gcc, and using gcc just for generated executables. However, this can be more complicated than the previous solution because you need to modify the project build scripts, and also you need to work around cases where code is compiled and used both for the final result and for a generated executable (so you need to make sure it is built both natively and for JS).

Dynamic Linking

Emscripten's goal is to generate the fastest and smallest possible code, and for that reason it focuses on generating a single JavaScript file for an entire project. It is possible to link files at runtime (see Linking), but it isn't recommended.

Linking in libraries

Emscripten does not have true dynamic linking (we won't link in code from some system location as we load an app), so we approximate it to the best of our abilities. When you specify a dynamic library in a call to emcc that builds the final "executable", that is, a JS or HTML file, the library is linked in as a static library. However, if you are linking to bitcode, dynamic libraries are ignored. The reason is that the same dynamic library could be linked into two different libraries, which are then linked together. This works natively because the actual linking occurs during startup, but we use static linking, so had we linked the library in both times, we would get an error on duplicate symbols.

The solution is to specify dynamic libraries once, in the command that builds to JS or HTML. It's ok if you specify them elsewhere as well, but they will be ignored; the important thing is to not forget them during the final build stage.
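
In command form, this might look like the following sketch. The file names are hypothetical, and libfoo.so stands for a shared library that was itself built with emcc.

```shell
build_final() {
    # Intermediate link to bitcode: a shared library named here would be
    # ignored, so it is simply left out.
    emcc a.o b.o -o project.bc || return 1
    # Final link to JS or HTML: name the shared library here, once.
    emcc project.bc libfoo.so -o project.html
}
```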

Configure

If your project uses configure, cmake or some other portable configuration method, it may do a lot of checks during the configure phase. emcc tries to get those to pass as much as it can, but in general it may not succeed. If you encounter such a case, you may need to disable checks in configure. Often the checks are just to verify that things will work, but things will actually work even though the checks fail.

If configure does checks that help determine important paths etc. for later in the build system, you may need to manually add those paths later and so forth.

Note that in general something like configure is not a good match for a cross-compiler like Emscripten. configure works very hard to get code to build natively for whatever local setup you have. With a cross-compiler, you are ignoring the native build system and the local system headers, and instead targeting a single standard target, so just writing out the values relevant for that target makes sense.
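
For autoconf-generated configure scripts, one way to write out a value instead of letting a check run is to preset the corresponding cache variable on the command line. This is a sketch under assumptions: ac_cv_func_somefunc is a placeholder rather than a real check, the right value depends on your project, and the actual variable names for a failing check are listed in config.log.

```shell
# Runs configure under emconfigure with one check answered in advance.
# ac_cv_func_somefunc is a placeholder name; extra arguments are passed through.
configure_with_known_answers() {
    emconfigure ./configure ac_cv_func_somefunc=no "$@"
}
```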

Alternatives to emcc

You can in theory call clang, llvm-ld, etc. yourself. However, not using emcc is dangerous. One reason is that emcc will use the Emscripten bundled headers, while using Clang by itself will not, by default. This can lead to various errors. Also, using things like llvm-ld will result in unsafe/unportable LLVM optimizations being done by default. When you use emcc, it automatically handles all of that for you so that things work properly.

Examples

You can see how the large tests in tests/runner.py are built - the C/C++ projects there are built using their normal build systems, using emcc as detailed on this page. Specifically, the large tests include: freetype, openjpeg, zlib, bullet and poppler.

The build scripts in the following projects are also worth a look, although several have not yet been updated to use the new emcc tool: