Linking
Emscripten supports statically linking object files, which contain LLVM bitcode. This lets most build systems work with Emscripten with few or no changes (see Building Projects).
In addition, as of version 1.32.2, Emscripten supports dynamic linking of JavaScript modules, described below.
Before we get to dynamic linking, let's talk about static linking. Emscripten's linking model is a little different from that of most native platforms. To understand why, consider that native linking models assume a setting where the following facts are true:
- The application runs directly on the local system and has access to local system libraries, such as the C and C++ standard libraries, among others.
- Code size is not a big concern. In part this is because the system libraries already exist on the system, so "hello world" in C++ can be small even if it uses a large amount of iostream code from the C++ standard library. Code size can also affect cold startup time, since more code takes longer to load from disk, but the cost is generally not significant, and modern OSes mitigate it in various ways, such as caching apps they expect to be loaded.
In Emscripten's case, code is typically going to run on the web. That means the following:
- The application is running in a sandbox. It has no local system libraries to dynamically link to; it must ship its own system library code.
- Code size is a major concern, because the application's code is downloaded over the internet, which is many orders of magnitude slower than loading an installed native app from the local machine.
For that reason, Emscripten's "object files" are simply LLVM bitcode. That bitcode has all the high-level information needed to perform efficient dead code elimination, especially for a standalone app, which is what we have. In other words, you statically link in the C standard library, and we strip out the parts (most of it!) that you don't actually use. Emscripten also handles system libraries for you automatically, to keep them as small as possible.
A downside of this approach is that we have focused less on dynamic linking. Emscripten has experimented with it in several ways, and it has been hard to find an approach that both links quickly and generates code that runs fast, given the web environment, how JavaScript works, and what makes asm.js optimizable (in particular, asm.js function pointers cannot be shared with the outside, which prevents dynamic linking).
Emscripten does support dynamic linking, but at a cost: code runs more slowly. It might be just a little slower, or, if you make a lot of cross-module calls, it might be 2x slower. As a result, dynamic linking is only recommended for:
- Fast iteration times during development. Build your app as several libraries, and rebuild only the one you just modified. This avoids recompiling the entire world each time.
- Applications where performance doesn't matter much (fast enough anyhow, or bound by something else, e.g. WebGL).
Emscripten's dynamic linking is fairly simple: you build several separate code "modules" containing JavaScript and link them at runtime. The linking basically connects the unresolved symbols in each module with the implemented symbols in the others, in the simplest of ways. It does not currently support nuances like weak symbols or corner cases of linkonce semantics. It should work fine on "simple" code, that is, code not using fancy features from C/C++ extensions that affect linking. In short, dynamic linking just hooks up an unresolved symbol to an implementation of it in another module, on a first-come, first-served basis.
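The resolution rule can be modeled in a few lines of JavaScript. This is a minimal sketch of the idea, not Emscripten's actual linker: the module objects and their `name`, `exports`, `imports` and `linked` fields are hypothetical, chosen only to show first-come, first-served resolution.

```javascript
// Minimal model of first-come, first-served symbol resolution.
// Each module is a hypothetical object:
//   { name: string, exports: { symbol: function }, imports: [symbol, ...] }
function linkModules(modules) {
  // Global symbol table: symbol name -> { impl, from }
  const table = new Map();
  for (const mod of modules) {
    for (const [symbol, impl] of Object.entries(mod.exports)) {
      // The first module that provides a symbol wins; later ones are ignored.
      // There is no notion of weak symbols or linkonce semantics here.
      if (!table.has(symbol)) {
        table.set(symbol, { impl, from: mod.name });
      }
    }
  }
  // Hook each unresolved import up to whatever implementation the table holds.
  // Imports that nobody provides are left undefined rather than reported.
  for (const mod of modules) {
    mod.linked = {};
    for (const symbol of mod.imports) {
      const entry = table.get(symbol);
      mod.linked[symbol] = entry ? entry.impl : undefined;
    }
  }
  return table;
}

// Usage: two side modules both define `helper`; the one loaded first wins.
const main = { name: 'main', exports: { malloc: () => 0 }, imports: ['compute', 'helper'] };
const sideA = { name: 'sideA', exports: { compute: (x) => x * 2, helper: () => 'A' }, imports: ['malloc'] };
const sideB = { name: 'sideB', exports: { helper: () => 'B' }, imports: [] };
const table = linkModules([main, sideA, sideB]);
console.log(table.get('helper').from); // "sideA"
console.log(main.linked.compute(21));  // 42
```

Note that load order alone decides which duplicate definition is used, which is why code relying on more subtle linking rules may misbehave.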
System libraries do utilize some more advanced linking features. For that reason, Emscripten tries to simplify the problem as follows: There are two types of shared modules:
- Main modules, which have system libraries linked in.
- Side modules, which do not have system libraries linked in.
A project should contain one main module, which can then be linked at runtime to multiple side modules. This model also makes other things simpler: only the singleton main module has the general JavaScript environment setup code to connect to the web page and so forth, while side modules contain just the pure compiled LLVM bitcode and nothing more.
The one tricky aspect of this design is that a side module might need a system library that the main module does not include. See the section on system libraries below for how to handle that.
Note that the "main module" doesn't need to contain the main() function; it could just as easily be in a side module. What makes the main module the "main" module is just that there is only one of it, and only it has system libraries linked in.
(Note that system libraries are linked into the main module statically. That still allows some optimizations, even if dead code elimination cannot be as effective as we'd like.)
If you want to jump straight to running code, look in the test suite: the test_dylink_* tests cover general dynamic linking, and the test_dlfcn_* tests cover dlopen() specifically. Otherwise, the procedure is described below.
- Build one part of your code as the main module, using -s MAIN_MODULE=1. (Hopefully you won't need to, but you may have to do something extra for system libraries here; see below.)
- Build the other parts of your code as side modules, using -s SIDE_MODULE=1.
Note that both should have the .js suffix, as they contain JavaScript (emcc uses the output suffix to decide what to emit). If you want, you can then rename the side modules to .so or similar, but that is just a superficial change.
You then need to tell the main module to load the side modules. You can do that using the Module object, with something like:
Module.dynamicLibraries = ['libsomething.js'];
At runtime, when you run the main module, if it sees dynamicLibraries on Module, it loads them one by one and links them. The running application can then access code from any of the linked modules.
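In a web page, the Module object is typically defined in a script that runs before the main module's script, so that the main module finds dynamicLibraries when it starts up. A minimal sketch follows; the library file names are hypothetical, and print is included only as an example of another Module option that can sit alongside dynamicLibraries.

```javascript
// Define the global Module object *before* the main module's script runs;
// the generated main module reads it during startup.
// The file names below are hypothetical.
var Module = {
  // Side modules to load and link, one by one, in this order.
  dynamicLibraries: ['libsomething.js', 'libphysics.js'],
  // Other Module options can live alongside it, e.g. where stdout goes.
  print: function (text) {
    console.log(text);
  },
};
```

In the HTML page, this script would be followed by the script tag that loads the main module.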
dlopen() is slightly simpler than general dynamic linking. The procedure begins in the same way, with the same flags used to build the main and side modules. The difference is that you do not use Module.dynamicLibraries; instead, you must load the side module into the filesystem, so that dlopen (or fopen, etc.) can access it. That's basically it: you can then use dlopen(), dlsym(), etc. normally.
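One way to get the side module into the filesystem is from JavaScript, before main() runs. The sketch below assumes the side module's contents are already available as a string or byte array, and that `fs` is Emscripten's FS object, whose FS.writeFile(path, data) writes a file into the virtual filesystem. The helper name `installSideModule`, the path, and the `sideModuleSource` variable are hypothetical. The C code would then pass the same path to dlopen() and look up functions with dlsym().

```javascript
// Hypothetical helper: place a side module into Emscripten's virtual
// filesystem so that dlopen() in C can find it by path.
// `fs` is assumed to be Emscripten's FS object (FS.writeFile(path, data)).
function installSideModule(fs, path, contents) {
  fs.writeFile(path, contents);
  return path; // the same string the C code should pass to dlopen()
}

// In a real page this could run from a Module.preRun callback, which runs
// before main() once the runtime environment is set up, e.g.:
//   var Module = { preRun: [function () {
//     installSideModule(FS, '/libside.js', sideModuleSource); // hypothetical source
//   }] };
```

Alternatively, emcc's --embed-file or --preload-file options can package the side module into the filesystem at build time.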
As mentioned earlier, system libraries are handled in a special way by the Emscripten linker, and in dynamic linking, only the main module is linked against system libraries. A problem arises if a side module needs a system library that the main module does not: you'll get a runtime error. This section explains how to fix that.
To get around this, you can build the main module with EMCC_FORCE_STDLIBS=1 in the environment, which forces inclusion of all standard libraries. A more refined approach is to build the side module with -v to see which system libraries are actually needed (look for including lib[...] messages), and then build the main module with something like EMCC_FORCE_STDLIBS=libcxx,libcxxabi (if you need those two libraries).
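Collecting the library names by hand works, but the step can also be scripted. The sketch below assumes, as described above, that each relevant line of the -v output contains the word `including` followed by a library name starting with `lib`; the exact log format may differ between Emscripten versions, and the function name is hypothetical.

```javascript
// Hypothetical helper: turn the verbose output of a side-module build
// (emcc ... -v) into a comma-separated value for EMCC_FORCE_STDLIBS.
// Assumes the output contains "including lib<name>" messages.
function forceStdlibsFromLog(logText) {
  const names = [];
  for (const match of logText.matchAll(/including (lib\w+)/g)) {
    const name = match[1];
    // Keep first-seen order and drop duplicates.
    if (!names.includes(name)) names.push(name);
  }
  return names.join(',');
}

// Usage: pass the captured -v output; the result is the value to put in
// EMCC_FORCE_STDLIBS when building the main module.
```

If no matching messages are found, the helper returns an empty string, meaning the side module reported no extra system libraries under this assumption.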
The dynamic linking implementation utilizes some dynamic aspects of JavaScript, to keep things simple.
The cornerstone of Emscripten's dynamic linking is the use of emulated function pointers. You don't need to care about this detail, but if you do, it means that function tables are implemented outside of asm.js code. That means that function tables are just normal JavaScript arrays, and can be adjusted at runtime - which is necessary for dynamic linking.
The downside is that asm.js is very good at optimizing its internal function tables, and we lose that benefit. We must also call out of asm.js for each indirect call (i.e., anything that goes through a function pointer, like a virtual method call), or any symbol provided by another module. This incurs runtime overhead.
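The idea behind emulated function pointers can be sketched in plain JavaScript: a function pointer is an index into an ordinary array, a side module loaded later can append its functions to that array, and every indirect call goes through a lookup. The helper names are hypothetical; this models the concept and is not the code Emscripten generates.

```javascript
// Function pointers modeled as indices into an ordinary, growable JS array.
// Index 0 is reserved so that 0 can serve as a null function pointer.
const functionTable = [null];

// Hypothetical helper: register a function and return its "pointer".
function registerFunction(fn) {
  functionTable.push(fn);
  return functionTable.length - 1;
}

// In this model every indirect call goes through this lookup, standing in
// for the call out of asm.js code that a real indirect call requires.
function callIndirect(ptr, ...args) {
  const fn = functionTable[ptr];
  if (typeof fn !== 'function') {
    throw new Error('invalid function pointer: ' + ptr);
  }
  return fn(...args);
}

// The main module registers its functions at startup...
const squarePtr = registerFunction((x) => x * x);
// ...and a side module linked later can still add entries to the same table.
const cubePtr = registerFunction((x) => x * x * x);
console.log(callIndirect(squarePtr, 3)); // 9
console.log(callIndirect(cubePtr, 2));   // 8
```

Because the table is a plain array, growing it at runtime is trivial; the price is that the engine can no longer optimize these calls the way it optimizes asm.js's internal tables.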
It might be interesting to experiment with compiling your code (both main and side modules) with -s ASM_JS=2. That prevents the "use asm" string from being emitted, which means that browsers that detect it (currently 3 of the 4 major browsers) will optimize differently. If you make a great many cross-module calls, it is possible that not declaring the code as asm.js will make it run faster. However, if you mostly do pure computation inside a module, this will probably make things slower. Experimenting and measuring on multiple browsers is a good idea.
Native linkers generally only run code when all symbols are resolved. Emscripten's dynamic linker instead connects unresolved references to their symbols dynamically, at runtime. As a result, we don't check whether any symbols remain unresolved, and code can start to run even if some are. It will run successfully as long as those symbols are never actually called; if they are, you will get a runtime error. What went wrong should be clear from the stack trace (in an unminified build); building with -s ASSERTIONS=1 can help further.
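A simple way to picture this behavior: each unresolved reference is bound to a small stub that looks the symbol up only when it is called. The sketch below is a hypothetical model (the symbol table and `bindImport` are invented for illustration), and its error text is not Emscripten's actual message.

```javascript
// Global symbol table shared by all linked modules (hypothetical model).
const symbols = Object.create(null);

// Bind a reference to `name` without checking that anything provides it.
// Resolution is deferred until the reference is actually called.
function bindImport(name) {
  return function (...args) {
    const impl = symbols[name];
    if (typeof impl !== 'function') {
      // Only reached if the missing symbol is really called.
      throw new Error('unresolved symbol called: ' + name);
    }
    return impl(...args);
  };
}

symbols.present = () => 'ok';
const present = bindImport('present');
const missing = bindImport('missing'); // no error here: linking never checks

console.log(present()); // "ok"
// Calling missing() now would throw an error naming the symbol.
```

In this model, a symbol that is provided later (for example by a library loaded at runtime) is picked up by the same stub on its next call.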
As a consequence of how it is implemented, Emscripten's dynamic linker can perform general dynamic linking, not just dlopen, at runtime! For example, you can write this in your C code:
EM_ASM({
Runtime.loadDynamicLibrary('sideModule.js');
});
That will load and link a side module, entirely at runtime. If your module uses symbols that are resolved in that side module, they will be accessible. Note that you probably shouldn't depend on this feature, but it might be useful.
- A known limitation is that while functions work fine, linked global variables might not. Globals are linked through function calls, which we try to make rarely: once per basic block. If you link and access the symbol within the same basic block, it may not work correctly.
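The hazard can be illustrated with a small model, assuming that a global from another module is reached through an accessor function whose result is fetched once at the start of a basic block and then reused for the rest of it. All names and the address value are hypothetical, and real generated code looks different.

```javascript
// Hypothetical model: a global defined in a side module is reached through
// an accessor function. Until the side module is linked, it has no address.
let linkedAddress = 0; // 0 = not linked yet

function getGlobalAddress() {
  return linkedAddress;
}

function linkSideModule() {
  linkedAddress = 1024; // hypothetical address assigned at link time
}

// Models one basic block: the accessor is called once, at the top,
// and the cached result is used for the rest of the block.
function basicBlockThatLinks() {
  const cached = getGlobalAddress(); // looked up once per block
  linkSideModule();                  // link happens inside the same block
  return cached;                     // still the stale pre-link value
}

console.log(basicBlockThatLinks()); // 0, not 1024
console.log(getGlobalAddress());    // 1024: a later block sees the right value
```

Under this assumption, simply making sure the link happens in an earlier basic block than the access avoids the stale value.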