This simple benchmark tests whether a point is contained by the small scale Natural Earth 1:110m land dataset for a set of 1000 random points (point-in-polygon test). The test is implemented in various languages (JS, Python, C, Rust) and libraries (GEOS, Turf.js, geo).
After discussing the results with the GEOS developers (libgeos/geos#1024), the performance issue got included in a list of issues regarding the GEOS Relate
algorithm (libgeos/geos#1060).
The GEOS devs reworked the underlying algorithm (see locationtech/jts#1052), which is now substantially faster than before! Here are the updated results:
Implementation of point-in-polygon test (1000 points x 127 polys) |
Time (ms, lower is better) |
Time last benchmark (ms, lower is better) |
---|---|---|
GEOS-WASM (JS) | 6 | 2074 |
Shapely (Python) | 1692* | 1814 |
GEOS (C-API) | 1 | 1784 |
Turf.js (JS)] | 21 | 29 |
geo (Rust) | 3 | 3 |
* Latest available Shapely build 2.1.0.dev0+140.gac708af includes GEOS 3.12.2 and therefore doesn't show the performance gains from locationtech/jts#1052 yet.
Benchmark was performed using GEOS 3.13.0 (C-API and JS-WASM), Turf.js v7.1.0 and geo v0.29.2 on a 14" MacBook Pro M1. Previous benchmark used GEOS 3.12.1, Turf.js v6.5.0 and geo 0.27.0 on the same machine.
Implementation of point-in-polygon test |
Time (ms, lower is better) |
---|---|
GEOS-WASM (JS) | 2074 |
Shapely (Python) | 1814 |
GEOS (C-API) | 1784 |
Turf.js (JS) | 29 |
geo (Rust) | 3 |
(Tested on a 2021 14" MacBook Pro M1)
While working on the GEOS-WASM package, I was wondering how the performance of the WebAssembly build compares to Shapely (Python).
Shapely links to the native GEOS library using the C-API and allows to use a vectorized ufunc interface to improve performance. The GEOS-WASM package on the other hand is a pure WebAssembly build of GEOS using Emscripten, which is called from JS without any additional optimisations.
As a (first) simple benchmark, I chose a point-in-polygon test that I found in this blog post by Guido Rice. The test is simple: a set of 1000 random points is tested against the small scale Natural Earth 1:110m land dataset. Note that the test only measures the time to perform the point-in-polygon test, not the time to load the dataset and create the geometries.
Running this test, I find GEOS-WASM to be taking around ~2074 ms to complete on my machine, which is about 17% slower than native GEOS (1784 ms), and about 15% slower than the vectorised Shapely implementation (1814 ms). That's quite good, considering that other benchmarks found WebAssembly to be 50%-250% slower than native code (see here and here)!
Apart from optimizing the Emscripten build, I don't see any obvious performance improvements that could be applied to the JS side of things though. One could argue that vectorizing the code, aka moving the loops from JS to WASM, would improve performance. However, since most JS engines JIT compile and optimize hot loops, the performance gain is likely to be marginal compared to doing the same thing in (standard) interpreted Python (see here for more info on recent JIT developments in V8).
Since GEOS-WASM is targeted for use in web browsers, I was also wondering how its performance compares to the de-facto standard web geospatial library Turf.js. While at it, I also gave GeoRust a shot. GeoRust is an up and coming Rust library which can be compiled to WebAssembly as well (see geoarrow-rs by Kyle Barron), and using it has been on my todo list for a while now!
As you can see, Rusts geo is super fast performing this task (not that I hadn't been told so before 🙂)! What suprises me most about the results is that Turfs booleanContains
seems to complete the task almost 62x faster than GEOSPreparedContains from the C-API, though.
I tried to look through the GEOS source code, but so far can't really tell which algorithm is used for the point-in-polygon test or what might be a bottleneck for this particular test case. From a first glance, it seems that in the contains
method GEOS internally creates an IntersectionMatrix, which is then used to determine if it fulfills the "contains" criteria (see isContains()
). There's an issue in the PDAL library that discusses the performance of the GEOS contains
method, however the suggested solution (using GEOSPreparedContainsXY
instead of GEOSPreparedContains
) didn't seem to improve the performance in my case. Also, it feels kind of strange to hit such a bottleneck with this relatively small test (127 polygons against 1000 points)...
Turf.js uses the point-in-polygon-hao package for the actual point-in-polygon test, which is based on this paper. It's important to note that there are some open issues with Turfs booleanContains
implementation though, e.g. returning wrong results in some edge cases.
GeoRust seems to be using the Winding number algorithm as noted in the source code. It seems there are no open issues with regards to the contains
method, yet GeoRust is still a relatively young project and not as battle tested as GEOS with its widespread use in PostGIS and Shapely.
Closing this, I'd like to note that this benchmark is by no means a comprehensive comparison of the various libraries. It's just a simple test that I ran out of curiosity. I'm certainly happy that the WebAssembly build of GEOS performs better than expected, yet I'll probably open an issue in the GEOS repo to see if there's anything that can be done to improve the performance of the contains
method for this kind of test!
If you have any comments or suggestions, please feel free to open an issue or PR, or send me a mail!
- Python > 3.7
- Bun >= 1.0.0
- Rust >= 1.31.0
- CLANG >= 7.0.0
- Install dependencies (only needed once)
make install
- Run the benchmark
make run
If you want to run the benchmark with different datasets, you can adapt the relevant path strings in the benchmark.mjs
or run the single implementations directly. The datasets are passed as arguments as shown in the following examples:
# JS -> GEOS-WASM
bun ./out/contains.mjs ./data/1000_random_points.geojson ./data/ne_110m_land.geojson
# C -> GEOS
./out/contains.outc ./data/10000_random_points.geojson ./data/ne_10m_land.geojson
# Python -> Shapely
python ./out/contains.py ./data/10000_random_points.geojson ./data/ne_10m_land.geojson
# Rust -> geo
./out/contains.outrs ./data/10000_random_points.geojson ./data/ne_10m_land.geojson
# JS -> Turf.js
bun ./out/contains.turf.mjs ./data/10000_random_points.geojson ./data/ne_10m_land.geojson