
Performance studies #8

Open · staltz opened this issue Nov 26, 2021 · 11 comments

@staltz (Owner) commented Nov 26, 2021

As of commit bb5b98f built with ReleaseSmall:

| operation | ops/ms |
| --- | ---: |
| bipf.encode | 1.851 |
| bipf-native.encode | 0.133 |
| bipf.decode | 7.326 |
| bipf-native.decode | 2.098 |
| JSON.stringify | 3.672 |
| JSON.parse | 5.189 |
| JSON.parse(buffer) | 5.574 |
| JSON.stringify(JSON.parse()) | 3.135 |
| bipf.seek(string) | 1250 |
| bipf-native.seek(string) | 98.04 |

cc @jerive
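The ops/ms figures above presumably come from a timing loop like the one in perf.js. As a minimal, hypothetical sketch (the `opsPerMs` helper below is my own, not from the repo), such a number can be computed like this:

```javascript
// Hypothetical micro-benchmark helper: run `fn` `iterations` times
// and report throughput in operations per millisecond.
function opsPerMs(fn, iterations) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return iterations / elapsedMs;
}

// Example: measure JSON.stringify on a small object.
const obj = { name: 'test', value: 42, nested: { list: [1, 2, 3] } };
console.log('JSON.stringify:', opsPerMs(() => JSON.stringify(obj), 10000).toFixed(2), 'ops/ms');
```

Note that single-shot loops like this are noisy; the absolute numbers matter less than the relative ordering between implementations.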

@staltz (Owner, Author) commented Nov 26, 2021

I ran Valgrind on bipf-native running perf.js (the encode part only), and then I ran Valgrind on bipf running the same perf.js. Here's what I found (screenshots from KCachegrind, sorted by Called):

bipf-native

Screenshot from 2021-11-26 18-14-57@2x

bipf

Screenshot from 2021-11-26 18-16-05@2x

Notes

What I think is going on is that N-API probably wraps each call in a V8 TryCatch, and try/catch is known to be bad for performance if you write it by hand in JS. The other alternative is implementing bipf-native against the V8 APIs directly; maybe that way we can take shortcuts.
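As a rough illustration of the try/catch concern in plain JS, here is a hypothetical microbenchmark (not from the repo; note that modern V8 optimizes try/catch fairly well, so the gap it shows may be small):

```javascript
// Compare a hot function with and without a try/catch wrapper,
// analogous to N-API adding a TryCatch around every native call.
function plain(x) { return x * 2; }
function wrapped(x) {
  try { return x * 2; } catch (e) { return 0; }
}

// Time one million calls and also accumulate results so the JIT
// cannot dead-code-eliminate the loop body.
function time(fn) {
  const start = process.hrtime.bigint();
  let acc = 0;
  for (let i = 0; i < 1e6; i++) acc += fn(i);
  return { ms: Number(process.hrtime.bigint() - start) / 1e6, acc };
}

console.log('plain:  ', time(plain).ms.toFixed(2), 'ms');
console.log('wrapped:', time(wrapped).ms.toFixed(2), 'ms');
```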

@staltz (Owner, Author) commented Nov 26, 2021

@jerive I think we should try implementing encodingLength and encode using the V8 APIs directly and see what the performance looks like. We have no use for bipf-native if it's slow.

@jerive (Contributor) commented Nov 26, 2021

Sure, it's what I was afraid of.
I'm currently trying to include the V8 headers...

@jerive (Contributor) commented Nov 26, 2021

I will certainly need support here: Zig does not find some headers that are part of the C++ stdlib.

```cpp
// v8.h
#include <atomic>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>
```

It should be supported (ziglang/zig#4786), but I have no clue how.
I will leave it here for tonight.

@staltz (Owner, Author) commented Nov 26, 2021

I will give it a try too, tomorrow

@jerive (Contributor) commented Nov 27, 2021

On Ubuntu, the header files are in the libc++-11-dev package. Now, how to link them?

@staltz (Owner, Author) commented Nov 27, 2021

Can you try `-lc++`?

@staltz (Owner, Author) commented Nov 27, 2021

See this issue in Node.js nodejs/node#14379 (comment)

It seems the performance drawbacks of N-API were acknowledged, but "compared the results to the original module that used V8 APIs ... overhead of N-API is fairly minimal already". I'm concerned about what that "minimal overhead" means. If converting bipf-native to the V8 APIs only sped it up 2x compared to the N-API bipf-native, that wouldn't be good enough. We would need roughly 20x to make a V8-based bipf-native at least 2x faster than bipf.
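For concreteness, here is the arithmetic behind that estimate, using the encode numbers from the table in the first comment (by these numbers the required speedup works out closer to ~28x, in the same ballpark as the 20x mentioned):

```javascript
// Encode throughput from the benchmark table (ops/ms):
const bipfEncode = 1.851;     // pure JS bipf
const nativeEncode = 0.133;   // N-API bipf-native

// How much slower the native encode currently is than pure JS:
const slowdown = bipfEncode / nativeEncode;            // ≈ 13.9x

// Speedup over the current native version needed for a V8-based
// bipf-native to be at least 2x faster than pure-JS bipf:
const neededSpeedup = (2 * bipfEncode) / nativeEncode; // ≈ 27.8x

console.log(slowdown.toFixed(1) + 'x slower,', 'need ' + neededSpeedup.toFixed(1) + 'x');
```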

@dominictarr commented
I'll bet you it's the overhead of the JS-native API, especially since you are doing many tiny calls.
The thing that originally led me to creating bipf was realizing that grep could process the SSB log in less than a second, while JSON parsing took several minutes.

I would be interested to see how fast it went if you ran the benchmark inside Zig; you wouldn't be able to do encode, but it would do seek, etc. I'd also be curious how it went with wasm. However, you'd have to copy each record into wasm memory to be able to run bipf on it. What would be better is to copy a whole block into wasm memory and then scan many bipf records without leaving wasm, but in practical terms that means reimplementing flume/jitdb in wasm.

@staltz (Owner, Author) commented Dec 3, 2021

Yes, @dominictarr, it smells like a bridging overhead indeed. What I'm wondering is why there is a copy in the first place. There's no FFI. How can JSON.stringify/parse operate without copying, and, if we have access to the V8 APIs, couldn't we get as close as JSON does?

@staltz (Owner, Author) commented Dec 3, 2021

Yeah, so we tried to implement BIPF in C++ first using N-API and then V8 APIs, and it's a bit hopeless: jerive/bipf-napi#1

TL;DR: the problem isn't that the bridge between C++ and JS is slow; the problem is that pure JS code is usually JIT-compiled to machine code, and that's highly efficient. In the benchmarks in the linked issue, the bottleneck was literally just v8::Number::New(). Bottom line: JS is fast, and the way to be competitive is to replace large parts of the program, or all of it, with C++ or another low-level language. Replacing small parts doesn't seem to work.
