
lib: add experimental benchmark module #50768

Closed
wants to merge 3 commits

Conversation

RafaelGSS
Member

@RafaelGSS RafaelGSS commented Nov 17, 2023

This PR introduces a new built-in module to Node.js: node:benchmark. Following the reasoning of node:test, this module should only be available with the node: prefix. I know we've discussed adding a built-in benchmark module to Node.js a couple of times, so I read all the issues I could find and will try to summarise why we believe we should have it in Node.js.

For context, there was an attempt to include a benchmark module in Node.js core in #44974; however, that pull request basically exposed our internal benchmark tool to userland code. While I agree with all the objections made in that PR - mainly the ones saying our internal benchmark tool wasn't meant to compare different functions, but different Node.js binaries - I believe Node.js must have a built-in module whose results people can rely on to make real assumptions in their software.

Problem

The fundamental problem is that there currently isn't a reliable benchmark tool for Node.js users. Before creating this PR, I conducted research with Vinicius, comparing different benchmark packages while running the same experiment. The results showed some inconsistencies, and because the packages use different approaches to measure the benchmark, their results are not comparable to each other. This PR is still in draft while Vinicius and I finish this little research.

Node.js and its dependencies update quickly, and it can be challenging to keep up with recent and sometimes undocumented changes - mainly at the package level. While performance has always been important to the project, it has recently become a hot topic. Having a tool or module that users can use to monitor regressions in their applications would fill a gap that I believe Node.js currently has.

The benchmark module

I developed the benchmark module along with @H4ad, and our objective is to keep it quite simple and extensible for this experimental stage - you will find the roadmap we've planned for it at the end of this section.

As you can see in the documentation, the usage is pretty simple and easily portable from benchmark.js:

const { Suite } = require('node:benchmark');

const suite = new Suite('Deleting vs Assigning undefined');

suite.add('Using delete to remove property from object', () => {
  const data = { x: 1, y: 2, z: 3 }
  delete data.y

  data.x
  data.y
  data.z
});

suite.add('Using undefined to "remove" property from object', () => {
  const data = { x: 1, y: 2, z: 3 }
  data.y = undefined

  data.x
  data.y
  data.z
});

suite.run()

This PR is still a draft since we’re working on the following TODO and checking different approaches to guarantee the reliability of our results. We’ve decided to open it earlier to discuss strategies and see if someone is strongly against adding this built-in module.

TODO:

  • Use worker_threads to guarantee a clean environment on each benchmark (see the sketch after this list)
  • Finish the research against other benchmark packages
  • Add more tests
  • Finish documentation
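
As a rough illustration of the worker_threads item above (not the PR's implementation - just a sketch of the isolation idea using the existing worker_threads API):

const { Worker } = require('node:worker_threads');

// Run a snippet of benchmark code in a fresh worker so each measurement
// starts from a clean V8 heap with empty inline caches.
function runInWorker(benchmarkSource, iterations = 1e5) {
  const script = `
    const { parentPort } = require('node:worker_threads');
    const fn = ${benchmarkSource};
    const start = process.hrtime.bigint();
    for (let i = 0; i < ${iterations}; i++) fn();
    const end = process.hrtime.bigint();
    // Report the average time per iteration (in nanoseconds) to the parent.
    parentPort.postMessage(Number(end - start) / ${iterations});
  `;
  return new Promise((resolve, reject) => {
    const worker = new Worker(script, { eval: true });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

runInWorker('() => { const data = { x: 1, y: 2, z: 3 }; delete data.y; }')
  .then((ns) => console.log(`${ns.toFixed(1)} ns/iter`));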

ROADMAP:

These are the features we believe would make sense to include in a second or third iteration of this PR:

  • Support for AbortController
  • Group
  • Accept and monitor different optimization strategies using available flags
    • --trace-opt
    • --trace-deopt
    • --expose-internals + %OptimizeOnNextCall()

FAQ

I’ve presented an initial draft to some Node.js collaborators at NodeConfEU and some of them raised a few concerns, including:

  • Why it should be on Node.js core instead of a node.js official package, such as undici?

Nothing prevents this code from living in a package instead of being a Node.js built-in. However, that goes against the main reason we've created this PR: we'd like to standardise the way people benchmark Node.js functions, mainly to prevent misinformation or cherry-picked data. I really believe that if we make it a module, we won't achieve that; people won't use it.

  • Why not node --bench my-benchmark.js instead of a built-in module (require('node:benchmark'))?

One of the objectives of this module is to be easily integrated into any custom regression checker tool. By making it a module, users can simply require it to quickly set up a benchmark whenever needed.

  • Would that replace our usage of common.createBenchmark()?

That’s not the main objective of this PR. As previously said, our internal benchmark tool works pretty well (in my opinion) in our environment. However, it doesn’t prevent us from extending the benchmark module in a feasible way so we can internally use it as well.

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/loaders
  • @nodejs/startup
  • @nodejs/web-infra

@nodejs-github-bot nodejs-github-bot added the lib / src and needs-ci labels on Nov 17, 2023
@joyeecheung
Member

joyeecheung commented Nov 17, 2023

The fundamental problem is that there currently isn't a reliable benchmark tool for Node.js users.

How do we know that this tool is better than the ones already out there in the wild?

From a glance at the code, this seems perfectly fine to release first as an npm package. We can at least do that first and experiment in the wild for a while, to make sure it's mature enough before bringing it into core. This also allows independent versioning and makes it available on older Node.js release lines without too many complications. We have done this in the past with node-report and maybe a few other modules that I am forgetting.

I really believe that if we make it a module, we won't achieve that; people won't use it.

I don't think this is a strong enough argument against releasing this as an npm package first. Or at least it feels weird to assert this without even trying to release it as an npm package first. Developing a module from scratch in core to replace alternatives already popular in the wild because we claim to do it better (without even verifying with user feedback) seems to be a dangerous path to go down, unless we have very good arguments, e.g. it's already a standard and has compliance test suites, or it has to rely on certain things in core to avoid monkey-patching. Without such arguments, I think it sends the wrong message to the ecosystem - it means the work you put into an npm package can become obsolete just because Node.js releases an official alternative and claims to do it better without user verification first, even though the alternative does not need access to any Node.js internals, does not implement any widely accepted standard, and could perfectly well be another npm package playing a fair game.

Co-authored-by: Vinícius Lourenço <[email protected]>
@RafaelGSS RafaelGSS force-pushed the feat/add-node-benchmark branch from 9356d0d to 523ff0e on November 17, 2023 15:45
@RafaelGSS
Member Author

@joyeecheung I will elaborate on that as soon as I get some time!

@ljharb
Member

ljharb commented Nov 17, 2023

Could we not pick another name rather than restricting it to the node: protocol?

validateNumber(timeInNs, 'timeInNs', 0);

if (timeInNs > 1e9) {
return `${NumberPrototypeToFixed(timeInNs / 1e9, 2)}s`;

Member

Do we need primordials in benchmark runner?

Member

I think the use of primordials hasn't reached consensus yet; until then, we preferred to use them simply because the rest of the code in Node uses them.

And the usage of primordials in these cases didn't impact the benchmark performance, so it is safe to use them.

lib/internal/benchmark/histogram.js (outdated review thread, resolved)
@lemire
Member

lemire commented Nov 17, 2023

@RafaelGSS It might be possible to add CPU performance counters (when they are available). E.g., when running in privileged mode. That would require some C++, evidently.

@RafaelGSS
Member Author

I've just updated the PR description to make it clearer.


How do we know that this tool is better than the ones already out there in the wild?

So, that's why this PR is still a draft. We're evaluating the accuracy of this new module. However, I wouldn't say it's "better" than the ones out there. Some might be wrong, but some just use a different approach to calculate the metrics. The main objective of this module is to standardise the Node.js benchmark results out there.

From a glance at the code, this seems perfectly fine to release first as an npm package. We can at least do that first and experiment in the wild for a while, to make sure it's mature enough before bringing it into core.
I don't think this is a strong enough argument against releasing this as an npm package first. Or at least it feels weird to assert this without even trying to release it as an npm package first. Developing a module from scratch in core to replace alternatives already popular in the wild because we claim to do it better (without even verifying with user feedback) seems to be a dangerous path to go down, unless we have very good arguments

Right, so I'll try to explain that. I genuinely don't understand how a new package would solve the issue described in the PR description. How is it different from adding node:benchmark as an experimental module? I'm almost certain this would fall into the same situation as node:test.

Take benchmark.js as an example, which is widely used in the Node.js ecosystem (even though it was meant for jsPerf). It doesn't appear to be maintained anymore, its test suite includes only EOL Node.js versions, and it doesn't provide any guarantee that the benchmark results are still reliable.

I strongly believe the community wants this module, and I definitely don't want to sound like we're claiming to do things better, but I believe this isn't a good point of objection here. The goal is always to improve the ecosystem, and having a built-in sounds like a good path to take IMO.

Tagging @nodejs/tsc just for visibility (no need to add to the agenda).

@ronag
Member

ronag commented Nov 18, 2023

If we added node:test I don't understand why we wouldn't add this? How is this different?

@ljharb
Member

ljharb commented Nov 18, 2023

@ronag I don't think it's universally agreed that adding node:test was a good idea either. For testing, there simply is no majority-agreed-upon way to author or run tests, so there was/is no clear pattern to follow when building Node's module, and node:test can feel like a mismatch to many.

For benchmarking, is there a majority-agreed-upon way to do it? In terms of output format, JS API, techniques, etc.? If there is, then I'd expect a userland module to follow it, and a core module would make sense when it can do something better than the userland module (which includes being built in). If there isn't, then it seems premature for Node to try to figure out what the best patterns are.

@ronag
Member

ronag commented Nov 18, 2023

I don't think it was a good idea to add node:test. But I also think it's not a good idea to be inconsistent.

@ljharb
Member

ljharb commented Nov 18, 2023

Repeating one bad decision isn't the same thing as consistency :-)

@joyeecheung
Member

joyeecheung commented Nov 18, 2023

The main objective of this module is to standardise the Node.js benchmark results out there.

I would say a built-in benchmark module for utility is at least worth considering, but a built-in benchmark module for standardizing anything for the Node.js ecosystem doesn't seem right. Node.js isn't a standardization venue. If we want to standardize anything, the design should be done elsewhere and stabilized elsewhere before Node.js starts incorporating it into releases.

I genuinely don't understand how a new package would solve the issue described in the PR description. How is it different from adding node:benchmark as an experimental module?

I think we should rather ask which of these issues cannot be solved if it's an npm package. As far as I can tell, the only difference is that being in core gives it a bit of free marketing, but even then I doubt how much difference that makes - we have many later-added built-in features that users still aren't aware of. An npm package can still get far better marketing than a Node.js built-in when enough effort is put into it.

On the other hand, developing a totally userland-capable package in core from scratch has many downsides: more burden on the CI, more noise in the issue tracker, coupled versioning and backports needed for LTS, etc. I would say for any new built-in that can be implemented entirely in userland, we should just do it as an npm package first and only bring it in when it stabilizes, to avoid all of that overhead.

I'm almost certain this would fall in the same situation as node:test.

And we had a lot of reliability flakes, CI burden, and issue tracker noise from node:test. Historically we've followed the "npm package first, bring it in when it stabilizes" model for many features (e.g. node-report, node-inspect, fetch). The test runner was an outlier, and I think the way it was developed was a mistake that we should not repeat. There are some bits of the test runner modules that should be done in core, yes, like the mocking parts. But the runner and reporter really don't need to be in core. Even if we want to bring them into core for utility reasons, we should not have developed them from scratch in core.

I strongly believe the community wants this module

The community wants good benchmarking tools, I am sure, and I am not against adding built-in benchmarking tools. I am against developing a benchmarking tool from scratch in core instead of experimenting with it as an independently versioned npm package with its own issue tracker & CI first. Good benchmarking tools need proper design, and landing a premature design can actually be harmful - for example, the current API of this module already encourages microbenchmarking and gives off the smell of the SunSpider problem (one of the doc examples even looks like something straight out of SunSpider). It took browsers years to realize that SunSpider was a mistake and misled JavaScript performance work. We should not repeat the same mistake on the server side. The API also rings a bell for the CrankshaftScript we had that regressed in the TurboFan migration, and I think in 2023 we should've learned something from these mistakes and work on something that prevents microbenchmarking instead. Adding a premature benchmarking tool that encourages microbenchmarking can actually do more harm than good; we need to keep that in mind.

The current API isn't really useful for checking different implementations of your own software either (unless it's a very simple library that can be copy-pasted and you are only writing throw-away benchmarks, so you don't care much about copy-pasting); rather, it's mostly useful for checking different ways of using your dependency, which I think is actually less common/helpful than the former (i.e. what our common.createBenchmark serves). Normally you don't keep two different implementations of the same API around in the same checkout that can be loaded into the same process. Rather, you usually have two different checkouts that you want to check using the same workload, and they can usually be run in different processes (which is actually better, because benchmarking two different things in the same JavaScript context is already subject to some of the JS engine's quirks, e.g. GC schedule, inline cache states, existing type feedback, etc., and therefore unreliable). Investigating which way of using your dependency is faster may be a fun research project, but most of the time it's the real-world use case that matters, and for that the first thing you typically care about is benchmarking how your own software performs under some real-world workload, not how to use your dependency faster before you even confirm that your dependency is your bottleneck. I am sure this is not the only problem with the current API, so there are even more reasons to stabilize it elsewhere (and ideally we need verification from V8 to avoid running into the CrankshaftScript problem again).

@joyeecheung
Member

I don't think it was a good idea to add node:test. But I also think it's not a good idea to be inconsistent.

We've been consistently doing the "npm package first, bring them in when they are stable" practice for at least fetch, node-inspect, and node-report. node:test was the one being inconsistent.

@isaacs
Contributor

isaacs commented Nov 19, 2023

I really believe that if we make it a module, we won't achieve that; people won't use it.

If you are wrong, then it should be published as an npm module first, and pulled into core once validated and standardized.

If you are right, then it shouldn't exist at all.

If it's not better (according to actual user choice) than the examples on npm, and there's no consistency among the examples on npm, then pulling this into core will hinder exploration of the solution space and prevent an optimal solution (or multiple optimal solutions for different situations) from being discovered. Or it will go unused and become a wart on the Node API.

This does not belong in core in my opinion.

@ovflowd
Member

ovflowd commented Nov 19, 2023

I share @ljharb and @isaacs opinions here. We just do not have data to make such claims.

When thinking about introducing new APIs, I always think that following TC39's route is a very interesting path for testing adoption of new experimental modules.

  • We start drafting a spec
  • We debate about the spec until we get somewhere
  • We share and promote the spec to userland
  • We start making polyfills (in our case, an npm package)
  • Observe adoption, feedback, and likelihood of being used
  • Take notes

This is the sort of API change that needs time to mature. If adoption doesn't happen over the course of 6 months to 1 year, even after shared efforts to promote it, then it indeed has no place in core.

The rushed attitude we have at times of adding things to core without properly analysing, testing, and doing research over long periods of time is extremely dangerous. On an unrelated note, it is what currently worries me about the navigator.* API changes that we're introducing in Node... But that is a different topic.

Some of the node:test changes got released so quickly that it feels like we didn't give enough time for more teams and the community to properly provide feedback (this is the sort of thing the Next 10 team exists for)... I love the "native test runner", but I wonder if the way it was released was appropriate.

This is the same reasoning as to why this shouldn't land, at least not now. Let's take a deep breath, take some steps back, and course-correct here. We can definitely benefit from a benchmarking API, but we really need to design it well, validate it, and ensure that this is something users would really benefit from having in core versus in a downstream library.

Because nothing stops us from just making an officially supported library, like undici. Which sadly still struggles with adoption...

@ovflowd
Member

ovflowd commented Nov 19, 2023

BTW, also wanted to make it clear that I appreciate the work you've put together here, @RafaelGSS :)

@RafaelGSS
Member Author

Honestly, I really don't agree with the points mentioned, but at this point I don't have energy and willingness to jump in a debate that could take forever. So I'll just close it 😄

@RafaelGSS RafaelGSS closed this Nov 19, 2023
@ovflowd
Member

ovflowd commented Nov 19, 2023

Honestly, I really don't agree with the points mentioned, but at this point I don't have energy and willingness to jump in a debate that could take forever

I'm sorry this is the reason why you're closing this. I doubt anyone here is trying to criticize your work... At least, I believe we're trying to redirect the energy and focus on something that might make more sense for the time being. After all, I believe we can all reach an agreement if we each compromise a bit and find the right spot where something nice can come out of this 👀

But these are just my takes on it. Take it with a grain or two of salt :)

Please keep working on this initiative, I see the value of a benchmarking module!

@jasnell
Member

jasnell commented Nov 19, 2023

@RafaelGSS ... I'd prefer not to close this so quickly. While the opinions of a couple of contributors are important to consider, they do not represent the opinions of all contributors, and a few of us simply have not yet had the time to review and consider this adequately. I wouldn't give up so quickly.

I, for one, am completely on board with adding a benchmarking module to core. For me the issue is just in the technical details of the implementation not in debating the exact process in which something should be brought in or not.

@ovflowd
Member

ovflowd commented Nov 19, 2023

As @jasnell said himself 💖

@joyeecheung
Member

Sorry that it ended this way. I think there are two different aspects to a proposal like this:

  1. Should Node.js provide a built-in benchmark tool that helps users write reliable and useful benchmarks?
  2. Should this built-in benchmark tool be built from scratch in Node.js core instead of being released and stabilized as an npm package first?

So far I don't think anyone in this thread is against 1. The objections are mostly about 2. So this at least provides some value for posterity - 1 is likely what people want and is less controversial, but 2 is controversial and would meet pushback in core. If anyone wants to pick this initiative up, which can still be very valuable to the community, an npm-package-first path is more likely to lead to success.

@ovflowd
Member

ovflowd commented Nov 19, 2023

Right, precisely as @joyeecheung said. I think we're all (or at least that's my interpretation) on board with having a benchmarking module, but we might first need to assess the best shape for it and do some more research and testing, maybe even through something such as an npm package.

* @param {number} percentile
* @returns {number}
*/
percentile(percentile) {

Member

We already have the built-in histogram API... why create another vs. modifying / building off the existing?

Member

I tried using it initially, but the current Histogram API doesn't support floating-point numbers, so we lost some precision in the results.

Also, we want to remove outliers, so we thought it was better to create a new Histogram class.

Member

I'd prefer modifying the current in a way that adds the needed functionality.

Member

@H4ad H4ad Nov 20, 2023

The current one is implemented in C++, and we don't have access to the underlying samples/values (as a public method), ref: https://nodejs.org/api/perf_hooks.html#class-histogram

Maybe we can do a follow-up exposing more methods on the Histogram class, but for now I don't think it's worth it. If we choose not to have removeOutliers and cv, then using the Histogram class makes total sense.
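
For context, a minimal sketch of the existing node:perf_hooks histogram discussed here - it only records positive integers, which is the precision limitation mentioned above:

const { createHistogram } = require('node:perf_hooks');

// The built-in RecordableHistogram only accepts positive integer values.
const h = createHistogram();
h.record(1200); // e.g. a duration in nanoseconds
h.record(1350);
console.log(h.mean, h.stddev, h.percentile(95));
// h.record(1200.5); // throws - fractional values are not supported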

process.stdout.write(bench.name);
process.stdout.write(' x ');
process.stdout.write(`${formatter.format(opsSecReported)} ops/sec +/- `);
process.stdout.write(formatter.format(

Member

Multiple writes like this are going to be fairly expensive on their own. Can this be coalesced? Or perhaps can this use cork()/uncork() to improve buffering and reduce the number of individual flushing writes?

Member

We can build the whole string and then do a single .write; the calls were split this way just for readability.
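
A quick sketch of the coalesced form (placeholder values stand in for the suite's real variables):

const formatter = new Intl.NumberFormat('en-US', { maximumFractionDigits: 2 });
const name = 'Using delete to remove property from object';
const opsSec = 12345678.9;
const variation = 1.23;

// One string, one flush, instead of several small writes.
process.stdout.write(
  `${name} x ${formatter.format(opsSec)} ops/sec +/- ${variation}%\n`,
);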

@ronag
Member

ronag commented Nov 19, 2023

Just want to point out: it's great that we want to standardize, have processes, etc., but if we make things too difficult then people will simply not bother to contribute. I don't think this needs to be in core either, but if that's what it takes to motivate someone as skilled as @RafaelGSS to do it, then let's do it. It seems developing it for core makes a lot more people invested in making it something great. I'd rather have something good in core than "nothing" outside of core. If we can achieve it through an npm package that's even better, but that should not be a blocker.

@GeoffreyBooth
Member

I also think we're perhaps a bit too quick to assume that building it as an npm package is preferable. If we start building it and there are advantages to it being in core, like having access to internals, then that's a solid reason why it should be a core API. We can always publish it as a package in addition if it can function that way, like we did with the test runner, but I'm fine with any approach.

@ovflowd
Member

ovflowd commented Nov 19, 2023

Just want to point out. It's great that we want to standardize, have processes etc. but if we make things too difficult then people will simply not bother to contribute.

I think that's a little bit unfair. I don't think our intention is to demotivate Rafael. Now, Rafael, of course, has the right to take the feedback the way he wants and do whatever he likes with it. It's a collaboration process, and all parties/sides in this collaboration should have the right to a voice. (Not insinuating that you're saying our feedback has no value, but I felt compelled to voice it 🙈)

I doubt any of the feedback here (including mine) is because we want to make things complex; most of the feedback is pretty much: let's try to test this out and get more data. Getting this merged today or in a few months will probably not make much of a difference. To be fair, if the majority here feels and agrees that we can get this into core at the moment, sure, why not? But is there that much harm in conducting a little more long-term research, running some tests, and gathering feedback from the community?

I don't want to be quick to hinder someone's work, and I hope that's not the tone I'm giving here 😅; maybe I'm just the sort of person who prefers to take more time crafting things. By no means is my commentary here an objection to this PR (otherwise I would have pressed the button), nor should my opinions be taken as rejecting Rafael's work. As I said before, and I'll repeat, I appreciate his work.

@ronag
Member

ronag commented Nov 19, 2023

I think that's a little bit unfair. I don't think our intention is to demotivate Rafael.

I never said it was anyone's intention. Just pointing out that intentionally or not, making the process too difficult can turn people off from contributing. Whether that is the case here or not, I don't know.

@ovflowd
Member

ovflowd commented Nov 19, 2023

I think that's a little bit unfair. I don't think our intention is to demotivate Rafael.

I never said it was anyone's intention. Just pointing out that intentionally or not, making the process too difficult can turn people off from contributing. Whether that is the case here or not, I don't know.

Agreed, there's a very delicate balance to strike here. I just hope my words are not demotivating Rafael 🫠

@joyeecheung
Member

joyeecheung commented Nov 19, 2023

It seems developing it for core makes a lot more people invested in making it something great. I'd rather have something good in core than "nothing" outside of core.

Benchmarking is difficult and needs proper design. If designed incorrectly it can do more harm than good, as we should've learned from SunSpider and CrankshaftScript. The negative effect of unidiomatic benchmarking not only puts a burden on us but also on V8, as we learned from the CrankshaftScript era. I'd rather have nothing in core than have a benchmarking tool that encourages a comeback of SunSpider/CrankshaftScript in the ecosystem. Even with this many Node.js contributors, I still doubt our ability to evaluate what makes a benchmarking tool bad (off the top of our heads, at least), and I am sure there are tons of mistakes we will encounter along the way. Doing a WIP outside of core allows us to move faster and correct mistakes quickly - in Node.js core, even releasing a fast-tracked fix for a large-scale regression (like the one we had in v20.6.0) can take more than a day, any other kind of change would only take longer, and I expect substantial changes to take months to land due to the way we do reviews here. Whereas with an npm package you can do reviews differently and release changes more easily and quickly. The slowness of pace would only get worse if this gets backported to LTS, especially maintenance-mode LTS, which is actually what most users are on (if not the EOL versions, and those account for >50% of downloads today). Node.js core simply isn't a great place to do active development, and if we can avoid it (because we don't need access to internals), we should avoid it.

@H4ad
Member

H4ad commented Nov 19, 2023

Just to address some comments about the API: we created a table comparing 7 different benchmark tools to understand what features they offer:

| Feature | Deno.bench | Mitata | IsItFast (0.0.8) | Benchmark.js | Cronometro | Tinybench | Built-in |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Measurement Method | MEAN | MEAN | MEAN | MEAN | MEAN | MEAN | MEAN |
| Max Ops/Sec | ~300M/s | ~300M/s | N/A | 1.2B/s | 14M/s | 23M/s | ~300M/s |
| Measure Iter/time | Y | N | Y | Y | Y | Y | Y |
| Measure time/iter | Y | Y | Y | N | N | Y | Y |
| Measurement standard deviation | N | N | Y | Y | Y | Y | Y (CV) |
| P95 | Y | Y | N | N | N | Y | Y |
| Group | Y | Y | N | N | Y | N | WIP |
| Built-in Reporter | Y | Y | Y | N | Y | Y | Y |
| Compiled Benchmark | N | N | WIP | Y | N | N | WIP |
| Syntax | Declarative | Declarative | Imperative | Imperative | Imperative | Imperative | Imperative |
| Support Permissions (node/deno) | Y | N | N | N | N | N | ? |
| Cares about V8? | N | N | Y | N | N | N | WIP |
| Warmup | N | N | Y | ? | Y | Y | N |
| Filter in | Y | N | N | N | N | N | N |
| Filter out | Y | N | N | N | N | N | N |
| Only/Ignore | Y | N | N | N | N | N | N |
| Runs all (in folder) | Y | N | N | N | N | N | N |
| Runs all with filter (in folder) | Y | N | N | N | N | N | N |
| Console Output | Y | Y | Y | Y | Y | Y | Y |
| Console Output (colored) | Y | Y | Y | N | Y | Y | N |
| JSON Output | Y | Y | N | N | N | N | N |
| Markdown Output | N | N | N | N | N | N | N |
| Machine Info Output | Y | Y | N | N | N | N | N |
| Isolated (runs on worker) | N | N | WIP | N | Y | N | WIP |
| Memory Report | N | N | Y | N | N | N | WIP |
| Runs GC after benchmark | N | N | Y | N | N | N | N |
| Events | N | N | Y | Y | Y | Y | ? |
| Setup/Teardown | Y | WIP | Y | Y | Y | Y | Y |
| Execution Speed | 3s | 4s | 1s | 32s | 4s | 3s | 3s |

In general, the APIs of these benchmark tools are imperative, meaning they look like benchmark.js, where you have your Suite or Benchmark; only one is declarative, looking like node:test.

For the current design, we chose an API similar to benchmark.js, which was built more than 10 years ago and is still probably the best benchmark library for both the browser and Node. Being similar to benchmark.js makes it easier for people to migrate, because a lot of benchmarks have been written with that library.
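
For reference, roughly the same suite from the PR description written with benchmark.js (the npm package benchmark) would look like this, which is what makes migration straightforward:

const Benchmark = require('benchmark');

const suite = new Benchmark.Suite('Deleting vs Assigning undefined');

suite
  .add('Using delete to remove property from object', () => {
    const data = { x: 1, y: 2, z: 3 };
    delete data.y;
  })
  .add('Using undefined to "remove" property from object', () => {
    const data = { x: 1, y: 2, z: 3 };
    data.y = undefined;
  })
  // Prints e.g. "name x 1,234,567 ops/sec ±0.99% (95 runs sampled)"
  .on('cycle', (event) => console.log(String(event.target)))
  .run();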

We also tried to make it as stable as possible and as close as possible to the results we saw with benchmark.js; you can see some of those results in the benchmark comparison repository that I created to aggregate multiple runs of each tool across multiple scenarios.

About setup and teardown, we learned from Deno the pattern of .start and .end, and we combined it with the pattern we saw in google/benchmark, which provides the number of iterations the user should run. With this combination, we can give the user control over their setup without introducing overhead into the measurement.
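
A hypothetical sketch of that pattern, reusing the suite from the PR description (the names timer.start, timer.end, and timer.count are illustrative and may not match the final API):

suite.add('JSON.parse a small payload', (timer) => {
  const payload = JSON.stringify({ x: 1, y: 2, z: 3 }); // setup, not measured

  timer.start();
  for (let i = 0; i < timer.count; i++) {
    JSON.parse(payload);
  }
  timer.end(timer.count); // tell the runner how many iterations were timed
});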

My idea for the future is also to do something benchmark.js does, which is compiling the benchmark. Basically: copy the benchmark function into new code, check that it runs without error, and then generate new code with the function body inlined. This way, we eliminate the overhead of calling a function and make the measured code closer to real code.
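
A rough sketch of that "compiled benchmark" idea (illustrative only; argument handling, closures, and error checking are omitted):

function compile(fn) {
  // Extract the function body and inline it into a generated loop,
  // so there is no per-iteration call overhead.
  const body = fn.toString().replace(/^[^{]*\{/, '').replace(/\}\s*$/, '');
  return new Function('iterations', `
    const start = process.hrtime.bigint();
    for (let i = 0; i < iterations; i++) {
      ${body}
    }
    return Number(process.hrtime.bigint() - start) / iterations;
  `);
}

const nsPerIter = compile(() => { JSON.parse('{"x":1}'); })(1e5);
console.log(`${nsPerIter.toFixed(1)} ns/iter`);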


The negative effect of unidiomatic benchmarking not only puts the burden on us but also puts the burden on V8, as we've also learned from the CrankshaftScript era

I don't think there is any API we can build that will prevent people from writing bad benchmarks; you can write bad benchmarks with any tool we have today.

Introducing a benchmark module could probably encourage people to evaluate their code more, and whatever the outcome of that, the only thing that will happen is that people will become more interested in how V8/Node works and how they can optimize things.

If we want to help people avoid common micro-benchmark mistakes, we can add documentation about them and about V8 flags. I think giving people a powerful tool and teaching them how to use it is better than giving no tool at all.


About creating a package instead of shipping directly in core: we currently keep the implementation as simple as possible so as not to include too many features to review at the same time, but in the future we want to include:

  • Memory Report
  • V8 Flags
    • The idea behind this is to simplify those flags and give more visibility to the user to control how they can benchmark their code.
    • We can also give more hints about what is happening in the code with trace-opt and many other flags.

I agree that we can build this outside of core, but the idea of having it inside core is to get more visibility and more contributions from everyone interested in benchmarking their code.

My opinion on this topic: it would be good if we could land this in core, but I don't mind building an npm package first; the only downside is that it will take much more time for this feature to land (eventually).

@joyeecheung
Member

joyeecheung commented Nov 20, 2023

I don't think there is any API that we can build that will prevent people from writing bad benchmarks, you can write bad benchmarks with any tool that we have today.

I actually think there are many interface options that would encourage people to write better benchmarks, and the current API is quite far from them. For example, we can look at the two most typical use cases of Node.js - servers and command line tools.

  1. For servers, typically people measure throughput (req/sec) by running the server application in one process and another process that generates load against it, then checking the number of requests it can process in a given duration. For example, the fastify benchmark documentation outlines the kind of utility needed for benchmarking a server.
  2. For command line tools, typically people create a pre-defined workload on disk, pass it to the tool launched in a separate process, and measure the time taken to finish the workload. A very popular utility for measuring command line tool performance is hyperfine.

I think a good benchmarking tool should aid users in developing benchmarks for these kinds of more real-world use cases, and prioritizing them over microbenchmarking of very simple snippets like the one in the documentation here would already send a good enough message to users. At least I don't see how the current API addresses these two very common benchmarking use cases, and the focus of the documentation already signals that we value microbenchmarks on simple snippets over macrobenchmarks on real-world apps.

I would say a very simple criterion for how useful this built-in can be is whether we can replace our own benchmark runner/comparer with it. As imperfect as our own benchmark runner/comparer is, it can actually be used for benchmarking the two cases mentioned above (though in the first case I wish it had a better ability to generate/combine workloads, and in the second case I still prefer hyperfine). If a new benchmark runner is even less capable than our imperfect benchmark runner/comparer, I don't find it very convincing to say that it's good enough to be shipped in core instead of being iterated on a bit more as an npm package.

@joyeecheung
Member

joyeecheung commented Nov 20, 2023

the only thing that will happen is that people will be more interested in how V8/Node works and how they can optimize the things.

I'd say helping users benchmark their own software and check improvements/regressions is a good goal, but helping them learn about Node.js/V8 internals - and, even worse, make assumptions about those internals when writing their own code - is not a good goal for a benchmark module, or at least that alone is not enough motivation to add a benchmark module to Node.js core. We should've learned from our mistakes in the CrankshaftScript era that the internals of a dependency can be very brittle and that encouraging users to make assumptions about the internals can make internal changes difficult. It is fine to create a separate project as a learning experience, but codifying this as an API can backfire on us and on V8.

The idea behind this is to simplify those flags and give more visibility to the user to control how they can benchmark their code.
We can also give more hints about what is happening in the code with trace-opt and many other flags.

This would result in a maintenance burden on V8, so I think we need consent from the V8 team for this. The V8 flags are not considered public APIs and are always subject to change/removal. V8 is totally free to remove --expose-gc, --trace-opt, --trace-deopt, --allow-natives-syntax (and any runtime functions they expose), break their behavior, or make them no-ops. If we want to control/get more information from V8, that should be done via embedder APIs first, but from our experience this kind of API is not on the table for V8 (e.g. even an API to expose all available flags was not considered acceptable).

@H4ad
Member

H4ad commented Nov 20, 2023

two most typical use cases of Node.js - servers and command line tools.

I don't think the complexity of designing a benchmark API that can handle servers, command lines, and also microbenchmarks is worth it.

Either you create a benchmark for a server, or for a command line, or for operations; I don't think you can do all of them at once, and I don't think a benchmark tool that does something like that exists in any language today.

So, the two typical cases that you mentioned are not the goal of this benchmark module.

Maybe we can at least do this first iteration including benchmarks only for operations (microbenchmarks, or whatever we call them), and then in the next iterations start adding other tools to address those cases: for servers we can include autocannon, and for command lines we can build on spawn.
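
For the server case, a sketch of what wrapping an existing load generator might look like, assuming autocannon's programmatic API:

const autocannon = require('autocannon');

// Generate load against a locally running server and report throughput.
autocannon(
  { url: 'http://localhost:3000', connections: 100, duration: 10 },
  (err, result) => {
    if (err) throw err;
    console.log(`${result.requests.average} req/sec on average`);
  },
);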


I would say a very simple criteria of how useful this built-in can be is to see if we can replace our own benchmark runner/comparer with the built-in

This is not our goal here. In the same way that you can't use the current runner/comparer to benchmark two operations against each other, I don't think we should build a benchmark tool to address the scenario where we compare two Node versions.

That said, although not as easily as what we have today, we can benchmark different Node versions using benchmark.js on nodejs-bench-operations; it just takes more effort to do this kind of comparison.


In general, if we want to be able to benchmark anything, we should address this in phases:

  • microbenchmarks
  • server
  • command lines/initialization
  • comparer between node versions

@H4ad
Member

H4ad commented Nov 20, 2023

It is fine to create a separate project for a learning experience, but codifying this as an API can backfire on us and V8.

If we want to control/get more information from V8, that should be done as embedder APIs first, but from our experience this kind of APIs are not on the table for V8

Well, luckily we didn't try to implement this kind of API at the beginning; we weren't aware of this kind of issue.

Thinking about maintainability, maybe V8 flags are not a good thing to introduce. I thought they were more stable and that flags weren't removed after being introduced. If V8 can remove whatever it wants, that puts this feature in a position where, as you said, we either make it a no-op or break it (or carry a patch); either option would be bad :/

@isaacs
Contributor

isaacs commented Nov 20, 2023

+1 to everything @joyeecheung is saying. I agree 100%.

There's a lot of talk about what is or isn't demotivating to @RafaelGSS. No disrespect to anyone's skill or value, but no one's motivation or lack thereof should be the primary concern here. This is about finding the best benchmark tool to have in Node core, and the process to discover that. People come and go, feelings and motivations rise and fall, but an API in Node core will likely still be in Node core in 10 years.

There is disagreement about what kind of benchmark tool should be in core, and whether an imperative microbenchmarking tool is even a good idea to have in core. Without process isolation and realistic workloads, it may do more harm than good. Developing it in a userland package allows for evaluation and iteration on approaches without committing to supporting something long term.

It doesn't have to rise to become the most popular option before being pulled in. It's just much more effective than iterating on pull requests forever, which get effectively zero play-testing. But it sounds like some talking needs to happen about what sort of "benchmarking" is even desirable, since there is an argument being made that a microbenchmarking API would be harmful to put in core.

@joyeecheung
Member

joyeecheung commented Nov 20, 2023

microbenchmarks

I think this should be a non-goal and, even better, something that the built-in benchmark tool specifically advises against. We should learn from the mistakes of SunSpider and CrankshaftScript: microbenchmarks should be discouraged, not encouraged. Fostering a culture of microbenchmarks over benchmarks of real-world use cases is not the right kind of legacy for Node.js to leave.

@RafaelGSS
Member Author

Hello folks, I wanted to clear up any misunderstandings that may have arisen from my previous message. I apologize if it seemed like I was becoming unmotivated or angry about the objections raised. That was not the case at all. In fact, I greatly appreciate the concern shown and the effort put into providing feedback.

As many of you already know, my life has become very busy recently, and unfortunately I don't have much bandwidth to commit to areas of Node.js other than security. So, since this PR requires a significant commitment to elaborate on why we believe the concerns raised by @joyeecheung and @isaacs miss the main objective of this PR, I decided to just close it and reopen it when I have more time to engage in the discussion in a meaningful way. Otherwise, my comments would be superficial and not particularly helpful, and that would only make it more difficult for other members to review and understand the context of this long thread.

We didn't stop the work, but we decided that at this point it's better to keep going with our research outside of core, and at a more opportune moment we will come back and discuss this subject again. I'm sorry that my message sounded a bit inflammatory.

@webpro

webpro commented Dec 21, 2023

Only here to say I learned from/via the discussion here (thanks!) and wrote up a bit about macro and micro benchmarking: https://www.webpro.nl/articles/the-state-of-benchmarking-in-nodejs. I think not much has been written about this topic, so wanted to raise awareness a bit. Feel free to reach out to me for feedback/mistakes (not in this thread)!
