lib: add experimental benchmark module #50768
Conversation
How do we know that this tool is better than the ones already out there in the wild? From a glance at the code, this seems perfectly fine to release first as an npm package. We can at least do that first and experiment in the wild for a while, and make sure it's mature enough before bringing it into core. This also allows independent versioning and makes it available on older Node.js release lines without too many complications. We have done this in the past with node-report and maybe a few other modules that I am forgetting.
I don't think this is a strong enough argument to not release this as an npm package first. Or at least it feels weird to assert this without even trying to release it as an npm package first. Developing a module from scratch in core to replace alternatives already popular in the wild, because we claim to do it better (without even verifying with user feedback), seems to be a dangerous path to go down, unless we have very good arguments, e.g. it's already a standard and has compliance test suites, or it has to rely on certain things in core and avoid monkey-patching. Without such arguments, I think it sends the wrong message to the ecosystem overall - it means the work you put into an npm package can become obsolete just because Node.js releases an official alternative and claims to do it better without user verification first, even though the alternative does not need access to any Node.js internals or implement any widely accepted standards, and looks perfectly okay to be another npm package and play a fair game.
@joyeecheung I will elaborate on that as soon as I get some time!
Could we not pick another name rather than restricting it to the …
validateNumber(timeInNs, 'timeInNs', 0);

if (timeInNs > 1e9) {
  return `${NumberPrototypeToFixed(timeInNs / 1e9, 2)}s`;
Do we need primordials in the benchmark runner?
I think the use of primordials hasn't reached any consensus yet; until then, we prefer to use them just because the rest of the code in Node uses them.
And the usage of primordials in these cases didn't impact the benchmark performance, so it is safe to use them.
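For context, a minimal sketch of what that convention looks like inside Node's internal lib/ files (the destructured name is a real primordial; the wrapping function is illustrative, not code from this PR):

```js
// In Node core internals, `primordials` is injected by the internal module
// wrapper and exposes frozen copies of built-ins, so user code that patches
// prototypes (e.g. Number.prototype.toFixed) cannot affect core behaviour.
const { NumberPrototypeToFixed } = primordials;

function formatSeconds(timeInNs) {
  // Equivalent to `(timeInNs / 1e9).toFixed(2)`, but immune to monkey-patching.
  return `${NumberPrototypeToFixed(timeInNs / 1e9, 2)}s`;
}
```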
@RafaelGSS It might be possible to add CPU performance counters (when they are available), e.g. when running in privileged mode. That would require some C++, evidently.
I've just updated the PR description in an attempt to make it clearer.
So, that's why this PR is still a draft. We're evaluating the accuracy of this new module. However, I wouldn't say it's "better" than the ones out there. Some might be wrong, but some just use a different approach to calculate the metrics. The main objective of this module is to standardise Node.js benchmark results.
Right, so I'll try to explain that. I genuinely don't understand how a new package would solve the issue described in the PR description. How is it different from adding …? Using … I strongly believe the community wants this module, and I definitely don't want to sound like we're claiming to do things better, but I believe this isn't a good point of objection here. The goal is always to improve the ecosystem, and having a built-in sounds like a good path to go down, IMO. Tagging @nodejs/tsc just for visibility (no need to add to the agenda).
If we added node:test, I don't understand why we wouldn't add this. How is this different?
@ronag I don't think it's universally agreed that adding node:test was a good idea either. For testing, there simply is no majority-agreed-upon way to author or run tests, and as such there was/is no clear pattern to follow when building node's module, so node:test can feel like a mismatch to many. For benchmarking, is there a majority-agreed-upon way to do it? In terms of output format, JS API, techniques, etc.? If there is, then I'd expect a userland module to follow it, and then a core module would make sense when it can do something better than the userland module (which includes being built in). If there isn't, then it seems premature for node to try to figure out what the best patterns are.
I don't think it was a good idea to add node:test. But I also think it's not a good idea to be inconsistent.
Repeating one bad decision isn't the same thing as consistency :-)
I would say a built-in benchmark module for utility is at least worth consideration, but a built-in benchmark module for standardizing anything for the Node.js ecosystem doesn't seem right. Node.js isn't a standardization venue. If we want to standardize anything, the design should be done elsewhere and stabilized elsewhere before Node.js starts incorporating it in the releases.
I think we should rather ask which of these issues cannot be solved if it's an npm package. As far as I can tell, the only difference is that being in core gives it a bit of free marketing, but even then I doubt how much difference that makes - we have many later-added built-in features that users still aren't aware of. An npm package can still get far better marketing than a Node.js built-in when enough effort is put into it. On the other hand, developing a totally userland package in core from scratch has many downsides: more burden on the CI, more noise in the issue tracker, coupled versioning and backports needed for LTS, etc. I would say for any new built-in that can be implemented entirely in userland, we should just do it as an npm package first, and only bring it in when it stabilizes, to reduce all of that startup overhead.
And we had a lot of reliability flakes, CI burden, and issue tracker noise from …
The community wants good benchmarking tools, I am sure, and I am not against adding built-in benchmarking tools. I am against developing a benchmarking tool from scratch in core instead of experimenting with it as an independently versioned npm package with its own issue tracker & CI first. Good benchmarking tools need proper design, and landing a premature design can actually be harmful - for example, the current API of this module already encourages microbenchmarking and gives off the smell of the SunSpider problem (one of the doc examples even looks like something straight out of SunSpider). It took browsers years to realize that SunSpider was a mistake that misled JavaScript performance work. We should not repeat the same mistake on the server side. The API also rings a bell for the CrankshaftScript we had that regressed in the TurboFan migration, and I think in 2023 we should've learned something from these mistakes and worked on something that prevents microbenchmarking instead. Adding a premature benchmarking tool that encourages microbenchmarking can actually do more harm than good; we need to keep that in mind. The current API isn't really useful for checking different implementations of your own software either (unless it's a very simple library that can be copy-pasted and you are only writing throw-away benchmarks, so you don't care much about copy-pasting); rather, it's mostly useful for checking different ways of using your dependencies, which I think is actually less common/helpful than the former (i.e. what our …)
We've been consistently doing the "npm package first, bring them in when they are stable" practice for at least fetch, node-inspect, and node-report.
If you are wrong, then it should be published as an npm module first, and pulled into core once validated and standardized. If you are right, then it shouldn't exist at all. If it's not better (according to actual user choice) than the examples on npm, and there's no consistency among the examples on npm, then pulling this into core will hinder the exploration of the solution space and prevent an optimal solution (or multiple optimal solutions for various different situations) from being discovered. Or it will go unused and become a wart on the Node API. This does not belong in core, in my opinion.
I share @ljharb's and @isaacs's opinions here. We just do not have data to make such claims. When thinking about introducing new APIs, I always think that following TC39's route is a very interesting path for testing the adoption of new experimental modules.
This is the sort of API change that needs time to mature. If adoption doesn't happen over the course of 6 months to a year, even after shared efforts to promote it, then it indeed has no place in core. The rushed attitude we have at times of adding things to core without properly analysing, testing, and doing research over long periods of time is extremely dangerous. On an unrelated note, it is what unsettles me right now regarding the navigator.* API changes that we're introducing in Node... But that is a different topic. Some of the node:test changes were released so quickly that it feels we didn't give enough time for more teams and the community to properly provide feedback (this is the sort of thing the Next 10 team exists for)... I love the "native test runner", but I wonder if the way it was released was appropriate. This is the same reasoning behind why this shouldn't land, at least for now. Let's take a deep breath, a few steps back, and course correct here. We can definitely benefit from a benchmarking API, but we really need to design it well, validate it, and ensure that it is something users would really benefit from having in core versus in a downstream library. Because nothing stops us from just making an officially supported library, like undici. Which sadly still struggles with adoption...
BTW, I also wanted to make it clear that I appreciate the work you've put together here, @RafaelGSS :)
Honestly, I really don't agree with the points mentioned, but at this point I don't have the energy or willingness to jump into a debate that could take forever. So I'll just close it 😄
I'm sorry this is the reason why you're closing this. I doubt anyone here is trying to criticize your work... At least, I believe we're trying to redirect the energy and focus on something that might make more sense for the time being. After all, I believe we can all reach an agreement if we each compromise a bit and find the right spot where something nice can come out of this 👀 But these are just my takes on it. Take them with a grain or two of salt :) Please keep working on this initiative, I see the value of a benchmarking module!
@RafaelGSS ... I'd prefer not to close this so quickly. While the opinions of a couple of contributors are important to consider, they do not represent the opinions of all contributors, and a few of us simply have not yet had the time to review and consider this adequately. I wouldn't give up so quickly. I, for one, am completely on board with adding a benchmarking module to core. For me the issue is just in the technical details of the implementation, not in debating the exact process by which something should be brought in or not.
As @jasnell said himself 💖
Sorry that it ended this way. I think there are two different aspects to a proposal like this:

1. Whether Node.js should have built-in benchmarking support at all.
2. Whether that support should be developed from scratch in core, rather than as an npm package first that gets brought in once it matures.
So far I don't think anyone in this thread is against 1. The objections are mostly about 2. So this at least provides some value to posterity - 1 is likely what people want and is less controversial, but 2 is controversial and would meet pushback in core. If anyone wants to pick this initiative up, which can still be very valuable to the community, an npm-package-first path is more likely to lead to success.
Right, precisely as @joyeecheung said, I think we're all (or at least that's my interpretation) on board with having a benchmarking module, but we might first need to assess what the best shape for it is and do some more research and testing, maybe even through something like an npm package.
* @param {number} percentile
* @returns {number}
*/
percentile(percentile) {
We already have the built-in histogram API... why create another vs. modifying / building off the existing?
I tried using it initially, but the current Histogram API doesn't support floating-point numbers, so we lost some precision in the results.
Also, we want to remove the outliers, so we thought it was better to create a new Histogram class.
I'd prefer modifying the current one in a way that adds the needed functionality.
The current one is implemented in C++, and we don't have access to the underlying samples/values (as a public method), ref: https://nodejs.org/api/perf_hooks.html#class-histogram
Maybe we can do a follow-up exposing more methods on the Histogram class, but for now I don't think it will be worth it - unless we choose not to have removeOutliers and cv, in which case using the Histogram class makes total sense.
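For reference, a small sketch of the existing perf_hooks Histogram API being discussed; note that record() only accepts positive integers, which is the precision limitation mentioned above (the recorded values are just illustrative):

```js
const { createHistogram } = require('node:perf_hooks');

const histogram = createHistogram();

// record() only takes positive integers, so fractional durations have to be
// rounded before recording, losing some precision.
for (const durationNs of [1200, 1350, 980, 1105]) {
  histogram.record(durationNs);
}

console.log(histogram.min, histogram.max, histogram.mean);
console.log(histogram.percentile(99)); // percentile support is already built in
```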
process.stdout.write(bench.name);
process.stdout.write(' x ');
process.stdout.write(`${formatter.format(opsSecReported)} ops/sec +/- `);
process.stdout.write(formatter.format(
Multiple writes like this are going to be fairly expensive on their own. Can this be coalesced? Or perhaps can this use cork()/uncork() to improve buffering and reduce the number of individual flushing writes?
We can build the whole string and then just call .write(); the calls were divided this way just for readability purposes.
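A rough sketch of the two alternatives discussed above (variable names are taken from the snippet; `margin` stands in for whatever value the final write formats):

```js
// Option 1: build the whole line, then issue a single write.
const line =
  `${bench.name} x ${formatter.format(opsSecReported)} ops/sec +/- ` +
  `${formatter.format(margin)}\n`;
process.stdout.write(line);

// Option 2: keep the separate writes for readability, but buffer them
// with cork()/uncork() so they flush as one chunk.
process.stdout.cork();
process.stdout.write(bench.name);
process.stdout.write(' x ');
process.stdout.write(`${formatter.format(opsSecReported)} ops/sec +/- `);
process.stdout.write(`${formatter.format(margin)}\n`);
process.stdout.uncork();
```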
Just want to point out: it's great that we want to standardize, have processes, etc., but if we make things too difficult then people will simply not bother to contribute. I don't think this needs to be in core either, but if that's what it takes to motivate someone as skilled as @RafaelGSS to do it, then let's do it. It seems developing it for core makes a lot more people invested in making it something great. I'd rather have something good in core than "nothing" outside of core. If we can achieve it through an npm package, that's even better, but it should not be a blocker.
I also think we're perhaps a bit too quick to assume that building it as an npm package is preferable. If we start building it and there are advantages to it being in core, like having access to internals, then that's a solid reason why it should be a core API. We can always publish it as a package in addition, if it can function that way, as we did with the test runner - but I'm fine with any approach.
I think that's a little bit unfair. I don't think our intention is to demotivate Rafael. Now, Rafael, of course, has the right to take the feedback the way he wants and do whatever he likes with it. It's a collaboration process, and all parties/sides in this collaboration should have the right to a voice. (Not insinuating that you're saying our feedback has no value, but I felt compelled to voice it 🙈) I doubt any of the feedback here (including mine) is because we want to make things complex; most of the feedback is pretty much "let's try to test this out and get more data". Getting this merged today or in a few months will probably not make much of a difference. To be fair, if the majority here feels and agrees that we can get this into core at the moment, sure, why not? But is there that much harm in conducting a little more long-term research, running some tests, and gathering feedback from the community? I don't want to be quick to hinder someone's work, and I hope that's not the tone I'm giving here 😅; maybe I'm just the sort of person who prefers to take more time crafting things. By no means is my commentary here an objection to this PR (otherwise, I would have pressed the button), nor should my opinions be taken as any kind of rejection of Rafael's work. As I said before, and I'll repeat, I appreciate his work.
I never said it was anyone's intention. Just pointing out that, intentionally or not, making the process too difficult can turn people off from contributing. Whether that is the case here or not, I don't know.
Agreed, there's a very delicate balance to strike here. I just hope my words are not demotivating Rafael 🫠
Benchmarking is difficult and needs proper design. If designed incorrectly it can do more harm than good, as we should've learned from SunSpider and CrankshaftScript. The negative effect of unidiomatic benchmarking not only puts a burden on us but also on V8, as we also learned from the CrankshaftScript era. I'd rather have nothing in core than a benchmarking tool that encourages a comeback of SunSpider/CrankshaftScript in the ecosystem. Even with this many Node.js contributors, I still doubt our ability to evaluate what makes a benchmarking tool bad (off the top of our heads, at least), and I am sure there are tons of mistakes we will encounter on the path. Doing a WIP outside of core allows us to move faster and correct mistakes quickly - in Node.js core, even releasing a fast-tracked fix to a large-scale regression (like the one we had in v20.6.0) can take more than a day, any other kind of change would only take longer, and I expect substantial changes to take months to land due to the way we do reviews here. Whereas with an npm package you can do reviews differently and release changes more easily and quickly. The slowness of pace would only get worse if this gets backported to LTS, especially maintenance-mode LTS, which is actually what most users are on (if not the EOL versions, and those account for >50% of downloads today). Node.js core simply isn't a great place to do active development, and if we can avoid it (because we don't need access to internals), we should avoid it.
Just to address some comments about the API: we created a table comparing 7 different benchmark tools to try to understand what features they have:
In general, the APIs of the benchmarks are … For the current design, we chose the API to be similar to … We also tried to make it as stable as possible and closer to the results we saw on … About setup and teardown, we learned from Deno the pattern of … My idea in the future is also to do something that the …
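To illustrate the kind of setup/teardown timing pattern alluded to above (a self-contained sketch of the general idea only, not this PR's actual API), the point is that only the code between start() and end() counts toward the measured time:

```js
const { hrtime } = require('node:process');

// Minimal illustration: measure only the region between timer.start() and
// timer.end(), so setup and teardown stay out of the reported numbers.
function benchWithTimer(name, fn) {
  let startNs = 0n;
  let endNs = 0n;
  const timer = {
    start() { startNs = hrtime.bigint(); },
    end() { endNs = hrtime.bigint(); },
  };
  fn(timer);
  console.log(`${name}: ${Number(endNs - startNs) / 1e6} ms (measured region only)`);
}

benchWithTimer('JSON.parse', (timer) => {
  const raw = JSON.stringify({ hello: 'world'.repeat(1000) }); // setup, not measured
  timer.start();
  JSON.parse(raw); // measured
  timer.end();
  // anything after end() (teardown) is excluded as well
});
```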
I don't think there is any API we can build that will prevent people from writing bad benchmarks; you can write bad benchmarks with any tool we have today. Introducing a benchmark module could probably encourage people to evaluate their code more, and whatever the outcome of that, the only thing that will happen is that people will be more interested in how V8/Node works and how they can optimize things. If we want to help people avoid common micro-benchmark mistakes, we can add documentation about them and about V8 flags; I think giving people a powerful tool and teaching them how to use it is better than giving no tool at all. About creating a package instead of shipping directly into core: we currently keep the implementation as simple as possible so as not to include too many features to review at the same time, but in the future we want to include: …
I agree that we can build this outside of core, but the idea of having it inside core is to get more visibility and more contributions from everyone interested in benchmarking their code. My opinion on this topic: it would be good if we can land it in core, but I don't mind having to build an npm package first; the only downside is that it will take much more time for this feature to land (eventually).
I actually think there are many interface options that would encourage people to write better benchmarks, and the current API is quite far from them. For example, we can look at the two most typical use cases of Node.js - servers and command line tools.
I think a good benchmarking tool should aid users in developing benchmarks for these kinds of more real-world use cases, and prioritizing them over microbenchmarking of very simple snippets like the one in the documentation here would already send a good enough message to users. At least I don't see how the current API addresses these two very common benchmarking use cases, and the focus of the documentation already signals that we value microbenchmarks on simple snippets over macrobenchmarks on real-world apps. I would say a very simple criterion for how useful this built-in can be is whether we can replace our own benchmark runner/comparer with it. As imperfect as our own benchmark runner/comparer is, it can actually be used for benchmarking the two cases mentioned above (though in the first case I wish it had a better ability to generate/combine workloads, and in the second case I still prefer using …)
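As a rough, self-contained illustration of the second kind of macro-benchmark described above (timing whole runs of a command line tool rather than an isolated snippet; the command benchmarked here is only an example):

```js
const { spawnSync } = require('node:child_process');

// Time end-to-end runs of a CLI invocation, which exercises startup,
// module loading and I/O rather than a single hot function.
function benchCli(cmd, args, runs = 10) {
  const timesMs = [];
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    spawnSync(cmd, args, { stdio: 'ignore' });
    timesMs.push(Number(process.hrtime.bigint() - start) / 1e6);
  }
  timesMs.sort((a, b) => a - b);
  const median = timesMs[Math.floor(runs / 2)];
  console.log(`${cmd} ${args.join(' ')}: median ${median.toFixed(1)} ms over ${runs} runs`);
}

// Example: measure bare Node.js startup.
benchCli(process.execPath, ['-e', '0']);
```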
I'd say helping users benchmark their own software and check improvements/regressions is a good goal, but helping them learn about Node.js/V8 internals and, even worse, make assumptions about them when writing their own code is not a good goal for a benchmark module - or at least that alone is not enough motivation to add a benchmark module to Node.js core. We should've learned from our mistakes in the CrankshaftScript era that the internals of a dependency can be very brittle, and that encouraging users to make assumptions about internals can make internal changes difficult. It is fine to create a separate project as a learning experience, but codifying this as an API can backfire on us and on V8.
This would result in a maintenance burden on V8, so I think we need consent from the V8 team for this. The V8 flags are not considered public APIs and are always subject to change/removal. V8 is totally free to either remove …
I don't think the complexity of designing a benchmark API that can handle … Either you create a benchmark for the server, or you create a benchmark for the command line, or you create a benchmark for operations; I don't think you can do all of them at once, and I don't think a benchmark tool that does something like that exists today in any language. So, the two typical cases you mentioned are not the goal of this benchmark module. Maybe, maybe, we can at least do this first iteration that covers benchmarks only for operations (microbenchmarks, or whatever we call them), and then in the next iterations we can start adding other tools to address benchmarking of those cases; for servers we can include …
This is not our goal here; in the same way, you can't run the current … But, though not as easily as we can today, we can benchmark different Node versions using … In general, if you want to be able to benchmark anything, we should address this in phases:
Well, luckily we didn't try to implement this kind of API in the beginning; we didn't know about this kind of issue. Thinking about maintainability, maybe V8 flags are not a good thing to introduce; I thought they were more stable and that flags weren't removed after they were introduced. If they can remove whatever they want, that puts this feature in a position where, as you said, we either make it a no-op or it breaks (or it forces us to introduce some patch); either option would be bad :/
+1 to everything @joyeecheung is saying. I agree 100%. There's a lot of talk about what is or isn't demotivating to @RafaelGSS. No disrespect to anyone's skill or value, but no one's motivation or lack thereof should be the primary concern here. This is about what's the best benchmark tool to have in Node core, and the process to discover that. People come and go, feelings and motivations rise and fall, but an API in Node core will likely still be in Node core in 10 years. There is disagreement about what kind of benchmark tool should be in core, and whether an imperative micro-benchmarking tool is even a good idea to have in core. Without process isolation and realistic workloads, it may do more harm than good. Developing in a userland package allows for evaluation and iteration of approaches without committing to supporting something long term. It doesn't have to rise up to become the most popular option before being pulled in; it's just much more effective than iterating on pull requests forever, which get effectively zero play testing. But it sounds like some talking needs to happen about what sort of benchmarking is even desirable, since there is an argument being made that a micro-benchmarking API would be harmful to put in core.
I think this should be a non-goal and, even better, something that the built-in benchmark tool specifically advises against. We should learn from the mistakes of SunSpider and CrankshaftScript: microbenchmarks should be discouraged, not encouraged. Fostering a culture of microbenchmarks over real-world use-case benchmarks is not the right kind of legacy for Node.js to leave.
Hello folks, I wanted to clear up any misunderstandings that may have arisen from my previous message. I apologize if it seemed like I was becoming unmotivated or angry about the objections raised. That was not the case at all. In fact, I greatly appreciate the concern shown and the effort put into providing feedback. As many of you already know, my life has become very busy recently and unfortunately I don't have much bandwidth to commit to areas of Node.js other than security. So, since this PR requires a significant commitment to elaborate on why we believe the concerns raised by @joyeecheung and @isaacs miss the main objective of this PR, I decided to just close it and reopen it when I have more time to engage in the discussion in a meaningful way. Otherwise, my comments would be superficial and not particularly helpful, and that would only make it more difficult for other members to review and understand the context of this long thread. We haven't stopped the work, but we decided that at this point the best course is to keep going with our research outside of core, and at a more opportune moment we will come back and discuss this subject again. I'm sorry that my message sounded a bit inflammatory.
Only here to say I learned from/via the discussion here (thanks!) and wrote up a bit about macro- and micro-benchmarking: https://www.webpro.nl/articles/the-state-of-benchmarking-in-nodejs. I think not much has been written about this topic, so I wanted to raise awareness a bit. Feel free to reach out to me with feedback/mistakes (not in this thread)!
This PR introduces a new built-in module to Node.js: `node:benchmark`. Following the reasoning of `node:test`, this module should only be available with the `node:` prefix. I know we've discussed adding a built-in `benchmark` module to Node.js a couple of times, so I read all the issues I could find and will try to summarise why we believe we should have it in Node.js.

For context, a benchmark module was previously proposed for Node.js core in #44974; however, that pull request basically exposed our internal benchmark tool to userland code. While I agree with all the objections made in that PR - mainly the ones pointing out that our internal benchmark tool isn't meant to compare different functions, but different Node.js binaries - I believe Node.js should have a built-in module whose results people can rely on and use to make real decisions about their software.
Problem
The fundamental problem is that there currently isn't a reliable benchmark tool for Node.js users. Before creating this PR, I conducted research with Vinicius, comparing different benchmark packages while running the same experiment. The results showed some inconsistencies, and because the packages use different approaches to measure the benchmark, their results are not comparable to each other. This PR is still in draft while Vinicius and I finish this little research.
Node.js and its dependencies update quickly, and it can be challenging to keep up with recent and sometimes undocumented changes - mainly at the package level. While performance has always been important to the project, it has recently become a hot topic. A tool or module that users can use to monitor regressions in their applications would fill a gap in what I believe Node.js currently offers.
The benchmark module
I developed the benchmark module along with @H4ad, and our objective is to keep it quite simple and extensible for this experimental phase - you will find the roadmap we've planned at the end of this section.
As you can see in the documentation, the usage is pretty simple and easily portable from `benchmark.js`:
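(A sketch of what such usage might look like, assuming a Suite-style API similar to benchmark.js; the import and method names below are assumptions for illustration, not necessarily the PR's final API.)

```js
const { Suite } = require('node:benchmark'); // assumed import shape

const suite = new Suite();

suite
  .add('Using includes', () => {
    'text/html,application/xhtml+xml'.includes('application/json');
  })
  .add('Using indexOf', () => {
    'text/html,application/xhtml+xml'.indexOf('application/json') !== -1;
  })
  .run(); // expected to report ops/sec and variance for each case
```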
This PR is still a draft since we're working on the TODO below and checking different approaches to guarantee the reliability of our results. We've decided to open it early to discuss strategies and see if someone is strongly against adding this built-in module.
TODO:

- Use `worker_threads` to guarantee a clean environment on each benchmark (a rough illustration of the idea is sketched below)
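(Not the implementation from this PR - just a self-contained sketch of how worker_threads could give each benchmark run a fresh environment.)

```js
const { Worker } = require('node:worker_threads');

// Run a benchmark body in its own worker so one case's JIT state, GC pressure
// or global mutations cannot leak into the next case.
function runIsolated(benchSource) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      `const { parentPort } = require('node:worker_threads');
       const start = process.hrtime.bigint();
       (${benchSource})();
       parentPort.postMessage(Number(process.hrtime.bigint() - start));`,
      { eval: true },
    );
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

runIsolated("() => { JSON.parse(JSON.stringify({ a: 1 })); }")
  .then((ns) => console.log(`took ${ns} ns in a clean worker`));
```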
ROADMAP:

These are the features we believe would make sense to include in a second or third iteration of this PR:

- `--trace-opt`
- `--trace-deopt`
- `--expose-internals` + `%OptimizeOnNextCall()`
FAQ
I’ve presented an initial draft to some Node.js collaborators at NodeConfEU and some of them raised a few concerns, including:

Why not make this a package, like `undici`?

Nothing prevents this code from living in a package instead of a Node.js built-in. However, this goes against the main reason we’ve created this PR: we’d like to standardise the way people benchmark Node.js functions, mainly to prevent misinformation or cherry-picked data. I really believe that if we make it an external package, we won’t achieve that - people won’t use it.
Why `node --bench my-benchmark.js` instead of a built-in module (`require('node:benchmark')`)?

One of the objectives of this module is to be easily integrated into any custom regression-checker tool. By making it a module, users can simply require it to quickly set up a benchmark whenever needed.
What about `common.createBenchmark()`?

That’s not the main objective of this PR. As previously said, our internal benchmark tool works pretty well (in my opinion) in our environment. However, nothing prevents us from extending the benchmark module in a feasible way so we can use it internally as well.