n-api: Potential N-API performance optimizations #14379

Closed
jasongin opened this issue Jul 19, 2017 · 11 comments
Labels
node-api Issues and PRs related to the Node-API. performance Issues and PRs related to the performance of Node.js.

Comments

@jasongin
Member

Using a profiling tool, I analyzed the performance of a native module that was converted to N-API, and compared the results to the original module that used V8 APIs directly. While the overall overhead of N-API is already fairly minimal, I did manage to identify 3 potential optimizations. Each of these can substantially reduce the impact of N-API in certain scenarios. I have already tested an early version of these fixes to confirm that, but I wanted to allow a chance for discussion before I submit a PR.

1. Creating integer values

N-API currently offers only one way to create a number value:

napi_status napi_create_number(napi_env env, double value, napi_value* result);

V8 has an optimized representation for 32-bit integer values, but because the value provided to N-API is always a double, N-API always calls v8::Number::New() (never v8::Integer::New()), so it does not create the optimal integer representation. Therefore these integer values are slower to create and slower to work with than they could be.

Instead of a single napi_create_number() API, there should probably be one for each of: int32_t, uint32_t, int64_t, double. Note there are already napi_get_value_*() functions for each of those 4 types, so having the same 4 napi_create_*() variants is more natural anyway.
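
For illustration, the split might look like the following, mirroring the four existing napi_get_value_*() getters (the exact names here are only a sketch and open to discussion):

napi_status napi_create_int32(napi_env env, int32_t value, napi_value* result);
napi_status napi_create_uint32(napi_env env, uint32_t value, napi_value* result);
napi_status napi_create_int64(napi_env env, int64_t value, napi_value* result);
napi_status napi_create_double(napi_env env, double value, napi_value* result);

The 32-bit variants could then call v8::Integer::New() / v8::Integer::NewFromUnsigned() directly, while the double (and 64-bit) cases would continue to go through v8::Number::New().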

2. Getting integer values

The N-API functions that get integer values do some work to get a v8::Context that is never actually used. The profiler data showed that the call to v8::Isolate::GetCurrentContext() is actually somewhat expensive. (And it is apparently not optimized out by the compiler.)

The implementation of napi_get_value_int32() includes this code:

  RETURN_STATUS_IF_FALSE(env, val->IsNumber(), napi_number_expected);
  v8::Local<v8::Context> context = isolate->GetCurrentContext();
  *result = val->Int32Value(context).FromJust();

But v8::Value::Int32Value() does not use the context argument when the value is a number type (a condition that was already checked above):

Maybe<int32_t> Value::Int32Value(Local<Context> context) const {
  auto obj = Utils::OpenHandle(this);
  if (obj->IsNumber()) return Just(NumberToInt32(*obj));
  PREPARE_FOR_EXECUTION_PRIMITIVE(context, Object, Int32Value, int32_t);
  i::Handle<i::Object> num;
  has_pending_exception = !i::Object::ToInt32(isolate, obj).ToHandle(&num);
  RETURN_ON_FAILED_EXECUTION_PRIMITIVE(int32_t);
  return Just(num->IsSmi() ? i::Smi::cast(*num)->value()
                           : static_cast<int32_t>(num->Number()));
}

I can think of two ways to make this faster:

  • Call the v8::Value::Int32Value() overload that does not take a context (and does not return a Maybe). The problem is that it is marked as "to be deprecated soon".
  • Pass an empty v8::Local<v8::Context> value to v8::Value::Int32Value() (a minimal sketch follows this list). This relies on the internal implementation detail that the context is not used when the value is a number type. But in practice it should be safe, and it would be easily caught by tests in the unlikely event that V8 ever changes this behavior.
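
A minimal sketch of that second option, as a change to the snippet quoted above (it keeps the IsNumber() check and simply stops fetching the current context):

  RETURN_STATUS_IF_FALSE(env, val->IsNumber(), napi_number_expected);
  // Empty handle: Int32Value() ignores the context for plain number values.
  *result = val->Int32Value(v8::Local<v8::Context>()).FromJust();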

I also considered caching the v8::Context in the napi_env structure, but that probably isn't valid because APIs can be called from different context scopes.

3. Allocating handle scopes

V8 handle scopes are normally stack-allocated. But the current N-API implementation puts them on the heap, which means every entry/exit of a scope involves expensive new and delete operations.

I can think of two ways to make this faster:

  • Change the design of all the N-API handle scope APIs so that the caller must pass in a pointer to a (presumably stack-allocated) handle scope structure to be initialized. The problem is that the size of that structure is VM-specific (and must be part of the ABI). While V8 is currently the only JS engine with an N-API implementation that uses handle scopes, defining a V8-specific structure would seem like a leak in the abstraction.
  • Pre-allocate memory for some small fixed number of handle scopes (maybe only 1?), attached to the napi_env. Track which ones are used/freed, and allocate new handle scopes on the heap only if the pre-allocated ones are all in use. (A rough sketch follows this list.)
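
A rough sketch of that second approach, assuming the existing v8impl::HandleScopeWrapper class; the storage field and flag on napi_env are hypothetical names used only for illustration:

// Hypothetical members added to the napi_env structure:
//   char scope_storage[sizeof(v8impl::HandleScopeWrapper)];  // suitably aligned
//   bool scope_storage_in_use;

v8impl::HandleScopeWrapper* OpenHandleScope(napi_env env, v8::Isolate* isolate) {
  if (!env->scope_storage_in_use) {
    env->scope_storage_in_use = true;
    // Placement-new into the pre-allocated slot: no heap allocation.
    return new (env->scope_storage) v8impl::HandleScopeWrapper(isolate);
  }
  // Pre-allocated slot already in use: fall back to the heap.
  return new v8impl::HandleScopeWrapper(isolate);
}

void CloseHandleScope(napi_env env, v8impl::HandleScopeWrapper* scope) {
  if (reinterpret_cast<char*>(scope) == env->scope_storage) {
    scope->~HandleScopeWrapper();  // destroy in place, keep the storage
    env->scope_storage_in_use = false;
  } else {
    delete scope;
  }
}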
@jasongin jasongin added node-api Issues and PRs related to the Node-API. performance Issues and PRs related to the performance of Node.js. labels Jul 19, 2017
@TimothyGu
Member

TimothyGu commented Jul 20, 2017

  1. SGTM.

  2. Indeed, Isolate::GetCurrentContext() is not defined inline in the header file, so it's not possible for the compiler to optimize it out w/o LTO. I guess passing an empty context is the best way to go here.

  3. Pre-allocate memory for some small fixed number of handle scopes (maybe only 1?), attached to the napi_env.

    This seems to be the way to go. A handle scope isn't that big memory-wise anyway, so a couple more won't hurt either. Though, given that v8::HandleScope only has an explicit constructor and an overloaded operator new(size_t size) that calls ABORT(), how do you plan to implement this?

@jasongin
Member Author

jasongin commented Jul 20, 2017

given v8::HandleScope only has an explicit constructor with an overloaded operator new(size_t size) that calls ABORT(), how do you plan to implement this?

N-API already uses a wrapper around v8::HandleScope to enable it to be allocated on the heap. So I think we can just call operator new(size_t, void*) on that wrapper class, and that will construct the HandleScope member.

@addaleax
Member

V8 has an optimized representation for 32-bit integer values, but because the value provided to N-API is always a double, N-API always calls v8::Number::New() (never v8::Integer::New()), so it does not create the optimal integer representation. Therefore these integer values are slower to create and slower to work with than they could be.

Reading the V8 source code, that doesn’t seem to be true:

node/deps/v8/src/factory.cc, lines 1332 to 1337 (at 7fdcb68):

// Materialize as a SMI if possible
int32_t int_value;
if (DoubleToSmiInteger(value, &int_value)) {
  return handle(Smi::FromInt(int_value), isolate());
}

Regarding 2: It’s probably easiest to just add

if (val->IsInt32()) {
  *result = val.As<Int32>()->Value();
  return napi_clear_last_error(env);
}

even before the val->IsNumber() check. I’ll open a PR shortly.

addaleax added a commit to addaleax/node that referenced this issue Jul 20, 2017
@jasongin
Member Author

Yes, you're right: v8::Number::New() does eventually create the optimized integer representation. But it's still slower than v8::Integer::New().

Somehow I had overlooked v8::Int32::Value() -- thanks!

@jasongin
Member Author

jasongin commented Jul 21, 2017

Two more...

4. Over-use of v8::TryCatch

All of the napi_create_*() functions use the NAPI_PREAMBLE macro, which sets up a v8::TryCatch at function scope. However, these functions have no possibility of executing JavaScript code, so as far as I know there is no reason to pay the cost of the TryCatch. The same is true for the napi_wrap() and napi_get_buffer_info() functions. Errors in non-JavaScript V8 API calls are reported either via the fatal error callback (node::OnFatalError()) or as empty Maybe return values, never via JavaScript exceptions catchable with a TryCatch.
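
For illustration, a simple creation function could drop the preamble in favor of plain argument checks (the macro and helper names below follow the existing node_api.cc implementation; the exact shape is just a sketch):

napi_status napi_create_object(napi_env env, napi_value* result) {
  CHECK_ENV(env);          // validate the environment...
  CHECK_ARG(env, result);  // ...and the out-parameter; no v8::TryCatch needed,
                           // because this call cannot execute JavaScript code.
  *result = v8impl::JsValueFromV8LocalValue(v8::Object::New(env->isolate));
  return napi_clear_last_error(env);
}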

5. Inefficient storage of pointers in internal fields

The N-API function callback adapters wrap pointers in v8::External values for use with SetInternalField() / GetInternalField(). But it's more efficient to omit the v8::External wrapper and use SetAlignedPointerInInternalField() / GetAlignedPointerFromInternalField() instead. (I think we can assume these pointers will be aligned; in any case, there is a check that reports a fatal error if they are not.)
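
Roughly, the two approaches compare as follows (a sketch only; obj is assumed to be a v8::Local<v8::Object> with an internal field, and cb_data stands in for whatever pointer the callback adapter needs to stash):

// Current approach: wrap the pointer in a v8::External before storing it.
obj->SetInternalField(0, v8::External::New(isolate, cb_data));
void* p1 = v8::Local<v8::External>::Cast(obj->GetInternalField(0))->Value();

// Proposed approach: store the aligned pointer directly, no wrapper object.
obj->SetAlignedPointerInInternalField(0, cb_data);
void* p2 = obj->GetAlignedPointerFromInternalField(0);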

addaleax added a commit that referenced this issue Jul 24, 2017
Ref: #14379
PR-URL: #14393
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Jason Ginchereau <[email protected]>
addaleax added a commit that referenced this issue Jul 24, 2017
Ref: #14379
PR-URL: #14393
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Jason Ginchereau <[email protected]>
jasongin added a commit to jasongin/nodejs that referenced this issue Aug 1, 2017
 - Add separate APIs for creating different kinds of numbers,
   because creating a V8 number value from an integer is faster
   than creating one from a double.
 - When getting number values, avoid getting the current context
   because the context will not actually be used and is expensive
   to obtain.
 - When creating values, don't use v8::TryCatch (NAPI_PREAMBLE),
   because these functions have no possibility of executing JS code.

Refs: nodejs#14379
jasongin added a commit to jasongin/nodejs that referenced this issue Aug 8, 2017
 - Add separate APIs for creating different kinds of numbers,
   because creating a V8 number value from an integer is faster
   than creating one from a double.
 - When getting number values, avoid getting the current context
   because the context will not actually be used and is expensive
   to obtain.
 - When creating values, don't use v8::TryCatch (NAPI_PREAMBLE),
   because these functions have no possibility of executing JS code.

Refs: nodejs#14379
PR-URL: nodejs#14573
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
addaleax pushed a commit that referenced this issue Aug 10, 2017
 - Add separate APIs for creating different kinds of numbers,
   because creating a V8 number value from an integer is faster
   than creating one from a double.
 - When getting number values, avoid getting the current context
   because the context will not actually be used and is expensive
   to obtain.
 - When creating values, don't use v8::TryCatch (NAPI_PREAMBLE),
   because these functions have no possibility of executing JS code.

Refs: #14379
PR-URL: #14573
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
gabrielschulhof pushed a commit to gabrielschulhof/node that referenced this issue Apr 10, 2018
Ref: nodejs#14379
PR-URL: nodejs#14393
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Jason Ginchereau <[email protected]>
gabrielschulhof pushed a commit to gabrielschulhof/node that referenced this issue Apr 10, 2018
 - Add separate APIs for creating different kinds of numbers,
   because creating a V8 number value from an integer is faster
   than creating one from a double.
 - When getting number values, avoid getting the current context
   because the context will not actually be used and is expensive
   to obtain.
 - When creating values, don't use v8::TryCatch (NAPI_PREAMBLE),
   because these functions have no possibility of executing JS code.

Refs: nodejs#14379
PR-URL: nodejs#14573
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
MylesBorins pushed a commit that referenced this issue Apr 16, 2018
Ref: #14379
Backport-PR-URL: #19447
PR-URL: #14393
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
Reviewed-By: Jason Ginchereau <[email protected]>
MylesBorins pushed a commit that referenced this issue Apr 16, 2018
 - Add separate APIs for creating different kinds of numbers,
   because creating a V8 number value from an integer is faster
   than creating one from a double.
 - When getting number values, avoid getting the current context
   because the context will not actually be used and is expensive
   to obtain.
 - When creating values, don't use v8::TryCatch (NAPI_PREAMBLE),
   because these functions have no possibility of executing JS code.

Refs: #14379
Backport-PR-URL: #19447
PR-URL: #14573
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: Anna Henningsen <[email protected]>
Reviewed-By: Colin Ihrig <[email protected]>
@gabrielschulhof
Contributor

I believe we have addressed all points except 3).

@gabrielschulhof
Contributor

There's not much more we can do, and we've covered most of these points. Closing.

@jorangreef
Contributor

Was item 4 (over-use of v8::TryCatch) addressed for napi_get_buffer_info?

@gabrielschulhof
Contributor

@jorangreef It looks like NAPI_PREAMBLE() (which uses the TryCatch) has now been reduced to CHECK_ENV(), which does not. 👍

@mvduin

mvduin commented Jul 6, 2019

@TimothyGu

Indeed, Isolate::GetCurrentContext() is not defined inline in the header file, so it's not possible for the compiler to optimize it out w/o LTO.

For GCC and Clang, applying __attribute__((pure)) (or [[gnu::pure]]) to the declaration would allow the compiler to optimize away calls to this function if its return value is never used.
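
For reference, the suggestion amounts to annotating the declaration roughly like this (illustrative only):

class V8_EXPORT Isolate {
 public:
  // [[gnu::pure]] would let GCC/Clang drop calls whose result is unused.
  [[gnu::pure]] Local<Context> GetCurrentContext();
  // ...
};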

@bnoordhuis
Member

Isolate::GetCurrentContext() isn't pure; it has side effects: it creates a new Local<Context> in the active HandleScope.
