
VM intrinsic trace event... #153

Closed
jasnell opened this issue Feb 13, 2018 · 26 comments
@jasnell
Member

jasnell commented Feb 13, 2018

To parallel the trace event support at the native layer, it would be helpful to have an intrinsic trace event facility accessible on the JS layer that we can use to efficiently instrument core js code. This does not need to be as expressive as the current set of macros (the ability to mark instants and span start/end should be sufficient).

The motivation for a vm intrinsic is the ability for the vm to optimize these as much as possible. We could implement these via a process binding but doing so certainly adds a runtime cost that an intrinsic should be able to avoid.

In most cases in core, the existing async_hooks/trace_events coverage actually covers the most interesting cases. It does not, however, cover things like EventEmitter. Emitting an event, for instance, should produce a trace event if trace events are enabled and should be a no-op with no runtime cost if not (e.g. have the compiler optimize the trace call out if v8 is not tracing).

EventEmitter.prototype.emit = function emit(event, ...args) {
  %trace_event_begin('node', 'eventemitter.emit', event, ...)
  // synchronously emit events
  %trace_event_end('node', 'eventemitter.emit', event, ...)
};
@ofrobots
Contributor

/cc @bmeurer and @hashseed. We have chatted about this recently.

@hashseed
Member

The easiest approach I can think of is to dump trace events into a buffer first and, every time the buffer is full, switch to C++ to commit those events. Or do the committing off-thread.

Looking at the implementation of trace events, I think inlining into JIT code is very difficult and also hard to maintain.
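A minimal sketch of this buffer-then-commit idea in plain JavaScript; `commitTraceEvents` and `BUFFER_SIZE` are invented stand-ins for the native call and capacity, not an actual V8 API:

```javascript
// Sketch only: commitTraceEvents stands in for the C++ boundary call,
// and BUFFER_SIZE is an arbitrary capacity.
const BUFFER_SIZE = 1024;
const buffer = new Array(BUFFER_SIZE);
let used = 0;
let commits = 0;

function commitTraceEvents(events) {
  // A real implementation would hand the batch to the native trace writer.
  commits++;
}

function traceEvent(phase, category, name) {
  buffer[used++] = { phase, category, name, ts: Date.now() };
  if (used === BUFFER_SIZE) {
    commitTraceEvents(buffer.slice(0, used)); // one boundary cross per batch
    used = 0;
  }
}
```

Batching amortizes the JS-to-C++ transition cost across many events, at the price of losing buffered events on a crash.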

@jasnell
Member Author

jasnell commented Feb 13, 2018

We could buffer within each JS stack and, as we exit (after handling the nextTicks), commit the buffer... that should limit the amount of buffering and would ensure only one boundary cross per stack.
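A sketch of that per-stack flush using process.nextTick as the exit hook; `flushToNative` is a hypothetical stand-in for the C++ commit:

```javascript
// Sketch only: flushToNative is a placeholder for the C++ commit call.
const pending = [];
let flushScheduled = false;
let flushedCount = 0;

function flushToNative(events) {
  flushedCount += events.length; // real version crosses into C++ here
}

function trace(event) {
  pending.push(event);
  if (!flushScheduled) {
    flushScheduled = true;
    process.nextTick(() => {      // runs once the current JS stack unwinds
      flushScheduled = false;
      flushToNative(pending.splice(0, pending.length)); // one cross per stack
    });
  }
}
```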

@hashseed
Member

The problem with buffering is that you lose information if you crash before committing.

@jasnell
Member Author

jasnell commented Feb 13, 2018

yep... which is why I'd prefer to avoid it or at least minimize what we're buffering.

@ofrobots
Contributor

We don't necessarily have to stick with the trace events macros' implementation of the trace buffer writing. If there is a different way this could be implemented by the v8 supported intrinsic, we could adjust how the macros work.

@ofrobots
Contributor

To capture what was discussed in person... what is essential is that the compare and branch is fast. The actual writing to the buffer doesn't necessarily need to be inlined.

@hashseed
Member

Currently the idea I have in mind is:

  • V8 exposes a JavaScript function that logs trace events.
  • This function should perform tracing as fast as possible (inlining, a lightweight C++ call, etc.).
  • Node.js can call this function for tracing.
  • If tracing is disabled, override this function with an empty function; V8 makes sure empty functions always get inlined.
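The last bullet can be sketched in plain JavaScript; the function names and the enable toggle below are illustrative, not V8's real interface:

```javascript
// Sketch only: names are invented; the real switch would live inside V8.
const recorded = [];

function traceActive(category, name) {
  recorded.push(`${category}:${name}`); // stands in for the lightweight C++ call
}

function traceDisabled() {} // empty body: trivially inlinable, zero work

let trace = traceActive;

function setTracingEnabled(enabled) {
  trace = enabled ? traceActive : traceDisabled; // swap implementations
}
```

Callers always invoke `trace(...)`; when tracing is off, the call resolves to the empty function, which the JIT can inline away.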

@hashseed
Member

@bmeurer wdyt?

@jasnell
Member Author

jasnell commented Feb 14, 2018

Definitely sounds reasonable on this end. The one question that remains is how Node.js would access that function, but I assume that's a minor detail so long as the VMs are still able to inline them.

@mike-kaufman ... any thoughts on whether that would work for chakra?

@ofrobots
Contributor

ofrobots commented Feb 14, 2018

I can see a few scenarios:

  • Scenario where all trace-events are disabled. The VM should be able to completely eliminate the trace-event overhead.
  • I would expect production workloads to have trace-events always enabled for a limited subset of categories. This means that the compare-and-branch guarding the (infrequently executed) add-to-trace-buffer operation needs to be as fast as possible.

Simply replacing the function with an empty function wouldn't serve the production use-case well. I guess the broader point I will make is that the notion that diagnostics == debugging is not true. For server-side applications, diagnostics == production.
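To illustrate the fast compare-and-branch for the always-enabled-subset case, here is a sketch where the enabled state is a single byte load; the category indices, flag store, and function names are all invented for illustration:

```javascript
// Sketch only: the flag store and all names are assumptions.
const CATEGORY_NODE = 0;
const enabledFlags = new Uint8Array(8); // one byte per category
const traceBuffer = [];

function isTraceCategoryEnabled(cat) {
  return enabledFlags[cat] !== 0;       // the hot compare-and-branch
}

function addToTraceBuffer(category, name) {
  traceBuffer.push({ category, name }); // infrequently executed slow path
}

function emitInstrumented(name) {
  if (isTraceCategoryEnabled(CATEGORY_NODE)) {
    addToTraceBuffer('node', name);
  }
  // ... the actual emit work would happen here ...
}
```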

@mike-kaufman
Contributor

/cc @digitalinfinity, @MSLaguana - any thoughts on how this would work on node-chakra-core?

@mike-kaufman
Contributor

mike-kaufman commented Feb 15, 2018

To capture what was discussed in person.. what is essential is that the compare and branch is fast.

What is the thinking about the arguments passed to the trace function here? If the intent is a fast and simple implementation on the back-end, the API would be something like this:

const category = ...
%if_trace_enabled(category) {
  %trace(category, someExpensiveFuncWithSideEffects());
}

Without an %if_trace_enabled construct, you'd have to eval the args to the trace functions, and this could be expensive (or you'd have to do some special-casing magic in the jit/interpreter that introduces the consequence of code not executing as expected).
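Absent such an intrinsic, one userland workaround is to pass a thunk so that expensive arguments are only evaluated when tracing is on; all names below are illustrative, and `someExpensiveArgs` stands in for the `someExpensiveFuncWithSideEffects()` above:

```javascript
// Sketch only: a thunk defers argument evaluation until tracing is on.
let tracingEnabled = false;
let evaluations = 0;
const sink = [];

function someExpensiveArgs() { // stand-in for someExpensiveFuncWithSideEffects()
  evaluations++;
  return { detail: 'costly to compute' };
}

function trace(category, argsThunk) {
  if (!tracingEnabled) return;          // fast path: thunk never invoked
  sink.push([category, argsThunk()]);   // slow path: evaluate lazily
}
```

This avoids evaluating arguments on the fast path, though allocating the closure itself is not free, which is part of why an intrinsic is attractive.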

@jasnell
Member Author

jasnell commented Mar 15, 2018

So, just out of curiosity, I decided to try instrumenting the synchronous code in EventEmitter to output trace events using the same internal binding used by js AsyncResources. Emitting begin and end events marking the entry and exit of emit, along with begin and end events marking each handler, yields some extremely interesting information about sync code flow. However, it results in an average 85% perf loss in the events benchmarks without actually emitting any events, due largely to just the category-enabled checks. I'm going to play around with it a bit more, but the perf results are an interesting data point here.

@AndreasMadsen
Member

@jasnell did you try caching the check in JavaScript? For example, by synchronizing a Set object. I would imagine that just the C++ call for the category check costs a lot.
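A sketch of that Set-based cache; the sync entry point from the native side is an assumption:

```javascript
// Sketch only: syncEnabledCategories would be driven from the native side
// whenever the set of enabled categories changes.
const enabledCategories = new Set();

function syncEnabledCategories(fromNative) {
  enabledCategories.clear();
  for (const c of fromNative) enabledCategories.add(c);
}

function isEnabled(category) {
  return enabledCategories.has(category); // pure JS, no C++ boundary cross
}
```

The hot-path check becomes a plain JS `Set.has`, with the C++ boundary crossed only when the enabled set actually changes.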

@jasnell
Member Author

jasnell commented Mar 15, 2018

Nope, ran out of time before the plane landed ;) that's the next step on the flight back out to CA.

@jasnell
Member Author

jasnell commented May 4, 2018

Quick update: I've started work on the implementation for this in V8 and will hopefully have a PR to V8 within the next week... at the very least so that we have something we can try out.

@hashseed
Member

hashseed commented May 4, 2018

I discussed this with @mcollina and a few other folks. These JS functions should probably be exposed through V8 extras, defined either here or here, and used like this.

@jasnell
Member Author

jasnell commented May 4, 2018

Excellent. I was just narrowing in on the same thing and you preemptively answered the questions I was going to ask later today :-D

@hashseed
Member

hashseed commented May 4, 2018

Note that there is a difference between the extras_bindings object and extras_utils object. The extras_bindings object is the one you want to install. The extras_utils object is going to be removed from the context and left to die after bootstrapping. It is used to expose stuff to extras scripts, which does not apply here. The bindings object is used at runtime and exposed via v8::Context::GetExtrasBindingObject.

@jasnell
Member Author

jasnell commented May 4, 2018

I did want to clarify one bit: in a different conversation I believe it was implied that v8 extras may not be something we'd want to rely on long term. Is my understanding correct? I just want to make sure that going this route aligns with v8 long term.

Also: ping @nodejs/node-chakracore ... Going this route would require some work on the chakra side to support it on that end so want to make sure it's on the radar.

@hashseed
Member

hashseed commented May 4, 2018

After discussion, we want to use v8 extras for this use case, since anything new that we came up with to export JS functions would be just the same thing.

@jasnell
Member Author

jasnell commented May 4, 2018

Awesome, ok. Quick clarification on your point about extras_bindings vs. extras_utils... should I be calling InstallFunction(extras_bindings, ...) within Genesis::InstallExtraNatives()?

@hashseed
Member

hashseed commented May 4, 2018

Yes, that sounds like a good approach to me.

@gireeshpunathil
Member

Should this remain open? [I am trying to chase dormant issues to closure.]

@jasnell
Member Author

jasnell commented Oct 25, 2019

No, this can be closed, as the feature landed in V8 some time ago.
