[WIP] process: add process.[start|stop]TracingAgent() #18826

jasnell · 2018-02-16T21:50:53Z

Work in progress. Do not land. Not ready for CI

Ping @nodejs/diagnostics ...

One of the identified limitations of the v8 trace event support is that it is not currently possible to enable it at runtime. This PR is an attempt to start working towards enabling that capability.

So far it's pretty simple:

process.startTracingAgent('list,of,categories'); // returns true if agent was started as a result of this call, false otherwise

process.stopTracingAgent(); // returns true if agent was stopped, false otherwise

If the --trace-events--enabled command-line flag is used, then both of these are non-ops that will return false.

There are several key questions:

Is it safe to allow starting and stopping of the tracing agent dynamically like this? Note that starting will cause the existing node_trace.1.log to be overwritten each time, just like starting a new trace. We should be able to allow better handling of that once trace_events: add file pattern cli option #18480 lands.
Is this the way that we should allow the tracing to be enabled/disabled dynamically at runtime? Is there an alternative that would work better?
There's currently no indication at the JS layer that tracing is enabled or disabled. Does there need to be and what form should that take (simple process.traceEnabled flag, perhaps?)
Is this barking up the wrong tree entirely? If so, what alternate route should we take?

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
documentation is changed or added
commit message follows commit guidelines

Affected core subsystem(s)

process, trace_events

mmarchini · 2018-02-16T22:31:35Z

Once #18480 land we should have the filename as an optional argument to process.startTracingAgent('list,of,categories');. Maybe it should also keep track of the last used file and write a new one when starting the TracingAgent again?

Suggestion: instead of process.startTracingAgent/process.stopTracingAgent we could have process.tracing_agent.start()/process.tracing_agent.stop()/process.tracing_agent.isActive().

I'm not that familiar with TracingAgent. Is it possible to have more than one running at the same time?

AndreasMadsen · 2018-02-16T22:34:28Z

Didn't review, but I would prefer this to go in a trace_event module, as we will likely have more trace_events related methods. You should also familiarize yourself with the work that has already been done:

Original require('tracing') EPS: https://github.com/jasongin/node-eps/blob/81cf5a6c28da9f5cbdf6f92c47094bcfacd9c2d0/00X-tracing.md
WIP code on the require('tracing') module: https://github.com/jasongin/nodejs/blob/tracejs/lib/tracing.js
We also have a process.binding('trace_events') C++ binding where I think the C++ code should go. See also lib/internal/trace_events_async_hooks.js for its current use.

I think most of this will answer your questions.

jasnell · 2018-02-16T22:52:40Z

re: require('trace_event') as a new top-level module... I'm hesitant to introduce a new top level module given that push back that other recent additions have had and the fact that we must treat new top-levels as semver-major (making it impossible to backport the new mechanism). Introducing the new set of APIs as new methods on an existing module will allow us to treat this as a semver-minor.

And Ah! I knew there was a previous effort at this but I lost track of where it was. Completely forgot to go look in the node-eps repo :-)

mcollina · 2018-02-16T23:07:39Z

I'm 👍 in adding a new top level module for this, as it seems the most likely direction we will go for tracing.

Another option is to attach them to require('vm').traceEvents. Mainly because it's a mechanism provided by V8.

jasnell · 2018-02-16T23:18:09Z

Refamiliarizing myself now with the details of @jasongin's excellent eps write up... and, to be honest, it has a lot more mechanism and complexity than what I think we minimally would need here.

A vm intrinsic for handling emit from the JS side is likely going to be the better route (if it proves possible)
The internals for the Tracing object are rather more complex than what we minimally would need. There shouldn't be any reason to work with trace event categories as an array. A simple string should do, and getting those back as a simple string when tracing is enabled.
I really like @mcollina's suggestion of dropping this off require('vm').traceEvents. That's really a fantastic fit for it.

let n = 1;
const { traceEvents } = require('vm');
traceEvents.startAgent('node.perf', `node_trace.${n}.log`);
console.log(traceEvents.enabled);
console.log(traceEvents.categories);
traceEvents.stopAgent();

This should be enough of a minimal API to cover the basics. If the vm intrinsic path for actually emitting the events does not pan out we can look at the more complex mechanism later.

AndreasMadsen · 2018-02-17T18:56:03Z

and, to be honest, it has a lot more mechanism and complexity than what I think we minimally would need here.

I completely agree. I think the important lessons from that is not in the API but in the capabilities and how that is accomplished.

In this case, I think it would be much better to use the SetCategories method rather than restarting the TracingAgent.

I also think it would be better to start and stop the TracingAgent when there are no enabled categories rather than making it explicit.

Setting the file destination dynamically also has its unfortunate implications as that won't capture the initial trace events. Thus I think this feature enables an antipattern.

I'm hesitant to introduce a new top level module given that push back that other recent additions have had and the fact that we must treat new top-levels as semver-major (making it impossible to backport the new mechanism).

The pushback is in my opinion not about a new top-level module but about a large feature additions. A push back I think is also valid in this case, as we are also in the process of creating a Unified User Defined Tracing Point implementation. That I think should be more thought through before large trace_events additions are implemented.

The concerns about "making it impossible to backport the new mechanism" doesn't really seem valid, as the TSC can decide to make an exception.

jasnell · 2018-02-19T00:01:11Z

In this case, I think it would be much better to use the SetCategories method rather than restarting the TracingAgent....I also think it would be better to start and stop the TracingAgent when there are no enabled categories rather than making it explicit.

Interesting, ok, I think I can get on board with that. Just to make sure I understand correctly, if the tracing agent is not yet started, using setCategories() with a non-empty value would start the tracing agent, while using setCategories(undefined) (or null or an empty string) would stop the tracing agent.

What do you think of the limitation added in this PR that if the command line flag is set, the tracing agent cannot be switched on or off via the API?

Setting the file destination dynamically also has its unfortunate implications as that won't capture the initial trace events. Thus I think this feature enables an antipattern.

I agree to an extent, and yes, there will be events that do not end up being captured. I believe that's a limitation that those using this mechanism would need to just accept given that they have the alternative of launching the process with tracing enabled.

The concerns about "making it impossible to backport the new mechanism" doesn't really seem valid, as the TSC can decide to make an exception.

Given that adding a new module is a semver-major, I'm one TSC member who would definitely vote against landing it in an LTS branch simply on the principle that landing semver-major's in an LTS branch simply isn't done unless it's for security purposes. (I do see that @mcollina has at least grabbed the namespace for us tho, just in case)

bnoordhuis · 2018-02-19T13:35:52Z

If we're bikeshedding API: there should be a way to additively enable or disable categories in order for disparate parts of the application to tweak the tracer without stepping on each other's foot.

jkrems · 2018-02-19T15:45:32Z

If we're bikeshedding API: there should be a way to additively enable or disable categories in order for disparate parts of the application to tweak the tracer without stepping on each other's foot.

Interesting challenge but I'm not sure it will be possible to prevent accidental trace events making it into the collector unless we want to actively filter on the sink side and support multiple sinks. But being able to "non-destructively" add a category seems like a very worthwhile goal. Minimal would be to have getCategories and allow userland implementations of such a mechanism. Fancier would be addCategories()/removeCategories(), fanciest would be subscriptionToken = addCategories()/removeCategories(subscriptionToken) to cleanly allow overlapping subscriptions.

AndreasMadsen · 2018-02-19T16:58:09Z

If we're bikeshedding API: there should be a way to additively enable or disable categories in order for disparate parts of the application to tweak the tracer without stepping on each other's foot.

Yes. This was essentially what I wanted. SetCategories is just the internal trace_events method that should be used. And not StartAgent and StopAgent. The agent can simply be stopped when there are no categories left.

jasnell · 2018-02-20T15:36:53Z

without stepping on each other's foot

Just so I'm clear... still writing out to a single target yes?

The flow I have in mind is this:

If --trace-events-enabled or --trace-event-categories is used, the trace agent is launched and cannot be turned off by the process. The categories may still be extended but cannot shrink below the initial set specified by --trace-event-categories.
If tracing is not turned on using the command line flags, calling trace_events.setCategories('...') will start the agent if it is not yet started (assuming the categories is not undefined|null|empty-string), then use SetCategories().
If SetCategories() is undefined|null|empty-string, and tracing is not turned on using the command line flags, stop the tracing agent. This will not free the tracing agent however, so it may be restarted to keep appending to the same trace file.
Optionally we can add a trace_events.stopAgent() that would stop and actually free the agent if the command-line flags are not used.

AndreasMadsen · 2018-02-20T16:01:35Z

My thoughts:

--trace-events-enabled is removed. Categories must be enabled explicitly and when enabled the TracingAgent starts. It is not possible to enable all categories (read: when we eventually allow custom categories) anyway:

https://github.com/nodejs/node/blob/1e57a8d1171558774c849e7a8e5d8b91b003ffab/src/tracing/agent.cc#L37L39

--trace-event-categories adds the categories on startup, allowing to catch trace_events before JS is invoked.
trace_events.addCategory(category) adds a category dynamically
trace_events.removeCategory(category) removes a category dynamically
trace_events.hasCategory(category) checks if category is added

Motivation:

The forced behavior of --trace-event-categories seems unnecessarily complicated.
Allowing individual add and remove of a category, makes interoperability at least a bit feasible.
Not exposing StopAgent and StartAgent directly, also makes interoperability at least a bit feasible.

Final thoughts:

I think it would be better to think of user-defined tracing points (UDTP) as a whole before pushing a very trace_event specific API.

Making this a semver-major is also fine in my mind. The relevancy of this API doesn't really show until we have custom user-defined tracing points (UDTP). Or at least a unified tracing point strategy and that will likely be a semver-major anyway.

mike-kaufman · 2018-02-20T19:47:34Z

A few comments on proposed API:

It's a little odd that a command-line switch would change behavior of programmatic API. Why is this?
I would expect symmetry between Start_Agent(...) & Stop_Agent(...). Proposal (in some iterations above) has Start_Agent(...) accepting a list of categories and Stop_Agent accepting no args.
If tracing is not turned on using the command line flags, calling trace_events.setCategories('...') will start the agent

I think a more explicit API has clear start/stop semantics (i.e., a global toggle for turning on/off tracing irrespective of any enabled categories), and enable/disable category semantics (i.e., a way to set the filters on which categories are enabled, no impact on global toggle).

BridgeAR · 2018-03-06T20:10:16Z

Ping @jasnell

jasnell · 2018-03-06T23:36:24Z

@BridgeAR ... still a WIP. Please leave this open.

jasnell · 2018-03-08T16:38:04Z

Replaced by #19233

process: add process.[start|stop]TracingAgent()

747779e

jasnell added wip Issues and PRs that are still a work in progress. discuss Issues opened for discussions and feedbacks. process Issues and PRs related to the process subsystem. trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events. labels Feb 16, 2018

nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Feb 16, 2018

jasnell mentioned this pull request Mar 8, 2018

trace_events: enable trace events implicitly, add module #19233

Closed

4 tasks

jasnell closed this Mar 8, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] process: add process.[start|stop]TracingAgent() #18826

[WIP] process: add process.[start|stop]TracingAgent() #18826

jasnell commented Feb 16, 2018 •

edited

Loading

mmarchini commented Feb 16, 2018

AndreasMadsen commented Feb 16, 2018 •

edited

Loading

jasnell commented Feb 16, 2018

mcollina commented Feb 16, 2018

jasnell commented Feb 16, 2018 •

edited

Loading

AndreasMadsen commented Feb 17, 2018 •

edited

Loading

jasnell commented Feb 19, 2018

bnoordhuis commented Feb 19, 2018

jkrems commented Feb 19, 2018 •

edited

Loading

AndreasMadsen commented Feb 19, 2018

jasnell commented Feb 20, 2018

AndreasMadsen commented Feb 20, 2018 •

edited

Loading

mike-kaufman commented Feb 20, 2018

BridgeAR commented Mar 6, 2018

jasnell commented Mar 6, 2018

jasnell commented Mar 8, 2018

[WIP] process: add process.[start|stop]TracingAgent() #18826

[WIP] process: add process.[start|stop]TracingAgent() #18826

Conversation

jasnell commented Feb 16, 2018 • edited Loading

Checklist

Affected core subsystem(s)

mmarchini commented Feb 16, 2018

AndreasMadsen commented Feb 16, 2018 • edited Loading

jasnell commented Feb 16, 2018

mcollina commented Feb 16, 2018

jasnell commented Feb 16, 2018 • edited Loading

AndreasMadsen commented Feb 17, 2018 • edited Loading

jasnell commented Feb 19, 2018

bnoordhuis commented Feb 19, 2018

jkrems commented Feb 19, 2018 • edited Loading

AndreasMadsen commented Feb 19, 2018

jasnell commented Feb 20, 2018

AndreasMadsen commented Feb 20, 2018 • edited Loading

mike-kaufman commented Feb 20, 2018

BridgeAR commented Mar 6, 2018

jasnell commented Mar 6, 2018

jasnell commented Mar 8, 2018

jasnell commented Feb 16, 2018 •

edited

Loading

AndreasMadsen commented Feb 16, 2018 •

edited

Loading

jasnell commented Feb 16, 2018 •

edited

Loading

AndreasMadsen commented Feb 17, 2018 •

edited

Loading

jkrems commented Feb 19, 2018 •

edited

Loading

AndreasMadsen commented Feb 20, 2018 •

edited

Loading