Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] process: add process.[start|stop]TracingAgent() #18826

Closed
wants to merge 1 commit into from

Conversation

jasnell
Copy link
Member

@jasnell jasnell commented Feb 16, 2018

Work in progress. Do not land. Not ready for CI

Ping @nodejs/diagnostics ...

One of the identified limitations of the v8 trace event support is that it is not currently possible to enable it at runtime. This PR is an attempt to start working towards enabling that capability.

So far it's pretty simple:

process.startTracingAgent('list,of,categories'); // returns true if agent was started as a result of this call, false otherwise

process.stopTracingAgent(); // returns true if agent was stopped, false otherwise

If the --trace-events--enabled command-line flag is used, then both of these are non-ops that will return false.

There are several key questions:

  1. Is it safe to allow starting and stopping of the tracing agent dynamically like this? Note that starting will cause the existing node_trace.1.log to be overwritten each time, just like starting a new trace. We should be able to allow better handling of that once trace_events: add file pattern cli option #18480 lands.

  2. Is this the way that we should allow the tracing to be enabled/disabled dynamically at runtime? Is there an alternative that would work better?

  3. There's currently no indication at the JS layer that tracing is enabled or disabled. Does there need to be and what form should that take (simple process.traceEnabled flag, perhaps?)

  4. Is this barking up the wrong tree entirely? If so, what alternate route should we take?

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

process, trace_events

@jasnell jasnell added wip Issues and PRs that are still a work in progress. discuss Issues opened for discussions and feedbacks. process Issues and PRs related to the process subsystem. trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events. labels Feb 16, 2018
@nodejs-github-bot nodejs-github-bot added the c++ Issues and PRs that require attention from people who are familiar with C++. label Feb 16, 2018
@mmarchini
Copy link
Contributor

Once #18480 land we should have the filename as an optional argument to process.startTracingAgent('list,of,categories');. Maybe it should also keep track of the last used file and write a new one when starting the TracingAgent again?

Suggestion: instead of process.startTracingAgent/process.stopTracingAgent we could have process.tracing_agent.start()/process.tracing_agent.stop()/process.tracing_agent.isActive().

I'm not that familiar with TracingAgent. Is it possible to have more than one running at the same time?

@AndreasMadsen
Copy link
Member

AndreasMadsen commented Feb 16, 2018

Didn't review, but I would prefer this to go in a trace_event module, as we will likely have more trace_events related methods. You should also familiarize yourself with the work that has already been done:

I think most of this will answer your questions.

@jasnell
Copy link
Member Author

jasnell commented Feb 16, 2018

re: require('trace_event') as a new top-level module... I'm hesitant to introduce a new top level module given that push back that other recent additions have had and the fact that we must treat new top-levels as semver-major (making it impossible to backport the new mechanism). Introducing the new set of APIs as new methods on an existing module will allow us to treat this as a semver-minor.

And Ah! I knew there was a previous effort at this but I lost track of where it was. Completely forgot to go look in the node-eps repo :-)

@mcollina
Copy link
Member

I'm 👍 in adding a new top level module for this, as it seems the most likely direction we will go for tracing.

Another option is to attach them to require('vm').traceEvents. Mainly because it's a mechanism provided by V8.

@jasnell
Copy link
Member Author

jasnell commented Feb 16, 2018

Refamiliarizing myself now with the details of @jasongin's excellent eps write up... and, to be honest, it has a lot more mechanism and complexity than what I think we minimally would need here.

  1. A vm intrinsic for handling emit from the JS side is likely going to be the better route (if it proves possible)

  2. The internals for the Tracing object are rather more complex than what we minimally would need. There shouldn't be any reason to work with trace event categories as an array. A simple string should do, and getting those back as a simple string when tracing is enabled.

  3. I really like @mcollina's suggestion of dropping this off require('vm').traceEvents. That's really a fantastic fit for it.

let n = 1;
const { traceEvents } = require('vm');
traceEvents.startAgent('node.perf', `node_trace.${n}.log`);
console.log(traceEvents.enabled);
console.log(traceEvents.categories);
traceEvents.stopAgent();

This should be enough of a minimal API to cover the basics. If the vm intrinsic path for actually emitting the events does not pan out we can look at the more complex mechanism later.

@AndreasMadsen
Copy link
Member

AndreasMadsen commented Feb 17, 2018

and, to be honest, it has a lot more mechanism and complexity than what I think we minimally would need here.

I completely agree. I think the important lessons from that is not in the API but in the capabilities and how that is accomplished.

In this case, I think it would be much better to use the SetCategories method rather than restarting the TracingAgent.

I also think it would be better to start and stop the TracingAgent when there are no enabled categories rather than making it explicit.

Setting the file destination dynamically also has its unfortunate implications as that won't capture the initial trace events. Thus I think this feature enables an antipattern.

I'm hesitant to introduce a new top level module given that push back that other recent additions have had and the fact that we must treat new top-levels as semver-major (making it impossible to backport the new mechanism).

The pushback is in my opinion not about a new top-level module but about a large feature additions. A push back I think is also valid in this case, as we are also in the process of creating a Unified User Defined Tracing Point implementation. That I think should be more thought through before large trace_events additions are implemented.

The concerns about "making it impossible to backport the new mechanism" doesn't really seem valid, as the TSC can decide to make an exception.

@jasnell
Copy link
Member Author

jasnell commented Feb 19, 2018

In this case, I think it would be much better to use the SetCategories method rather than restarting the TracingAgent....I also think it would be better to start and stop the TracingAgent when there are no enabled categories rather than making it explicit.

Interesting, ok, I think I can get on board with that. Just to make sure I understand correctly, if the tracing agent is not yet started, using setCategories() with a non-empty value would start the tracing agent, while using setCategories(undefined) (or null or an empty string) would stop the tracing agent.

What do you think of the limitation added in this PR that if the command line flag is set, the tracing agent cannot be switched on or off via the API?

Setting the file destination dynamically also has its unfortunate implications as that won't capture the initial trace events. Thus I think this feature enables an antipattern.

I agree to an extent, and yes, there will be events that do not end up being captured. I believe that's a limitation that those using this mechanism would need to just accept given that they have the alternative of launching the process with tracing enabled.

The concerns about "making it impossible to backport the new mechanism" doesn't really seem valid, as the TSC can decide to make an exception.

Given that adding a new module is a semver-major, I'm one TSC member who would definitely vote against landing it in an LTS branch simply on the principle that landing semver-major's in an LTS branch simply isn't done unless it's for security purposes. (I do see that @mcollina has at least grabbed the namespace for us tho, just in case)

@bnoordhuis
Copy link
Member

If we're bikeshedding API: there should be a way to additively enable or disable categories in order for disparate parts of the application to tweak the tracer without stepping on each other's foot.

@jkrems
Copy link
Contributor

jkrems commented Feb 19, 2018

If we're bikeshedding API: there should be a way to additively enable or disable categories in order for disparate parts of the application to tweak the tracer without stepping on each other's foot.

Interesting challenge but I'm not sure it will be possible to prevent accidental trace events making it into the collector unless we want to actively filter on the sink side and support multiple sinks. But being able to "non-destructively" add a category seems like a very worthwhile goal. Minimal would be to have getCategories and allow userland implementations of such a mechanism. Fancier would be addCategories()/removeCategories(), fanciest would be subscriptionToken = addCategories()/removeCategories(subscriptionToken) to cleanly allow overlapping subscriptions.

@AndreasMadsen
Copy link
Member

If we're bikeshedding API: there should be a way to additively enable or disable categories in order for disparate parts of the application to tweak the tracer without stepping on each other's foot.

Yes. This was essentially what I wanted. SetCategories is just the internal trace_events method that should be used. And not StartAgent and StopAgent. The agent can simply be stopped when there are no categories left.

@jasnell
Copy link
Member Author

jasnell commented Feb 20, 2018

without stepping on each other's foot

Just so I'm clear... still writing out to a single target yes?

The flow I have in mind is this:

  1. If --trace-events-enabled or --trace-event-categories is used, the trace agent is launched and cannot be turned off by the process. The categories may still be extended but cannot shrink below the initial set specified by --trace-event-categories.

  2. If tracing is not turned on using the command line flags, calling trace_events.setCategories('...') will start the agent if it is not yet started (assuming the categories is not undefined|null|empty-string), then use SetCategories().

  3. If SetCategories() is undefined|null|empty-string, and tracing is not turned on using the command line flags, stop the tracing agent. This will not free the tracing agent however, so it may be restarted to keep appending to the same trace file.

  4. Optionally we can add a trace_events.stopAgent() that would stop and actually free the agent if the command-line flags are not used.

@AndreasMadsen
Copy link
Member

AndreasMadsen commented Feb 20, 2018

My thoughts:

  • --trace-events-enabled is removed. Categories must be enabled explicitly and when enabled the TracingAgent starts. It is not possible to enable all categories (read: when we eventually allow custom categories) anyway:

https://github.com/nodejs/node/blob/1e57a8d1171558774c849e7a8e5d8b91b003ffab/src/tracing/agent.cc#L37L39

  • --trace-event-categories adds the categories on startup, allowing to catch trace_events before JS is invoked.
  • trace_events.addCategory(category) adds a category dynamically
  • trace_events.removeCategory(category) removes a category dynamically
  • trace_events.hasCategory(category) checks if category is added

Motivation:

  • The forced behavior of --trace-event-categories seems unnecessarily complicated.
  • Allowing individual add and remove of a category, makes interoperability at least a bit feasible.
  • Not exposing StopAgent and StartAgent directly, also makes interoperability at least a bit feasible.

Final thoughts:

I think it would be better to think of user-defined tracing points (UDTP) as a whole before pushing a very trace_event specific API.

Making this a semver-major is also fine in my mind. The relevancy of this API doesn't really show until we have custom user-defined tracing points (UDTP). Or at least a unified tracing point strategy and that will likely be a semver-major anyway.

@mike-kaufman
Copy link

A few comments on proposed API:

  • It's a little odd that a command-line switch would change behavior of programmatic API. Why is this?

  • I would expect symmetry between Start_Agent(...) & Stop_Agent(...). Proposal (in some iterations above) has Start_Agent(...) accepting a list of categories and Stop_Agent accepting no args.

  • If tracing is not turned on using the command line flags, calling trace_events.setCategories('...') will start the agent

I think a more explicit API has clear start/stop semantics (i.e., a global toggle for turning on/off tracing irrespective of any enabled categories), and enable/disable category semantics (i.e., a way to set the filters on which categories are enabled, no impact on global toggle).

@BridgeAR
Copy link
Member

BridgeAR commented Mar 6, 2018

Ping @jasnell

@jasnell
Copy link
Member Author

jasnell commented Mar 6, 2018

@BridgeAR ... still a WIP. Please leave this open.

@jasnell
Copy link
Member Author

jasnell commented Mar 8, 2018

Replaced by #19233

@jasnell jasnell closed this Mar 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. discuss Issues opened for discussions and feedbacks. process Issues and PRs related to the process subsystem. trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events. wip Issues and PRs that are still a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants