Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trace_events: enable trace events implicitly, add module #19233

Closed
wants to merge 6 commits into from

Conversation

jasnell
Copy link
Member

@jasnell jasnell commented Mar 8, 2018

Ping @AndreasMadsen, @addaleax, @mcollina

This is a replacement for #18826 that takes a different approach.

Enable trace events to be turned on if --trace-event-categories
is set even if --trace-events-enabled is not used. Using
--trace-events-enabled will still work and will still set the
default categories to v8,node,node.async_hooks, but the boolean
flag is not required to enable the tracing agent.

Also, add trace_events module and API

Allows trace event collection to be turned on/off at runtime
via a new top level trace_events module.

Semver-major because of the addition of the new top level module.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@jasnell jasnell added semver-major PRs that contain breaking changes and should be released in the next major version. trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events. labels Mar 8, 2018
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Mar 8, 2018
@jasnell
Copy link
Member Author

jasnell commented Mar 8, 2018

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test that enable/disable the trace_events and process them when the target process is still running? Ideally from the same process.

const path = require('path');
const fs = require('fs');
const tmpdir = require('../common/tmpdir');

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a small comment explaining what this test does? are we validating trace_events with perf_hooks? I would prefer to have an independent test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's using the fact that perf_hooks emits a very small number of very predictable trace events. The v8 and node.async_hooks trace events tend to be more unpredictable.

@@ -23,15 +20,17 @@ The available categories are:
measurements.
* `v8`

By default the `node`, `node.async_hooks`, and `v8` categories are enabled.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since --trace-events-enabled does not enable all categories we should still document this default list for that option.


### trace_events.setTracingCategories(categories)

* `categories` [string] A comma-separated list of category names. If
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[string] -> {string} ?


### trace_events.getTracingCategories()

* Return: [string]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return: -> Returns:?
[string] -> {string} ?


It is accessible using `require('trace_events')`.

### trace_events.setTracingCategories(categories)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if categories can be undefined, should this be ([categories])?

@AndreasMadsen
Copy link
Member

As I have argued before, I think it would be better to just completely remove --trace-events-enabled, as can only enable the categories we are aware of automatically. And eventually, when correctly implement the node category, the only categories there will exist by default are node and v8.

@@ -5,12 +5,9 @@
Trace Event provides a mechanism to centralize tracing information generated by
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't have to be in this PR, but we really need to rename this file to trace_events.md and give it a correct title :D

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely :-) ... I can mark that as a todo for another PR.


It is accessible using `require('trace_events')`.

### trace_events.setTracingCategories(categories)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still in favor of an API that allows one to enable and disable categories individually, as that will make it easier to have different modules enable categories.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was going back and forth on that. It wouldn't be too difficult to do I just wasn't sure just how far to go down that road so opted for the easiest thing initially.

What I had in mind was a simple variation on this such as:

// trace.events.enableTracingCategories(...categories)
// trace.events.disableTracingCategories(...categories)
trace_events.enableTracingCategories('node', 'v8', 'foo')
trace_events.disableTracingCategories('node')

trace_events.getTracingCategories() // Returns an array, e,g, ['v8', 'foo']

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was something like that I had in mind.

@jasnell
Copy link
Member Author

jasnell commented Mar 8, 2018

@AndreasMadsen ... re: --treace-events-enbled ... yes, I'm leaning towards removing, but I think we should take it through a deprecation cycle. Not sure if we really need to tho. @nodejs/tsc ... what do you think?

src/node.cc Outdated
static bool trace_enabled = false;
static std::string trace_enabled_categories; // NOLINT(runtime/string)

std::string trace_enabled_categories; // NOLINT(runtime/string)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the lint exception really needed here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, yes. Not entirely sure why but it complains without it.

Copy link
Member

@fhinkel fhinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ part lgtm

@AndreasMadsen
Copy link
Member

@jasnell In async_hooks we have a backported deprecation warnings when possible and removed it in the next semver-major version. I this case we could deprecate when --trace-events-enabled is used without --trace-event-categories.

getTracingCategories
} = process.binding('trace_events');

const { trace_events_enabled } = process.binding('config');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: If we can rename trace_events_enabled to something which is not in snake case here, it would be better.

* Return: [string]

Returns the current list of category names as a comma-separated list or
`undefined`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we return undefined? Wouldn't an empty array be better here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prefer not to do a useless allocation.

@jasnell jasnell force-pushed the trace_agent_refactor branch from 9024393 to bab4b0c Compare March 9, 2018 19:57
@jasnell
Copy link
Member Author

jasnell commented Mar 9, 2018

@AndreasMadsen ... ok, reworked to use the enableTracingCategories/disableTracingCategories API. :-)

@thefourtheye ... getTracingCategories() now returns an array :-)

@fhinkel ... C++ part changed quite a bit :-)


### trace_events.disableTracingCategories(...categories)

* `...categories` [string] One or more category names.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[string] -> {string} (otherwise this will not be linkified by the type parser).


### trace_events.enableTracingCategories(...categories)

* `...categories` [string] One or more category names.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[string] -> {string}


### trace_events.getTracingCategories()

* Return: {string[]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return: -> Returns:

@jasnell
Copy link
Member Author

jasnell commented Mar 12, 2018

Ping @addaleax, @AndreasMadsen, @thefourtheye, @fhinkel :-)

<a id="ERR_TRACE_EVENTS_UNAVAILABLE"></a>
### ERR_TRACE_EVENTS_UNAVAILABLE

The trace events mechanism is not available for use in the Node.js binary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding:

This is the case when Node.js was compiled with the `--without-v8-platform` flag.

extern void DisableTracingCategories(
const std::vector<std::string>& categories);
extern const std::set<std::string>& GetEnabledTracingCategories();
extern void StopTracing();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can drop extern, it doesn’t make a difference for function declarations + we don’t use it elsewhere

src/node.cc Outdated
tracing_agent_->EnableCategories(categories);
}

void DisableTracingCategories(const std::vector<std::string> categories) {
Copy link
Member

@addaleax addaleax Mar 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of these should use const references

static void GetTracingCategories(const FunctionCallbackInfo<Value>& args) {
#if NODE_USE_V8_PLATFORM
Environment* env = Environment::GetCurrent(args);
auto categories = GetEnabledTracingCategories();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would create a copy of the set, so const auto& would be better.

auto categories = GetEnabledTracingCategories();
Local<Array> ret = Array::New(env->isolate(), categories.size());
size_t n = 0;
for (auto it = categories.begin(); it != categories.end(); it++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for (const std::string& category : categories)? If it fits into one line you can even replace categories with GetEnabledTracingCategories() in that scenario

started_ = true;
}
TraceConfig* trace_config = new TraceConfig();
for (auto it = categories_.begin(); it != categories_.end(); it++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto – for (const std::string& category : categories) would read better, imo.

category != categories.end();
category++) {
categories_.insert(*category);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can avoid the loop and just write categories_.insert(categories.begin(), categories.end());

category++) {
categories_.erase(*category);
}
StartTracing();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be if (categories_.empty()) StopTracing(); somewhere?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because StartTracing() already handles it.

if (!categories.empty()) {
for (auto category = categories.begin();
category != categories.end();
category++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, for (const std::string& category : categories)

void StartTracing();

const std::string& log_file_pattern_;
std::set<std::string> categories_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this doesn’t make a huge difference, but why not std::unordered_set?

jasnell added 4 commits March 12, 2018 11:27
Enable trace events to be turned on if `--trace-event-categories`
is set even if `--trace-events-enabled` is not used. Using
`--trace-events-enabled` will still work and will still set the
default categories to `v8,node,node.async_hooks`, but the boolean
flag is not required to enable the tracing agent.

Also, add trace_events module and API

Allows trace event collection to be turned on/off at runtime
via a new top level `trace_events` module.
`trace_events.enableTracingCategories('v8', 'node')`
`trace_events.disableTracingCategories('v8')`
`trace_events.getTracingCategories()` // return an array
@jasnell jasnell force-pushed the trace_agent_refactor branch from 049be5a to d366cc6 Compare March 12, 2018 18:27
@jasnell
Copy link
Member Author

jasnell commented Mar 12, 2018

@ofrobots
Copy link
Contributor

@jasnell One use case I have been thinking about is having the ability to enable/disable tracing on a specific operations (say an individual http request). I think this is mostly compatible with the API being proposed here with some extra features:

  • Ability to programmatically control the output file from JS. This is needed if the user wants to do 'sampling' and record traces into different files for each 'sample'.
  • We may also need the ability to disable/enable all categories, effectively enabling/disabling tracing.

Of course, the latter assumes that there is only one actor in the system that wants to control tracing. This is a different use-case than ambient, always-on, tracing. It is not clear to me how users would actually use the tracing system in Node, but I think we should support both use-cases. WDYT?

@jasnell
Copy link
Member Author

jasnell commented Mar 15, 2018

Controlling the output destination would require a deeper internal change to the agent to essentially support multiple simultaneous targets unless we're going to limit the API to one Target at a time. That could run into issues if dependencies attempt to enable/disable their own tracing targets in conflict with one another. Not out of the question but also not trivial.

Disabling all categories using this api is trivial. Enabling all assumes we know what all can be enabled. For now we can do that by enabling the v8 and node categories.

@ofrobots
Copy link
Contributor

Controlling the output destination would require a deeper internal change to the agent to essentially support multiple simultaneous targets unless we're going to limit the API to one Target at a time.

I was thinking a single target at a time. It seems that there are two distinct use-cases: ambient mode (multiple actors may enable / disable categories; traces are written and consumed at end), or online mode (where an actor is directly controlling when tracing is enabled, where the output is written). If we deem the use-cases to be exclusive, the single target option would work. IOW, one cannot do ambient tracing and online tracing at the same time.

@jasnell
Copy link
Member Author

jasnell commented Mar 16, 2018

@ofrobots ... yeah, I guess the part I'm struggling with are the API ergonomics on that. Do you have any rough sketch ideas on what the API would look like?

@AndreasMadsen ... what do you think?

@AndreasMadsen
Copy link
Member

AndreasMadsen commented Mar 16, 2018

@jasnell I was thinking of something like:

const tracer = trace_events.enableCategories('node.http', 'node.dns')
tracer.disableCategories();

Then, if at least one agent has the category enabled it is enabled. The API also prevents an actor from disabling another actors category.

The next question is then how to deal with multiple listeners. I think this should be separated into two:

  • out-of-process, only one actor can listen this way. That actor will be the one that starts the node process for example, APM or clinic. If they wish to integrate with other out-of-process actors it is their responsibility to relay the information.

  • in-process actors, could use an API like:

const tracer = new trace_events.Tracer('node.http', 'node.dns')
tracer.on('event', console.log);
tracer.disable();

Properly that API could be the same as the API for dynamically enabling and disabling a category. If an in-process actor enables a category that category will leak to the out-of-process actor. It is the out-of-process actors responsibility to filter the data.

I think the in-process use case should have a low priority. But we should consider making the category API such that it can be extended in the future.

@jasnell
Copy link
Member Author

jasnell commented Mar 16, 2018

An API that somewhat echoes async_hooks would be good here...

const tracer = trace_events.createTracer('node.http', 'node.dns')
trace.enable()
trace.disable()

I'm not as convinced about the in-process consumer, however, as it could easily end up being a significant performance foot-gun. If we do end up with the in-process model, however, let's avoid the EventEmitter approach and go with a simple callback... something like...

const tracer = trace_events.createTracer('node.http', 'node.dns')
tracer.enable((event) => { /* ... */ });
tracer.disable();

My key concern with the in-process model is the conversion of the trace event into a javascript object and the cost of the boundary cross that would be required. If someone is also using async_hooks, there could certainly be a significant performance penalty. Do we care?

@ofrobots
Copy link
Contributor

I would be opposed to an in process consumer. The performance overhead would defeat the benefit of the extremely efficient low-level implementation. I do want to be able to control the output filename being controllable from an in process actor.

@AndreasMadsen
Copy link
Member

The performance overhead would defeat the benefit of the extremely efficient low-level implementation. I do want to be able to control the output filename being controllable from an in process actor.

I agree.

@jasnell
Copy link
Member Author

jasnell commented Mar 16, 2018

Ok, then I think we're all aligned on that then.

The remaining question is then how do we allow for in-process control of the trace filename without running into conflicts with multiple tracers. One thought is to only allow changing of the file name while trace events are not being actively captured... that is, while there are no active categories. Any attempt to set the filename while tracers are active would result in an error.

trace_events.setTraceFile(filenameOrFD)  // ok because there are no active tracers yet.
const tracer = trace_events.createTracer('node.http', 'node.dns')
trace_events.setTraceFile(filenameOrFD)  // still ok, because tracer is not enabled yet.
tracer.enable()
trace_events.setTraceFile(filenameOrFD)  // throws! tracing is active
tracer.disable()
trace_events.setTraceFile(filenameOrFD)  // ok, no tracer running

A default tracer can be created when the --trace-event-categories command line flag is set... e.g.

trace_events.getDefaultTracer().disable()  // turn off the tracing categories set using the command line flag
trace_events.setTraceFile(filenameOrFD) 
trace_events.getDefaultTracer().enable()   // re-enable the default tracer

@AndreasMadsen ... Another question: with regards to the trace file pattern work you recently landed (specifically ${rotation})... if all tracers are disabled, should that then close out the current trace file and cause a rotation to a new file once re-enabled?

e.g.

trace_events.setTraceFile('node_trace_${rotation}.log')
const tracer = trace_events.createTracer('node.http2')
tracer.enable()
// do stuff
tracer.disable()  // causes node_trace_1.log to be closed because all tracing is off

tracer.enable()  // causes node_trace_2.log to be opened and started because tracing is back on
// do stuff

@AndreasMadsen
Copy link
Member

AndreasMadsen commented Mar 16, 2018

@jasnell This doesn't really allow for multiple independent agents. I have no idea how it should be implemented, but I think we need something like:

const tracer = trace_events.createTracer({
  categories: ['node.http', 'node.dns'],
  destination: filenameOrFD
})
tracer.enable()
tracer.disable()

--trace-event-categories and --trace-event-file-pattern is then simply a CLI version of:

const tracer = trace_events.createTracer({
  categories: categories.split(','),
  destination: filePattern
})
tracer.enable()

if all tracers are disabled, should that then close out the current trace file and cause a rotation to a new file once re-enabled?

I think it would be better to look into a more streaming friendly file format.

@jasnell
Copy link
Member Author

jasnell commented Mar 16, 2018

Hmm... that increases the complexity of the agent fairly significantly in that we would end up having to output to multiple writers for each destination... or spin up a separate agent thread for each destination and refactor the queue. Will think about that a bit more.

I think it would be better to look into a more streaming friendly file format

Yeah, I've started some investigations into that with @ofrobots' help was able to determine that there's a little bit of flexibility in the v8 json format but what would be ideal is a true line-delimited json approach that could write to a file or to a pipe or any arbitrary fd. I'll be working up an experimental alternative writer that does exactly that in the coming couple of weeks once I clear a few higher priority items off my list.

@AndreasMadsen
Copy link
Member

help was able to determine that there's a little bit of flexibility in the v8 json format

Yeah, I know chrome://tracing supports a flat array, which is at least a little easier to parse. I guess we could some fusion format, that is both line-delimited but also a valid JSON file. Like:

[{ cat: 'node.dns' }
,{ cat: 'node.dns' }
]

This could be streaming parsed by splitting at new-lines and skipping the first character. I don't like it, but at least it is possible.

@jasnell jasnell added the wip Issues and PRs that are still a work in progress. label Apr 2, 2018
@jasnell
Copy link
Member Author

jasnell commented Apr 3, 2018

After talking it over with @ofrobots a bit more today, it would be good to split this into three distinct efforts

  1. Support writing the event stream to a pipe destination as well as a file. @ofrobots will be leading looking into this.

  2. Support the API for enabling/disabling clients dynamically at runtime. I will be leading looking into this.

  3. Support for multiple destinations. After discussing it, neither @ofrobots nor I consider this one a priority at this time so we won't be pursuing it but the API should not preclude it if someone wants to add this in later on.

On the JS side, the API I have in mind for 2 is a slight variation of what was last discussed above:

const tracer = trace_events.createTracer({
  categories: ['node.http', 'node.dns']
})
tracer.enable()
tracer.disable()

With this pattern, if someone wanted to support multiple destinations, a new option can be added to the options argument to createTracer().

Some details on the proposed implementation:

  1. The tracer object above is a JS wrapper around a C++ object, the set of active categories is the union of the set of categories for each enabled tracer object.

  2. When an enabled tracer object is gc'd it will be disabled and the new set of enabled categories will be calculated.

  3. A tracer.categories property will return the listing of categories for this tracer, whereas a tracer.allCategories will return the listing of all categories currently enabled.

@jasnell
Copy link
Member Author

jasnell commented Apr 4, 2018

Closing this in favor of a new fresh take...#19803

@jasnell jasnell closed this Apr 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. semver-major PRs that contain breaking changes and should be released in the next major version. trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events. wip Issues and PRs that are still a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants