Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

trace_events: when enabling via inspector after startup, process abort #23185

Closed
vmarchaud opened this issue Sep 30, 2018 · 10 comments · Fixed by #24814
Closed

trace_events: when enabling via inspector after startup, process abort #23185

vmarchaud opened this issue Sep 30, 2018 · 10 comments · Fixed by #24814
Assignees
Labels
trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events.

Comments

@vmarchaud
Copy link
Contributor

  • Version: 10.10.0
  • Platform: Linux 4.4.0-122-generic #146-Ubuntu SMP Mon Apr 23 15:34:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Subsystem: trace_events + NodeTracing Inspector Domain

I'm playing around with the new NodeTracing domain (cf nodejs/diagnostics#220), trying to collect trace events at runtime in our APM (implementation is still wip so don't mind the code).

Everything was working great in test but when i deployed in our staging environment (and putting a some load with wrk-node), i apparently hit a race condition that cause that make the process abort : https://github.com/nodejs/node/blob/master/deps/v8/src/api.cc#L1059
I only get the debug line outputted by V8 before the abort happen, so not so much information here.

I can reproduce almost every time with our API, i assume i will be able to make a sample so everyone can reproduce on their own, in the mean time i'm already open the issue to discuss it.
I can recompile node with any patch if needed and test directly on my staging env.

@eugeneo if you have any idea, i'm able to put some hours into debugging it the coming week

@vmarchaud
Copy link
Contributor Author

@ofrobots Do you have any idea where i should look at to find the root cause of this bug ?

@thefourtheye
Copy link
Contributor

cc @nodejs/diagnostics

@mmarchini
Copy link
Contributor

@vmarchaud if you could share a sample code without 3rd party libraries to reproduce this failure, or a core dump (if there's no sensitive information in your application) / node-report, I can take a look into it.

@Trott
Copy link
Member

Trott commented Nov 10, 2018

@vmarchaud Are you still experiencing this problem?

@vmarchaud
Copy link
Contributor Author

@Trott I believe its still a problem but i need to verify against the latest 10.x or 11.x version.

@targos targos added the trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events. label Nov 11, 2018
@vmarchaud
Copy link
Contributor Author

I'm able to reproduce the crash with 10.13.0, i have a core dump that i can send you @mmarchini , do you have any email i can contact you ?
I will try to make a simple example that reproduce the problem

@vmarchaud
Copy link
Contributor Author

vmarchaud commented Nov 15, 2018

And there is a sample that reproduce the problem for me :

const http = require('http')
const port = 3000
const inspector = require('inspector')
const fs = require('fs')
const session = new inspector.Session()
session.connect()

const opts = {
  includedCategories: [
    'node'
  ],
  recordContinuously: false
}

const requestHandler = (request, response) => {
  response.end('Hello Node.js Server!')
}

const server = http.createServer(requestHandler)

server.listen(port, (err) => {
  if (err) {
    return console.log('something bad happened', err)
  }

  console.log(`server is listening on ${port}`)
  // launching the tracing
  session.post('NodeTracing.start', {
    traceConfig: opts
  })
  console.log(`Starting NodeTracing`)

  // lets stop after 10 sec
  setTimeout(_ => {
    const events = []
    session.on('NodeTracing.tracingComplete', () => {
      console.log(`Finished with success`)
      const data = events.reduce((agg, events) => {
        agg = agg.concat(events.value)
        return agg
      }, [])
      fs.writeFileSync('./trace.json', JSON.stringify(data))
      console.log(`Saved file with trace events`)
      return process.exit(0)
    })
    session.on('NodeTracing.dataCollected', (event) => {
      events.push(event.data)
    })
    console.log(`Stopping NodeTracing`)
    session.post('NodeTracing.stop', {
      traceConfig: opts
    })
  }, 5000)
})

You just need to make a few request to the server, for example i'm using wrk-node :

wrk -c 10 -t 3 -d 5 http://localhost:3000

@ofrobots
Copy link
Contributor

I can reproduce the abort with the provided test. Taking a look....

@ofrobots
Copy link
Contributor

(lldb) bt
* thread #2, stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x0000000100cab9f2 node`v8::base::OS::Abort() + 18
    frame #1: 0x00000001000f7714 node`node::inspector::(anonymous namespace)::JSBindingsConnection::JSBindingsSessionDelegate::SendMessageToFrontend(v8_inspector::StringView const&) + 44
    frame #2: 0x00000001000f2491 node`node::inspector::(anonymous namespace)::ChannelImpl::sendMessageToFrontend(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 41
    frame #3: 0x00000001000f2450 node`non-virtual thunk to node::inspector::(anonymous namespace)::ChannelImpl::sendProtocolNotification(std::__1::unique_ptr<node::inspector::protocol::Serializable, std::__1::default_delete<node::inspector::protocol::Serializable> >) + 44
    frame #4: 0x0000000100143bf3 node`node::inspector::protocol::NodeTracing::Frontend::sendRawNotification(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 81
    frame #5: 0x0000000100100e6e node`node::inspector::protocol::(anonymous namespace)::InspectorTraceWriter::Flush(bool) + 550
    frame #6: 0x00000001000dfeba node`node::tracing::Agent::Flush(bool) + 116
    frame #7: 0x00000001000e176a node`node::tracing::NodeTraceBuffer::NonBlockingFlushSignalCb(uv_async_s*) + 66
    frame #8: 0x0000000100a000a3 node`uv__async_io + 318
    frame #9: 0x0000000100a0fa78 node`uv__io_poll + 1739
    frame #10: 0x0000000100a0052a node`uv_run + 336

@ofrobots
Copy link
Contributor

We crash because we are trying to access the main thread isolate on the tracing agent thread. The main thread also happens to be active and has the isolate lock. @eugeneo would you have time to look at this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
trace_events Issues and PRs related to V8, Node.js core, and userspace code trace events.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants