Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transaction.context.request.url.* is wrong for an incoming HTTP request with a pathname starting with a double-slash #3137

Open
trentm opened this issue Feb 1, 2023 · 3 comments
Labels
agent-nodejs Make available for APM Agents project planning. bug

Comments

@trentm
Copy link
Member

trentm commented Feb 1, 2023

transaction.context.request.url.* is wrong for an incoming HTTP request with a pathname starting with a double-slash. E.g.:

GET //foo/bar HTTP/1.1
Host: example.com

repro

Apply this patch:

diff --git a/examples/trace-http.js b/examples/trace-http.js
index f9fbb524..31e5840b 100755
--- a/examples/trace-http.js
+++ b/examples/trace-http.js
@@ -56,7 +56,12 @@ server.listen(3000, function () {
   //
   // Note that this there is no current "transaction" here, so this HTTP
   // request is not captured by APM. See "trace-http-request.js" for more.
-  const clientReq = http.request('http://localhost:3000/', function (clientRes) {
+  // const clientReq = http.request('http://localhost:3000/', function (clientRes) {
+  const clientReq = http.request({
+    host: 'localhost',
+    port: 3000,
+    path: '//user@foo'
+  }, function (clientRes) {
     console.log('client response: %s %s', clientRes.statusCode, clientRes.headers)
     const chunks = []
     clientRes.on('data', function (chunk) {
diff --git a/lib/parsers.js b/lib/parsers.js
index 377ba50d..82ed1800 100644
--- a/lib/parsers.js
+++ b/lib/parsers.js
@@ -27,6 +27,7 @@ function getContextFromRequest (req, conf, type) {
     url: getUrlFromRequest(req),
     headers: undefined
   }
+  console.log('XXX context.url', context.url)
   if (req.socket && req.socket.remoteAddress) {
     context.socket = {
       remote_address: req.socket.remoteAddress

Then run: node examples/trace-http.js.

The generated transaction includes:

    {
        "transaction": {
            "name": "GET //user@foo",
...
                "request": {
                    "http_version": "1.1",
                    "method": "GET",
                    "url": {
                        "raw": "//user@foo",
                        "protocol": "http:",
                        "hostname": "foo",
                        "port": "3000",
                        "full": "http://foo:3000"
                    },
                    "headers": {
                        "host": "localhost:3000",
                        "connection": "close"
                    },
                    "socket": {
                        "remote_address": "::ffff:127.0.0.1"
                    }
                },

details

The recent #3133 fixed a similar issue in the handling of transaction.name for this case. The issue there was that the request path, e.g. //foo/bar, was used as a full URL in URL parsing (via new URL(...) or similar). However, this req.url is the "request-target" in the HTTP/1.1 request line and not a full URL.

The transaction.context.request.url.* value is calculated using the original-url module here:

url: getUrlFromRequest(req),

@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Feb 1, 2023
@david-luna
Copy link
Member

david-luna commented Feb 12, 2023

Root cause seems to be the same as it was in #3133

In this situation getUrlFromRequest is an exported method from a package named origina-url. We can see its import here.

The package original-url uses the legacy API url.parse which is not recommended in the documentation as it says

url.parse() uses a lenient, non-standard algorithm for parsing URL strings. It is prone to security issues such as host name spoofing and incorrect handling of usernames and passwords. Do not use with untrusted input. CVEs are not issued for url.parse() vulnerabilities. Use the WHATWG URL API instead.

The package is using it to create the base URL object for the result. You can see it at the top of the index.js file. Here's in an extract of it

const parseUrl = require('url').parse  // Here it's requiring the legacy function
const parseForwarded = require('forwarded-parse')
const net = require('net')

module.exports = function (req) {
  const raw = req.originalUrl || req.url
  const url = parseUrl(raw || '')   // <-- Here we parse the url string
  const secure = req.secure || (req.connection && req.connection.encrypted)
  const result = { raw: raw }
  let host

What would be ideal is to have the dependency updated. But seeing the commit history it seems unmaintained. Maybe @watson may help us on this or let us update the package.

@trentm
Copy link
Member Author

trentm commented Feb 13, 2023

@david-luna: @\watson will very likely be receptive to a PR on original-url for this if you wanted to work on one. @\watson used to be the primary developer of this elastic-apm-node package and, I think, original-url was initial developed for this package.

@watson
Copy link
Contributor

watson commented Feb 14, 2023

@david-luna yes, please if you don't mind to open a PR against original-url, then I'll take a look ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-nodejs Make available for APM Agents project planning. bug
Projects
None yet
Development

No branches or pull requests

3 participants