Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(azure-functions): support Azure Functions programming model v4 #4426

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

trentm
Copy link
Member

@trentm trentm commented Jan 16, 2025

See https://learn.microsoft.com/en-ca/azure/azure-functions/functions-node-upgrade-v4
This builds on the work in #4178
by @qzxlkj (type that 5 times fast), which provides most of the runtime code fix.

The rest of this is adding testing and updated examples and docs.

Refs: #4178
Closes: #3185

checklist

  • example update
  • docs update
  • changelog

See https://learn.microsoft.com/en-ca/azure/azure-functions/functions-node-upgrade-v4
This builds on the work in #4178
by @qzxlkj (type that 5 times fast), which provide most of the runtime code fix.

The rest of this is adding testing and updated examples and docs.

Refs: #4178
Closes: #3185
@trentm trentm self-assigned this Jan 16, 2025
@trentm
Copy link
Member Author

trentm commented Jan 16, 2025

tl;dr: Distributed tracing for incoming requests to Azure Functions using their Node.js programming model v4 does not work. This is due to a bug in Azure Functions that results in InvocationContext.traceContext not being usable.
(This instrumentation is choosing Option B, described below.)

distributed tracing issues with Azure Functions

Azure Functions currently has behaviour that breaks distributed tracing. The following describes the behaviour with no telemetryMode setting in the Azure Function application's "host.json" file.

Instrumentation of Azure Functions is done via the invocation hooks mechanism. The InvocationContext passed to the preInvocation hook includes a traceContext with W3C trace-context data. Its behaviour is:

  1. If the incoming request (e.g. an HTTP request for an HTTP-triggered function) includes trace-context, e.g.:

    traceparent: 00-12345678901234567890123456789012-1234567890123456-01
    

    then Azure internally creates a child span resulting in

    traceContext.traceParent = '00-12345678901234567890123456789012-48909501584ddaed-01'
    

    Notice that the parent-id has changed. The internal span (id=48909501584ddaed) cannot be collected, so the result is a broken trace.

  2. If the incoming request does not have trace-context, then Azure internally creates a span in a new trace, something like:

    traceContext.traceParent = '00-deed8e23efe326408bca77616654d445-ea4f834079f1596f-00'
    

    Notice that the sampled flag is false. This results in the whole trace being sampled out.

See the following for this issue having been reported:
Azure/azure-functions-dotnet-worker#2733 (comment)
Azure/azure-functions-host#10641

options

To cope with this at least partly-unusable trace-context:

First, this instrumentation prefers to directly use the trace-context from the incoming HTTP headers, if any.
HTTP-triggered Azure Functions have a request object with the headers.
However the issue is whether those headers are available to the preInvocation hook.
In the now-old v3 Node.js programming model the InvocationContext includes the HTTP request object, so the headers are accessible.
In the v4 Node.js programming model this is no longer the case.

(Dev Note: Future work could possibly monkey-patch app.http from @azure/functions to access the incoming request headers. This would likely mean changing away from using the hooks mechanism at all for instrumentation.)

Option A: use invocationContext.traceContext as a fallback

If we use invocationContext.traceContext used as a fallback, then we need to contend with the two issues above.

There is no coping with issue 1, AFAIK. (It is possible with Azure Functions' current and coming OTel integration that this internal span would be accessible for ingest.)

We could cope with issue 2 by strongly encouraging the usage of the traceContinuationStrategy: 'restart_external' APM agent config (doc link). The result is a still a broken distributed trace (because of issue 1: the missing internal span), but at least a sane sampling decision will be made.

The Elastic APM UI trace view shows:

  • a span link for the ignore internal span context; and
  • shows a Incomplete trace warning about the distributed trace being incomplete.

Screenshot 2025-01-16 at 11 47 20 AM

Option B: don't use invocationContext.traceContext

The experience with Option A is still poor for the user. It is clearer to just not use the often completely- and always partial-broken traceContext and document for the user that distributed tracing incoming to Azure Functions just doesn't work because of the issues linked to above.

@trentm
Copy link
Member Author

trentm commented Jan 16, 2025

For the record, though we are not using it, this is how Option A above is implemented:

% diff -u azure-functions.js azure-functions.js.option-a
--- azure-functions.js	2025-01-16 13:31:47
+++ azure-functions.js.option-a	2025-01-16 13:31:34
@@ -423,6 +423,14 @@
       log.trace(
         { traceparent, tracestate },
         'azure-functions: get trace-context from HTTP trigger request headers',
+      );
+    }
+    if (!traceparent && context?.traceContext?.traceParent) {
+      traceparent = context.traceContext.traceParent;
+      tracestate = context.traceContext.traceState;
+      log.trace(
+        { traceparent, tracestate },
+        'azure-functions: get trace-context from invocationContext.traceContext',
       );
     }

@trentm
Copy link
Member Author

trentm commented Jan 16, 2025

Also for the posterity, this is what the azfunc example looked like when the "Hi" function called the "Bye" function before returning:

const { app } = require('@azure/functions');

app.http('Hi', {
  methods: ['GET'],
  authLevel: 'anonymous',
  handler: async (request, _context) => {
    // Call `GET /api/Bye` on this same Azure Function.
    const url = new URL(request.url);
    url.pathname = '/api/Bye';
    const byeRes = await fetch(url);
    const byeBody = await byeRes.json();

    const body = JSON.stringify({
      hi: 'there',
      'from /api/Bye call': byeBody
    });
    return {
      status: 200,
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body)
      },
      body
    };
  },
});

@trentm
Copy link
Member Author

trentm commented Jan 16, 2025

Some screenshots of tracing data from requests to the example app deployed to Azure:

% curl -i https://trentm-example-azure-function-app.azurewebsites.net/api/hello
HTTP/2 200
content-type: application/json
date: Thu, 16 Jan 2025 21:56:07 GMT
content-length: 80
request-context: appId=cid-v1:333aa370-882c-4ebf-b6fb-a2e8cc533807

{"hello":"world","current time in Vancouver":"2025-01-16T13:56:07.819933-08:00"}

Screenshot 2025-01-16 at 1 55 34 PM

% curl -i https://trentm-example-azure-function-app.azurewebsites.net/api/hello
HTTP/2 500
date: Thu, 16 Jan 2025 21:56:13 GMT
content-length: 0
request-context: appId=cid-v1:333aa370-882c-4ebf-b6fb-a2e8cc533807

Here the HTTP request made inside the Azure Function timed out.

Screenshot 2025-01-16 at 2 01 03 PM

@trentm trentm marked this pull request as ready for review January 16, 2025 22:22
if (triggerType === TRIGGER_HTTP && context.req && context.req.headers) {
traceparent =
context.req.headers.traceparent ||
context.req.headers['elastic-apm-traceparent'];
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewer note: I'm also dropping support for the now long-ancient 'elastic-apm-traceparent' header. This is obsolete.

functionName: context.functionName,
};
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Credit: this block is directly from #4178
This is the bulk of the fix for supporting Azure Functions v4

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewer note: This is a direct copy of some of the functions that were in azure-functions.test.js before it was broken into two separate test files for v3 and v4.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewer note: Testing azure functions involves npm installing the ridiculously large azure-functions-core-tools dep. To limit the insanity I've limited testing of v3 to Node.js 18 and v4 to Node.js 20.

@@ -168,7 +52,7 @@ function checkExpectedApmEvents(t, apmEvents) {
if (apmEvents.length > 0) {
const metadata = apmEvents.shift().metadata;
t.ok(metadata, 'metadata is first event');
t.equal(metadata.service.name, 'AJsAzureFnApp', 'metadata.service.name');
t.equal(metadata.service.name, 'azfunc3', 'metadata.service.name');
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewer note: This and all the remaining diffs in this file are from renaming the fixture: s/AJsAzureFnApp/azfunc3/.

Copy link
Member Author

@trentm trentm Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewer note: This file is an almost exact copy of azure-functions-v3.test.js, except it tests the separate fixtures/azfunc4, and it doesn't test as many scenarios/edge cases because I didn't think that is worth a day or two of added test writing at this point.

There could likely be more code sharing between these two files, but again, I didn't want to blow too much time perfecting this.

@trentm trentm mentioned this pull request Jan 16, 2025
6 tasks
@trentm trentm requested a review from david-luna January 17, 2025 17:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

support Azure Functions Node.js programming model v4
1 participant