RUMM-3320 [V1] Fix APM local spans not correlating with RUM views #1355

plousada · 2023-06-30T16:45:25Z

What and why?

APM <-> RUM correlation was not working for local traces due to mismatch in span tag format between the SDK and APM Backend.

fix: #1292

How?

This PR removes the direct use of RUMContextAttributes in both Log and Span encoding.
Instead, a mapping layer is added that transforms keys into the required format for each product.

On the Example app, changes are made to allow enabling a simple debugging option. It's now possible to single a single log, a single span, and a single traced request on a particular RUM session.

Review checklist

Feature or bugfix MUST have appropriate tests (unit, integration)
Make sure each commit and the PR mention the Issue number or JIRA reference
Add CHANGELOG entry for user facing changes

Custom CI job configuration (optional)

Run unit tests
Run integration tests
Run smoke tests

datadog-datadog-prod-us1 · 2023-06-30T16:52:56Z

Datadog Report

Branch report: ncreated/RUMM-3320/GH-1292-fix-apm-with-rum
Commit report: 1013519

✅ dd-sdk-ios: 0 Failed, 0 New Flaky, 116 Passed, 0 Skipped, 3m 10.11s Wall Time

plousada · 2023-07-03T08:40:58Z

Tests/DatadogTests/Datadog/LoggerTests.swift

@@ -562,7 +562,7 @@ class LoggerTests: XCTestCase {

        logMatchers.forEach {
            $0.assertValue(
-                forKeyPath: RUMContextAttributes.IDs.sessionID,
+                forKeyPath: "application_id",


This is currently enforcing the schema used by logs backend.
I'm wondering if it would be preferable to use the output of the mapping function mapInternalAttributeKey here instead of the actual expected key and leave that enforce to E2E tests.

Sources/Datadog/TracerConfiguration.swift

Tests/DatadogTests/Datadog/WebView/WKUserContentController+DatadogTests.swift

fuzzybinary · 2023-07-03T14:12:25Z

Sources/Datadog/Tracing/TracingV2Configuration.swift

@@ -72,7 +72,7 @@ internal struct TracingMessageReceiver: FeatureMessageReceiver {
    ///
    /// - Parameter context: The updated core context.
    private func update(context: DatadogContext) -> Bool {
-        if let attributes: [String: String] = context.featuresAttributes["rum"]?.ids {
+        if let attributes: [String: String?] = context.featuresAttributes["rum"]?.ids {


I'm curious why this change was needed?

Because user_action.id is an optional attribute, we wouldn't be able to read other context if casting to non-optional [String: String]. There was a bug where this context was read by tracer:

[ "rum": [ "application.id": "...", "session.id": "...", "view.id": "...", "user_action.id": "..." ] ]

but this wasn't:

[ "rum": [ "application.id": "...", "session.id": "...", "view.id": "..." ] ]

This surfaces the fragility of feature's context propagation in V2. Each feature sets this dictionary on one end and some features expect it on the other end. The communication channel is [String: Any] dictionary held by core.

ncreated · 2023-07-05T06:55:48Z

Sources/Datadog/Tracing/TracingV2Configuration.swift

@@ -72,7 +72,7 @@ internal struct TracingMessageReceiver: FeatureMessageReceiver {
    ///
    /// - Parameter context: The updated core context.
    private func update(context: DatadogContext) -> Bool {
-        if let attributes: [String: String] = context.featuresAttributes["rum"]?.ids {
+        if let attributes: [String: String?] = context.featuresAttributes["rum"]?.ids {


Because user_action.id is an optional attribute, we wouldn't be able to read other context if casting to non-optional [String: String]. There was a bug where this context was read by tracer:

[ "rum": [ "application.id": "...", "session.id": "...", "view.id": "...", "user_action.id": "..." ] ]

but this wasn't:

[ "rum": [ "application.id": "...", "session.id": "...", "view.id": "..." ] ]

This surfaces the fragility of feature's context propagation in V2. Each feature sets this dictionary on one end and some features expect it on the other end. The communication channel is [String: Any] dictionary held by core.

ncreated · 2023-07-05T07:46:17Z

Sources/Datadog/RUM/Integrations/RUMContextAttributes.swift

        /// The ID of RUM application (`String`).
-        internal static let applicationID = "application_id"
+        internal static let applicationID = "application.id"

        /// The ID of current RUM session (standard UUID `String`, lowercased).
        /// In case the session is rejected (not sampled), RUM context is set to empty (`[:]`) in core.
-        internal static let sessionID = "session_id"
+        internal static let sessionID = "session.id"


blocker 🚫/ This change break Session Replay, which expects "application_id" and "session_id". With being unable to recognise these new values SR thinks there is no RUM, so it doesn't record anything.

We need to update these values here.

Also please update these values in comments of RUM dependency definition within SR module here - this definition is a central place where SR defines everything it expects or provides to other modules in SDK context.

I bump the need for 2 things:

Integration Tests for SR: it is now possible on feature/v2 - I created REPLAY-1863

Integration Unit Tests: to validate feature's contract in unit tests - @maxep do we have JIRA for this?

Integration Tests for SR: it is now possible on feature/v2

If we base this on feature/v2, we won't be able to release it as part of the next 1.x. So I don't think we should block this on SR integration tests, especially since it was reported by users. We should update the values in SR but we can't add integration tests with just yet.

Integration Unit Tests: to validate feature's contract in unit tests - @maxep do we have JIRA for this?

We don't, currently integration units are part of DatadogTests target. But I will create a ticket to migrate them 👍

ncreated · 2023-07-05T07:50:33Z

Sources/Datadog/Logging/LoggingV2Configuration.swift

+internal func mapInternalAttributeKey(_ originalAttribute: String) -> String {
+    switch originalAttribute {
+    case "application.id":
+        return "application_id"
+    case "session.id":
+        return "session_id"
+    default:
+        return originalAttribute
+    }
+}


suggestion/ Could we declare constants for "application.id" and "session.id" as part of RUMDependency namespace defined in Logging module? Similar to how we define RUMDependency in Session Replay - single place with listing all attributes that features exchange between each other.

cc @maxep

Because we work with V1 code (no modules yet), it would have to be enum LogsRUMDependency (or similar) to scope it properly.

Having constants makes sense, should be defined alongside the message receiver IMO.

Message receiver can be a good place to define "received" attributes. Where to define "provided" attributes?

Looking at mentioned RUMDependency in SR there are two parts:

internal enum RUMDependency { // MARK: Contract from RUM to SR: static let rumBaggageKey = "rum" static let serverTimeOffsetKey = "server_time_offset" static let ids = "ids" // MARK: Contract from SR to RUM (mirror of `SessionReplayDependency` in RUM): static let srBaggageKey = "session-replay" static let hasReplay = "has_replay" }

and I believe it is good and clear enough - single place describing the whole contract.

If we want to separate it, it's OK for me as long as it is clear and consistent across all modules.

and I believe it is good and clear enough - single place describing the whole contract.

Yep, totally. Just to add my 2 cents: I don't think we should enforce any convention across modules on that. Some Features will use/share a lot of context values, some won't. We can have some flexibility on how Features are implemented internally IMHO

ncreated · 2023-07-05T07:51:24Z

Sources/Datadog/Tracing/TracingV2Configuration.swift

+internal func mapInternalTags(_ originalTag: String) -> String {
+    switch originalTag {
+    case "application.id":
+        return "_dd.application.id"
+    case "session.id":
+        return "_dd.session.id"
+    case "view.id":
+        return "_dd.view.id"
+    case "user_action.id":
+        return "_dd.action.id"
+    default:
+        return originalTag
+    }
+}


suggestion/ Similar ask to the one in Logging:

suggestion/ Could we declare constants for "application.id" and "session.id" as part of RUMDependency namespace defined in Tracing module? Similar to how we define RUMDependency in Session Replay - single place with listing all attributes that features exchange between each other.

cc @maxep

If the constants are shared eg "application_id", declaring them each module (Tracing, Session Replay etc) seems error prone. IMO, it should go atleast in Internal module which is shared by all other modules.

Not related to this, but having an API package for each feature makes these things easier to maintain, where there could be RUMApi module with plain structs, APIs and constants (ideal world) which can be taken as dependency by others without all the baggage of RUM.

…correlation

datadog-datadog-prod-us1 · 2023-07-14T16:20:11Z

Datadog Report

Branch report: ncreated/RUMM-3320/GH-1292-fix-apm-with-rum
Commit report: d196315

❄️ dd-sdk-ios: 0 Failed, 3 New Flaky, 2208 Passed, 0 Skipped, 28m 56.95s Wall Time

New Flaky Tests (3)

testRUMStopSessionScenario - RUMStopSessionScenarioTests - Last Failure

Expand for error

 Assertion Failure at RUMStopSessionScenarioTests.swift:121: XCTAssertTrue failed

testGivenLoggingFeatureNotRegistered_whenSendingLogFromSpan_itPrintsWarning - TracerTests - Last Failure

Expand for error

 Assertion Failure at TracerTests.swift:1178: XCTAssertEqual failed: ("nil") is not equal to ("Optional("The log for span \"foo\" will not be send, because the Logging feature is disabled.")")

testGivenTracingInstrumentationEnabled_whenInterceptingURLSessionTasks_itNotifiesStartAndCompletion - URLSessionInterceptorTests - Last Failure

Expand for error

 Assertion Failure at URLSessionInterceptorTests.swift:373: Asynchronous wait failed: Exceeded timeout of 0.5 seconds, with unfulfilled expectations: "Complete task interception".

ncreated

@maxep || @ganeshnj I pushed this topic forward and now it is ready for review.

Only fixing Session Replay compared to previous work (the change here has low impact as we don't provide SR in V1 except private beta branch, but I did it anyway).
Like I mentioned on Slack, this is only minimal change to make it work.
We're fixing V1 here - V2 fix is in RUMM-3320 fix: [V2] Fix APM local spans not correlating with RUM views #1375
This PR is targeting release/1.22.0 branch - I will do release after merge.

CI is green, Bitrise checks are false:

ganeshnj · 2023-07-18T13:02:57Z

Datadog/Example/Debugging/DebugRUMSessionViewController.swift

@@ -39,6 +39,12 @@ private class DebugRUMSessionViewModel: ObservableObject {
    @Published var errorMessage: String = ""
    @Published var resourceKey: String = ""

+    @Published var logMessage: String = ""
+    @Published var spanOperationName: String = ""
+    @Published var instrumentedRequestURL: String = "https://api.shopist.io/checkout.json"


currently it returns

{ "status": 404, "error": "Not Found" }

expected ?

It is POST endpoint (GET always returns 404), but getting some % of errors on it is expected as this is part of demo environment 👍.

Datadog/Example/Debugging/DebugRUMSessionViewController.swift

maxep

Looks great! I left a question.

Sources/Datadog/Tracing/Span/SpanEventEncoder.swift

plousada requested a review from a team as a code owner June 30, 2023 16:45

plousada mentioned this pull request Jun 30, 2023

RUMM-3320 [Draft] Fix broken APM <> RUM link #1301

Closed

6 tasks

plousada changed the title ~~Ncreated/rumm 3320/gh 1292 fix apm with rum~~ RUMM-3320 Fix APM local spans not correlating with RUM views Jun 30, 2023

plousada force-pushed the ncreated/RUMM-3320/GH-1292-fix-apm-with-rum branch from 9ca02e8 to 11c68d3 Compare July 3, 2023 08:37

plousada commented Jul 3, 2023

View reviewed changes

fuzzybinary approved these changes Jul 3, 2023

View reviewed changes

plousada force-pushed the ncreated/RUMM-3320/GH-1292-fix-apm-with-rum branch 2 times, most recently from cefdd18 to 1013519 Compare July 3, 2023 23:12

ncreated requested changes Jul 5, 2023

View reviewed changes

ncreated self-assigned this Jul 14, 2023

ncreated and others added 3 commits July 14, 2023 17:55

RUMM-3320 Add debug options for testing APM Span <> RUM link in DD app

e2b3712

RUMM-3320 Draft the fix

b97613a

Fix attributes/tags keys sent on Log and Span events that enable RUM …

114a062

…correlation

ncreated force-pushed the ncreated/RUMM-3320/GH-1292-fix-apm-with-rum branch from 1013519 to bf28ee9 Compare July 14, 2023 16:10

ncreated changed the base branch from develop to release/1.22.0 July 14, 2023 16:11

ncreated marked this pull request as draft July 14, 2023 16:19

RUMM-3320 Update RUM context keys in Session Replay

ccd6149

ncreated force-pushed the ncreated/RUMM-3320/GH-1292-fix-apm-with-rum branch from bf28ee9 to ccd6149 Compare July 14, 2023 16:45

ncreated changed the title ~~RUMM-3320 Fix APM local spans not correlating with RUM views~~ RUMM-3320 [V1] Fix APM local spans not correlating with RUM views Jul 17, 2023

ncreated mentioned this pull request Jul 17, 2023

RUMM-3320 fix: [V2] Fix APM local spans not correlating with RUM views #1375

Merged

6 tasks

ncreated marked this pull request as ready for review July 18, 2023 12:07

ncreated approved these changes Jul 18, 2023

View reviewed changes

ganeshnj approved these changes Jul 18, 2023

View reviewed changes

maxep approved these changes Jul 18, 2023

View reviewed changes

Sources/Datadog/Tracing/Span/SpanEventEncoder.swift Outdated Show resolved Hide resolved

RUMM-3320 CR feedback

d196315

ncreated merged commit 67fefcd into release/1.22.0 Jul 19, 2023

ncreated deleted the ncreated/RUMM-3320/GH-1292-fix-apm-with-rum branch July 19, 2023 09:56

ncreated mentioned this pull request Jul 20, 2023

Release 1.22.0 #1381

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RUMM-3320 [V1] Fix APM local spans not correlating with RUM views #1355

RUMM-3320 [V1] Fix APM local spans not correlating with RUM views #1355

plousada commented Jun 30, 2023 •

edited by ncreated

Loading

datadog-datadog-prod-us1 bot commented Jun 30, 2023 •

edited

Loading

plousada Jul 3, 2023

fuzzybinary Jul 3, 2023

ncreated Jul 5, 2023

ncreated Jul 5, 2023

ncreated Jul 5, 2023

maxep Jul 10, 2023 •

edited

Loading

ncreated Jul 5, 2023

ncreated Jul 5, 2023

maxep Jul 10, 2023

ncreated Jul 14, 2023

maxep Jul 17, 2023

ncreated Jul 5, 2023

ganeshnj Jul 12, 2023

datadog-datadog-prod-us1 bot commented Jul 14, 2023 •

edited

Loading

ncreated left a comment

ganeshnj Jul 18, 2023

ncreated Jul 19, 2023

maxep left a comment

RUMM-3320 [V1] Fix APM local spans not correlating with RUM views #1355

RUMM-3320 [V1] Fix APM local spans not correlating with RUM views #1355

Conversation

plousada commented Jun 30, 2023 • edited by ncreated Loading

What and why?

How?

Review checklist

Custom CI job configuration (optional)

datadog-datadog-prod-us1 bot commented Jun 30, 2023 • edited Loading

Datadog Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maxep Jul 10, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

datadog-datadog-prod-us1 bot commented Jul 14, 2023 • edited Loading

Datadog Report

New Flaky Tests (3)

ncreated left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maxep left a comment

Choose a reason for hiding this comment

plousada commented Jun 30, 2023 •

edited by ncreated

Loading

datadog-datadog-prod-us1 bot commented Jun 30, 2023 •

edited

Loading

maxep Jul 10, 2023 •

edited

Loading

datadog-datadog-prod-us1 bot commented Jul 14, 2023 •

edited

Loading