Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUMM-3320 [V1] Fix APM local spans not correlating with RUM views #1355

Merged
merged 5 commits into from
Jul 19, 2023

Conversation

plousada
Copy link
Member

@plousada plousada commented Jun 30, 2023

What and why?

APM <-> RUM correlation was not working for local traces due to mismatch in span tag format between the SDK and APM Backend.

fix: #1292

How?

This PR removes the direct use of RUMContextAttributes in both Log and Span encoding.
Instead, a mapping layer is added that transforms keys into the required format for each product.

On the Example app, changes are made to allow enabling a simple debugging option. It's now possible to single a single log, a single span, and a single traced request on a particular RUM session.

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference
  • Add CHANGELOG entry for user facing changes

Custom CI job configuration (optional)

  • Run unit tests
  • Run integration tests
  • Run smoke tests

@plousada plousada requested a review from a team as a code owner June 30, 2023 16:45
@plousada plousada changed the title Ncreated/rumm 3320/gh 1292 fix apm with rum RUMM-3320 Fix APM local spans not correlating with RUM views Jun 30, 2023
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Jun 30, 2023

Datadog Report

Branch report: ncreated/RUMM-3320/GH-1292-fix-apm-with-rum
Commit report: 1013519

dd-sdk-ios: 0 Failed, 0 New Flaky, 116 Passed, 0 Skipped, 3m 10.11s Wall Time

@plousada plousada force-pushed the ncreated/RUMM-3320/GH-1292-fix-apm-with-rum branch from 9ca02e8 to 11c68d3 Compare July 3, 2023 08:37
@@ -562,7 +562,7 @@ class LoggerTests: XCTestCase {

logMatchers.forEach {
$0.assertValue(
forKeyPath: RUMContextAttributes.IDs.sessionID,
forKeyPath: "application_id",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is currently enforcing the schema used by logs backend.
I'm wondering if it would be preferable to use the output of the mapping function mapInternalAttributeKey here instead of the actual expected key and leave that enforce to E2E tests.

Sources/Datadog/TracerConfiguration.swift Outdated Show resolved Hide resolved
@@ -72,7 +72,7 @@ internal struct TracingMessageReceiver: FeatureMessageReceiver {
///
/// - Parameter context: The updated core context.
private func update(context: DatadogContext) -> Bool {
if let attributes: [String: String] = context.featuresAttributes["rum"]?.ids {
if let attributes: [String: String?] = context.featuresAttributes["rum"]?.ids {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why this change was needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because user_action.id is an optional attribute, we wouldn't be able to read other context if casting to non-optional [String: String]. There was a bug where this context was read by tracer:

[
   "rum": [
      "application.id": "...", 
      "session.id": "...",
      "view.id": "...",
      "user_action.id": "..."
   ]
]

but this wasn't:

[
   "rum": [
      "application.id": "...", 
      "session.id": "...",
      "view.id": "..."
   ]
]

This surfaces the fragility of feature's context propagation in V2. Each feature sets this dictionary on one end and some features expect it on the other end. The communication channel is [String: Any] dictionary held by core.

@plousada plousada force-pushed the ncreated/RUMM-3320/GH-1292-fix-apm-with-rum branch 2 times, most recently from cefdd18 to 1013519 Compare July 3, 2023 23:12
@@ -72,7 +72,7 @@ internal struct TracingMessageReceiver: FeatureMessageReceiver {
///
/// - Parameter context: The updated core context.
private func update(context: DatadogContext) -> Bool {
if let attributes: [String: String] = context.featuresAttributes["rum"]?.ids {
if let attributes: [String: String?] = context.featuresAttributes["rum"]?.ids {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because user_action.id is an optional attribute, we wouldn't be able to read other context if casting to non-optional [String: String]. There was a bug where this context was read by tracer:

[
   "rum": [
      "application.id": "...", 
      "session.id": "...",
      "view.id": "...",
      "user_action.id": "..."
   ]
]

but this wasn't:

[
   "rum": [
      "application.id": "...", 
      "session.id": "...",
      "view.id": "..."
   ]
]

This surfaces the fragility of feature's context propagation in V2. Each feature sets this dictionary on one end and some features expect it on the other end. The communication channel is [String: Any] dictionary held by core.

Comment on lines 12 to +17
/// The ID of RUM application (`String`).
internal static let applicationID = "application_id"
internal static let applicationID = "application.id"

/// The ID of current RUM session (standard UUID `String`, lowercased).
/// In case the session is rejected (not sampled), RUM context is set to empty (`[:]`) in core.
internal static let sessionID = "session_id"
internal static let sessionID = "session.id"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blocker 🚫/ This change break Session Replay, which expects "application_id" and "session_id". With being unable to recognise these new values SR thinks there is no RUM, so it doesn't record anything.

  1. We need to update these values here.
  2. Also please update these values in comments of RUM dependency definition within SR module here - this definition is a central place where SR defines everything it expects or provides to other modules in SDK context.

I bump the need for 2 things:

  • Integration Tests for SR: it is now possible on feature/v2 - I created REPLAY-1863
  • Integration Unit Tests: to validate feature's contract in unit tests - @maxep do we have JIRA for this?

Copy link
Member

@maxep maxep Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integration Tests for SR: it is now possible on feature/v2

If we base this on feature/v2, we won't be able to release it as part of the next 1.x. So I don't think we should block this on SR integration tests, especially since it was reported by users. We should update the values in SR but we can't add integration tests with just yet.

Integration Unit Tests: to validate feature's contract in unit tests - @maxep do we have JIRA for this?

We don't, currently integration units are part of DatadogTests target. But I will create a ticket to migrate them 👍

Comment on lines +30 to +39
internal func mapInternalAttributeKey(_ originalAttribute: String) -> String {
switch originalAttribute {
case "application.id":
return "application_id"
case "session.id":
return "session_id"
default:
return originalAttribute
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion/ Could we declare constants for "application.id" and "session.id" as part of RUMDependency namespace defined in Logging module? Similar to how we define RUMDependency in Session Replay - single place with listing all attributes that features exchange between each other.

cc @maxep

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we work with V1 code (no modules yet), it would have to be enum LogsRUMDependency (or similar) to scope it properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having constants makes sense, should be defined alongside the message receiver IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Message receiver can be a good place to define "received" attributes. Where to define "provided" attributes?

Looking at mentioned RUMDependency in SR there are two parts:

internal enum RUMDependency {
   // MARK: Contract from RUM to SR:
   static let rumBaggageKey = "rum"
   static let serverTimeOffsetKey = "server_time_offset"
   static let ids = "ids"

   // MARK: Contract from SR to RUM (mirror of `SessionReplayDependency` in RUM):
   static let srBaggageKey = "session-replay"
   static let hasReplay = "has_replay"
}

and I believe it is good and clear enough - single place describing the whole contract.

If we want to separate it, it's OK for me as long as it is clear and consistent across all modules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and I believe it is good and clear enough - single place describing the whole contract.

Yep, totally. Just to add my 2 cents: I don't think we should enforce any convention across modules on that. Some Features will use/share a lot of context values, some won't. We can have some flexibility on how Features are implemented internally IMHO

Comment on lines +21 to +34
internal func mapInternalTags(_ originalTag: String) -> String {
switch originalTag {
case "application.id":
return "_dd.application.id"
case "session.id":
return "_dd.session.id"
case "view.id":
return "_dd.view.id"
case "user_action.id":
return "_dd.action.id"
default:
return originalTag
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion/ Similar ask to the one in Logging:

suggestion/ Could we declare constants for "application.id" and "session.id" as part of RUMDependency namespace defined in Tracing module? Similar to how we define RUMDependency in Session Replay - single place with listing all attributes that features exchange between each other.

cc @maxep

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the constants are shared eg "application_id", declaring them each module (Tracing, Session Replay etc) seems error prone. IMO, it should go atleast in Internal module which is shared by all other modules.

Not related to this, but having an API package for each feature makes these things easier to maintain, where there could be RUMApi module with plain structs, APIs and constants (ideal world) which can be taken as dependency by others without all the baggage of RUM.

@ncreated ncreated self-assigned this Jul 14, 2023
@ncreated ncreated force-pushed the ncreated/RUMM-3320/GH-1292-fix-apm-with-rum branch from 1013519 to bf28ee9 Compare July 14, 2023 16:10
@ncreated ncreated changed the base branch from develop to release/1.22.0 July 14, 2023 16:11
@ncreated ncreated marked this pull request as draft July 14, 2023 16:19
@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Jul 14, 2023

Datadog Report

Branch report: ncreated/RUMM-3320/GH-1292-fix-apm-with-rum
Commit report: d196315

❄️ dd-sdk-ios: 0 Failed, 3 New Flaky, 2208 Passed, 0 Skipped, 28m 56.95s Wall Time

New Flaky Tests (3)

  • testRUMStopSessionScenario - RUMStopSessionScenarioTests - Last Failure

    Expand for error
     Assertion Failure at RUMStopSessionScenarioTests.swift:121: XCTAssertTrue failed
    
  • testGivenLoggingFeatureNotRegistered_whenSendingLogFromSpan_itPrintsWarning - TracerTests - Last Failure

    Expand for error
     Assertion Failure at TracerTests.swift:1178: XCTAssertEqual failed: ("nil") is not equal to ("Optional("The log for span \"foo\" will not be send, because the Logging feature is disabled.")")
    
  • testGivenTracingInstrumentationEnabled_whenInterceptingURLSessionTasks_itNotifiesStartAndCompletion - URLSessionInterceptorTests - Last Failure

    Expand for error
     Assertion Failure at URLSessionInterceptorTests.swift:373: Asynchronous wait failed: Exceeded timeout of 0.5 seconds, with unfulfilled expectations: "Complete task interception".
    

@ncreated ncreated force-pushed the ncreated/RUMM-3320/GH-1292-fix-apm-with-rum branch from bf28ee9 to ccd6149 Compare July 14, 2023 16:45
@ncreated ncreated changed the title RUMM-3320 Fix APM local spans not correlating with RUM views RUMM-3320 [V1] Fix APM local spans not correlating with RUM views Jul 17, 2023
@ncreated ncreated marked this pull request as ready for review July 18, 2023 12:07
Copy link
Member

@ncreated ncreated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxep || @ganeshnj I pushed this topic forward and now it is ready for review.

  • Only fixing Session Replay compared to previous work (the change here has low impact as we don't provide SR in V1 except private beta branch, but I did it anyway).
  • Like I mentioned on Slack, this is only minimal change to make it work.
  • We're fixing V1 here - V2 fix is in RUMM-3320 fix: [V2] Fix APM local spans not correlating with RUM views #1375
  • This PR is targeting release/1.22.0 branch - I will do release after merge.

CI is green, Bitrise checks are false:

Screenshot 2023-07-18 at 14 11 59

@@ -39,6 +39,12 @@ private class DebugRUMSessionViewModel: ObservableObject {
@Published var errorMessage: String = ""
@Published var resourceKey: String = ""

@Published var logMessage: String = ""
@Published var spanOperationName: String = ""
@Published var instrumentedRequestURL: String = "https://api.shopist.io/checkout.json"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently it returns

{
    "status": 404,
    "error": "Not Found"
}

expected ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is POST endpoint (GET always returns 404), but getting some % of errors on it is expected as this is part of demo environment 👍.

Copy link
Member

@maxep maxep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I left a question.

Sources/Datadog/Tracing/Span/SpanEventEncoder.swift Outdated Show resolved Hide resolved
@ncreated ncreated merged commit 67fefcd into release/1.22.0 Jul 19, 2023
@ncreated ncreated deleted the ncreated/RUMM-3320/GH-1292-fix-apm-with-rum branch July 19, 2023 09:56
@ncreated ncreated mentioned this pull request Jul 20, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants