Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUMM-1064 AppStateListener added #436

Merged
merged 1 commit into from
Mar 25, 2021

Conversation

buranmert
Copy link
Contributor

@buranmert buranmert commented Mar 8, 2021

The problem

We are seeing spans in APM that have duration longer than hours, even days.
We realized that those are caused by the app going background and staying there for long time.
The span finishes only once the app goes back to foreground and this can result in spans longer than hours.

What and why?

The problem above happens since the app gets suspended/paused at background and picks up where it left off when it comes back to foreground.
We cannot change this behavior from our SDK, therefore we decided to provide more information so that our users can tell when their network requests take too long from when their apps stay at background for too long.

How?

We will add foreground_duration: TimeInterval and isBackground: Bool tags/metrics to Spans.

  1. TaskInterception will be responsible for adding these to the Span
  2. AppStateListener will be responsible for listening to app state along the lifetime of interception and providing AppStateHistory data structure
  3. foregroundDuration and isBackground are properties of AppStateHistory data structure

In this PR, we implemented 2nd and 3rd steps. In the next PR, we will implement 1st step by plugging AppStateListener into TaskInterception It's done in this PR as it was not that major.

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference

TODO

  • TaskInterception/URLSessionTracingHandler needs new test cases
  • Verify foreground_duration is sent as numerical value and can be queried

@buranmert buranmert self-assigned this Mar 8, 2021
@buranmert buranmert requested a review from a team as a code owner March 8, 2021 18:33
Copy link
Member

@ncreated ncreated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the idea, looks good 👍. Left important questions on threading and listener's life-cycle.

@buranmert buranmert force-pushed the buranmert/RUMM-1064-background-spans branch 3 times, most recently from 9276c8d to a5496e5 Compare March 12, 2021 19:23
@buranmert buranmert force-pushed the buranmert/RUMM-1064-background-spans branch 2 times, most recently from ee69ce5 to 8c716a5 Compare March 12, 2021 21:40
@buranmert buranmert force-pushed the buranmert/RUMM-1064-background-spans branch from 8c716a5 to 39d3854 Compare March 22, 2021 15:52
Comment on lines 75 to 76
span.setTag(key: "foreground_duration", value: "\(appStateHistory.foregroundDuration)")
span.setTag(key: "is_background", value: "\(appStateHistory.didRunInBackground)")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. should they be declared in DDTags? (unlike other DDTags these are not officially supported by Datadog)
  2. should they be added as tags? (i couldn't think of any other place for these)
    cc @ncreated

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use internal const defined in DDTag for those (as we do for few other tags).

In APM tags are encoded as meta.* attributes and those are always send as String. This would make querying or aggregating impossible. In contrast to meta.*, APM defines also metrics.* which are numerics.

IMO sending "foreground_duration" as String makes little sense - the user won't be able to do anything with it in APM analytics. We should rather remove that one and keep only "is_background" or send "foreground_duration" as a custom metric (which would require new implementation in DDSpan and SpanEncoder as we don't support metrics yet).

WDYT @xgouchet @mariusc83 @0xnm ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i removed string conversion, now it's

span.setTag(key: "foreground_duration", value: appStateHistory.foregroundDuration.toNanoseconds)

with this, you can create a measure (open the trace, click on foreground_duration tag, select add facet, select measure tab at the top) and filter traces longer/shorter than your threshold

NOTE: once you do that, it takes some time (a few minutes maybe) before you can use it in your query. at first i thought it wasn't working but it actually works 👌

example-query

@buranmert buranmert requested a review from ncreated March 22, 2021 15:57
Copy link
Member

@ncreated ncreated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AppStateHistory looks good 👍, but the threading model proposed for AppStateListener is incorrect.

W need to solve a classic readers-writers problem with:

  • occasional writes (when the app transitions between background and foreground)
  • intensive reads (when a network request completes).

The model implemented in this PR doesn't leverage those conditions and it treats writes and reads equally. Because of writes being always performed from the main thread, the main thread receives a penalty from reads happening on background thread.

Given above conditions, the optimal threading model should work this way:

  • occasional writes should be not blocking (no locks executed on the main thread),
  • intensive reads should be concurrent (allowing simultaneous reads).

This can be achieved with a concurrent DispatchQueue (simultaneous reads) and .barrier API (for exclusive writes).

This problem was solved in #422 where we introduced ValuePublisher type. We can leverage it this way:

let appStateHistory = ValuePublisher<AppStateHistory>()

// From the main thread, non-blocking async write:
appStateHistory.publishAsync(newAppStateHistory)

// From any background thread, concurrent read:
_ = appStateHistory.currentValue

Comment on lines 75 to 76
span.setTag(key: "foreground_duration", value: "\(appStateHistory.foregroundDuration)")
span.setTag(key: "is_background", value: "\(appStateHistory.didRunInBackground)")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use internal const defined in DDTag for those (as we do for few other tags).

In APM tags are encoded as meta.* attributes and those are always send as String. This would make querying or aggregating impossible. In contrast to meta.*, APM defines also metrics.* which are numerics.

IMO sending "foreground_duration" as String makes little sense - the user won't be able to do anything with it in APM analytics. We should rather remove that one and keep only "is_background" or send "foreground_duration" as a custom metric (which would require new implementation in DDSpan and SpanEncoder as we don't support metrics yet).

WDYT @xgouchet @mariusc83 @0xnm ?

Sources/Datadog/Core/System/AppStateListener.swift Outdated Show resolved Hide resolved
Sources/Datadog/Core/System/AppStateListener.swift Outdated Show resolved Hide resolved
Sources/Datadog/Core/System/AppStateListener.swift Outdated Show resolved Hide resolved
@buranmert buranmert requested a review from ncreated March 24, 2021 18:16
@buranmert buranmert force-pushed the buranmert/RUMM-1064-background-spans branch from 3808aca to 866d2a3 Compare March 24, 2021 18:23
Copy link
Member

@ncreated ncreated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👌, except few comments to clean up.

let appStateHistory = appStateListener.history.take(
between: resourceMetrics.fetch.start...resourceMetrics.fetch.end
)
// TODO: RUMM-1064 tag doesn't seem like the best fit but can't find anything better
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems solved, so:

Suggested change
// TODO: RUMM-1064 tag doesn't seem like the best fit but can't find anything better

Comment on lines 33 to 40

// func testWhenInterceptionIsCreated_itHasSharedAppStateListener() {
// // When
// let interception = TaskInterception(request: .mockAny(), isFirstParty: true)
//
// // Then
// XCTAssert(interception.appStateListener === AppStateListener.shared)
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some leftovers, 🧽:

Suggested change
// func testWhenInterceptionIsCreated_itHasSharedAppStateListener() {
// // When
// let interception = TaskInterception(request: .mockAny(), isFirstParty: true)
//
// // Then
// XCTAssert(interception.appStateListener === AppStateListener.shared)
// }

This will avoid unrealistically long spans in APM.
URLSessionTracingHandler will listen to app state changes
and will add foreground duration, background information to the span.

PR comments addressed.
@buranmert buranmert force-pushed the buranmert/RUMM-1064-background-spans branch from 866d2a3 to 0eb6772 Compare March 25, 2021 09:21
@buranmert buranmert merged commit 2b48044 into master Mar 25, 2021
@buranmert buranmert deleted the buranmert/RUMM-1064-background-spans branch March 25, 2021 10:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants