RUMM-640 Add API for clearing out all data #644
Conversation
so it also checks that the next files keep the limit.
now the `markAllFilesAsReadable()` method better fits the `DataOrchestrator`, which has explicit access to both `FilesOrchestrator`s for each feature.
#if DD_SDK_COMPILED_FOR_TESTING
    func markAllFilesAsReadable() {
        queue.sync {
            authorizedFilesOrchestrator.ignoreFilesAgeWhenReading = true
            unauthorizedFilesOrchestrator.ignoreFilesAgeWhenReading = true
        }
    }
#endif
Previously, this was part of `FileReader`, as it had access to `filesOrchestrator` (not accessible anywhere else). Now that `DataOrchestrator` manages the files orchestrators, it seems better to move this logic here. This also allowed cleaning up `FileReader` so it no longer exposes `filesOrchestrator` internally.
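Based on the diff above and the PR description, a minimal sketch of what `DataOrchestrator` now owns could look as follows; the `FilesDeleting` protocol and the `deleteAllFiles()` body are assumptions standing in for SDK internals not shown in this excerpt:

import Foundation

// Stand-in for the SDK's FilesOrchestrator interface (assumed).
protocol FilesDeleting {
    func deleteAllFiles()
}

internal struct DataOrchestrator {
    /// Serial queue synchronizing file access for the feature.
    let queue: DispatchQueue
    /// Orchestrates files buffering authorised (.granted) data.
    let authorizedFilesOrchestrator: FilesDeleting
    /// Orchestrates files buffering unauthorised (.pending) data.
    let unauthorizedFilesOrchestrator: FilesDeleting

    /// Deletes all buffered data in both directories (assumed body,
    /// inferred from the PR description).
    func deleteAllFiles() {
        queue.sync {
            authorizedFilesOrchestrator.deleteAllFiles()
            unauthorizedFilesOrchestrator.deleteAllFiles()
        }
    }
}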
func testGivenDefaultWriteConditions_whenFileCanNotBeUsedMoreTimes_itCreatesNewFile() throws {
    let orchestrator = configureOrchestrator(using: RelativeDateProvider(advancingBySeconds: 0.001))
    var previousFile: WritableFile = try orchestrator.getWritableFile(writeSize: 1) // first use of a new file
    var nextFile: WritableFile

    for _ in (0..<5) {
        // use file maximum number of times
        for _ in (0 ..< performance.maxObjectsInFile).dropLast() { // skip first use
            nextFile = try orchestrator.getWritableFile(writeSize: 1)
            XCTAssertEqual(nextFile.name, previousFile.name, "It must reuse the file when number of events is below the limit")
            previousFile = nextFile
        }

        // next time it returns a different file
        nextFile = try orchestrator.getWritableFile(writeSize: 1) // first use of a new file
        XCTAssertNotEqual(nextFile.name, previousFile.name, "It must obtain a new file when number of events exceeds the limit")
        previousFile = nextFile
    }
}
Not related to this PR directly, but I found this `FilesOrchestrator` test very weak, so I enhanced it. It now tests that "a new file is always started after writing 500 events", instead of "a new file is started after writing 500 events". The difference is that it now adds logical coverage for this line, which is crucial for the storage & upload performance.
@@ -134,6 +134,11 @@ public class Datadog {
        instance?.consentProvider.changeConsent(to: trackingConsent)
    }

    /// Clears all data that has not already been sent to Datadog servers.
Maybe it's worth mentioning that this method is async and the files are not yet deleted once it returns?
Good question 👌. Actually, I initially made this API `sync` to guarantee "no data" after it returns (by sacrificing the performance, ofc), but then I realised that it makes no difference from the user's standpoint. The SDK synchronizes all data collection internally, so any event collected before `clearAllData()` will be deleted and any event collected after will be stored. That said, adding the information on `async` execution would be purely informative and could be considered a breaking change if we need to use `sync` for some reason in the future. WDYT @buranmert @maxep ?
Good question indeed 🤔 Hypothetically, if I do:

log(something) // <- will be deleted
clearAllData()
log(something) // <- will be stored

because we are using serial `readWriteQueue` queues in each feature, am I correct? If so, I don't think we have to specify it's async; `clearAllData` will be doing what it's expected to do in an optimised manner.
Yes, this snippet is correct - the underlying `readWriteQueue` is serial, and it guarantees the order of operations. Just in case we ever need to change this behaviour, I'd avoid expressing the implementation detail in the comment.
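To make that guarantee concrete, here is a minimal, self-contained sketch of the ordering argument using a plain serial `DispatchQueue`; it is generic example code, not the SDK's actual types:

import Foundation

let readWriteQueue = DispatchQueue(label: "com.example.read-write") // serial by default
var storedEvents: [String] = []

func log(_ event: String) {
    readWriteQueue.async { storedEvents.append(event) } // "write"
}

func clearAllData() {
    readWriteQueue.async { storedEvents.removeAll() } // "delete"
}

log("first")    // enqueued before the deletion, so it gets deleted
clearAllData()  // runs strictly after the first write completes
log("second")   // enqueued after the deletion, so it survives
readWriteQueue.sync {} // demo only: wait for all enqueued work to finish
print(storedEvents)    // ["second"]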
What and why?

📦 This PR adds the `Datadog.clearAllData()` API.

It deletes all authorised (`.granted`) and unauthorised (`.pending`) data (not yet uploaded) buffered on the device. This includes data for the Logging, Tracing, RUM and Internal Monitoring features.

How?

Because all our features use the same abstraction, I simply added the `DataOrchestrator` type to `FeatureStorage`. Unlike `FilesOrchestrator`, `DataOrchestrator` manages data in multiple folders (we use it for managing the authorised and unauthorised directories). The public API calls `DataOrchestrator.deleteAllFiles()` for each active feature, as in the sketch below.
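For illustration only, the public entry point plausibly reduces to something like this; `FeatureStorage`'s shape and the `activeFeatureStorages` lookup are assumptions standing in for the SDK's internals, with `DataOrchestrator` as sketched earlier in this thread:

// A sketch under assumptions - not the PR's verbatim code.
struct FeatureStorage {
    let dataOrchestrator: DataOrchestrator // per the earlier sketch
}

enum DatadogSketch {
    // Stand-in for however the SDK resolves its currently enabled features.
    static var activeFeatureStorages: [FeatureStorage] = []

    /// Clears all data that has not already been sent to Datadog servers.
    static func clearAllData() {
        activeFeatureStorages.forEach { $0.dataOrchestrator.deleteAllFiles() }
    }
}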
Review checklist