Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1655100 - Capture device logs to diagnose intermittent iOS integration test failure #1133

Merged
merged 4 commits into from
Aug 13, 2020

Conversation

travis79
Copy link
Member

@travis79 travis79 commented Aug 5, 2020

Updated: This adds device logging in a way that we can capture them from the simulator used in testing on CI.

@auto-assign auto-assign bot requested a review from Dexterp37 August 5, 2020 17:30
@@ -43,6 +43,11 @@ class DeletionRequestPingTest: XCTestCase {
app.launchArguments = ["USE_MOCK_SERVER", "\(port)"]
app.launch()

// Sleep for a 100ms to prevent an unlikely race condition that, while
// not practically acheiveable in real-world situations, does appear to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this race condition? Can you describe what's racing against what and why that's unlikely to happen?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was about to ask the same.
If we are running into a race condition here we should try to plug that as well (even if that happens in a followup and we do this hack for now)

Copy link
Member Author

@travis79 travis79 Aug 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'race' was more hypothesis than anything concrete. I could never get this to reproduce locally so the only thing I could think of that might affect this is that the test toggles the telemetry enabled switch as soon as the app is launched which could potentially still be during Glean's initialization. The race condition is I believe here, if the app beats Glean to here with toggling the switch before init is complete, then no deletion-request ping.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'race' was more hypothesis than anything concrete. I could never get this to reproduce locally so the only thing I could think of that might affect this is that the test toggles the telemetry enabled switch as soon as the app is launched which could potentially still be during Glean's initialization.

Shouldn't this be covered by #1046?

The race condition is I believe here, if the app beats Glean to here with toggling the switch before init is complete, then no deletion-request ping.

I believe we should probably let this race here then, and add more instrumentation (= log) to test the theories instead of suppressing it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said in a private chat looking at this again (and oh, I wrote that code), that should probably be an atomic to prevent a possible race I can think of.

Otherwise I agree: Can we add logging and see if we trigger it again on CI?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no issues with that, it looks like we already log this as an error inside of glean but I don't see any indication of this in the test logs. The test logs don't appear to capture device logging.

@travis79 travis79 force-pushed the Bug1655100 branch 2 times, most recently from 73c17f2 to e939c6f Compare August 10, 2020 17:56
@travis79 travis79 changed the title Bug 1655100 - Add delay to iOS integration test to prevent intermittent failure Bug 1655100 - Capture device logs to diagnose intermittent iOS integration test failure Aug 10, 2020
@travis79
Copy link
Member Author

Okay, I've added a step that should capture the simulator device logs when we run the iOS integration tests. Running the iOS tests to ensure it works, so not ready for review yet.

@travis79 travis79 force-pushed the Bug1655100 branch 2 times, most recently from faf919a to a4db70c Compare August 11, 2020 16:15
@@ -636,9 +636,15 @@ jobs:
name: Run sample app tests
command: |
bash bin/run-ios-sample-app-test.sh
SIMULATOR_UDID=$(xcrun instruments -s devices | grep 'iPhone 11 (13' | awk -F '[][]' '{print $2}')
xcrun simctl spawn $SIMULATOR_UDID log collect --output ~/project/ios-integration-test.logarchive 2>/dev/null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL if there's only one simulator running, this works:

Suggested change
xcrun simctl spawn $SIMULATOR_UDID log collect --output ~/project/ios-integration-test.logarchive 2>/dev/null
xcrun simctl spawn booted log collect --output ~/project/ios-integration-test.logarchive 2>/dev/null

Then we don't even need the UUID

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a shot, I was just getting ready to try fastlane scan

@travis79
Copy link
Member Author

Okay, with @badboy's help I think we have this where we are actually saving the logs in CI in such a way that we can see the Glean logs that we are interested in.

@travis79 travis79 requested a review from Dexterp37 August 12, 2020 18:41
@Dexterp37 Dexterp37 requested review from badboy and removed request for Dexterp37 August 13, 2020 07:22
@Dexterp37 Dexterp37 dismissed their stale review August 13, 2020 07:23

The PR changed its intent

@Dexterp37
Copy link
Contributor

I defer to @badboy for the review of this, seems it seems very iOS specific.

Copy link
Member

@badboy badboy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 -- this should get a changelog notice as we're changing how logging works a bit (not a major change for devs using it, but let's still call it out).

We should keep the associated bug open so that we take a look at the next time we see that intermittent (I leave a comment).
I'll also open a followup bug to change the Rust logging on iOS.

@travis79
Copy link
Member Author

Okay, changelog entry added and I'll leave the bug open for now (I'll untake it and make it a P2 for now)

@travis79 travis79 merged commit f0c4512 into mozilla:main Aug 13, 2020
@travis79 travis79 deleted the Bug1655100 branch August 13, 2020 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants