This repository has been archived by the owner on Nov 1, 2022. It is now read-only.

For #11876: Bucket native code crashes by process type in GleanCrashReporterService #11908

Merged (2 commits) on Mar 24, 2022

jamienicol
Contributor

Make GleanCrashReporterService count native code crashes based on
their processType field rather than whether they are fatal or
non-fatal.

Persisted fatal and non-fatal crashes will still be submitted for now,
but this code should be removed in a follow-up patch once we have
allowed time for them to be submitted.
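
As a rough illustration of the bucketing described above (a sketch under assumptions, not the code in this PR): the `ProcessType` enum, the `CrashKeys` object and the `nativeCrashKey` helper are hypothetical names introduced here. Only the label strings are taken from the constants added in this change.

```kotlin
// Hypothetical stand-in for the processType values carried by a native code crash.
enum class ProcessType { MAIN, FOREGROUND_CHILD, BACKGROUND_CHILD }

// Label strings copied from the constants added in this PR; the object name is an assumption.
object CrashKeys {
    const val MAIN_PROCESS_NATIVE_CODE_CRASH_KEY = "main_proc_native_code_crash"
    const val FOREGROUND_CHILD_PROCESS_NATIVE_CODE_CRASH_KEY = "fg_proc_native_code_crash"
    const val BACKGROUND_CHILD_PROCESS_NATIVE_CODE_CRASH_KEY = "bg_proc_native_code_crash"
}

// Choose the counter label from the process type alone; whether the crash
// was fatal or non-fatal is not consulted.
fun nativeCrashKey(processType: ProcessType): String = when (processType) {
    ProcessType.MAIN -> CrashKeys.MAIN_PROCESS_NATIVE_CODE_CRASH_KEY
    ProcessType.FOREGROUND_CHILD -> CrashKeys.FOREGROUND_CHILD_PROCESS_NATIVE_CODE_CRASH_KEY
    ProcessType.BACKGROUND_CHILD -> CrashKeys.BACKGROUND_CHILD_PROCESS_NATIVE_CODE_CRASH_KEY
}
```

Because the `when` is exhaustive over the enum, adding a new process type in this sketch would fail to compile until a label is chosen for it.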

Pull Request checklist

  • Quality: This PR builds and passes detekt/ktlint checks (A pre-push hook is recommended)
  • Tests: This PR includes thorough tests or an explanation of why it does not
  • Changelog: This PR includes a changelog entry or does not need one
  • Accessibility: The code in this PR follows accessibility best practices or does not include any user facing features

After merge

  • Milestone: Make sure issues closed by this pull request are added to the milestone of the version currently in development.
  • Breaking Changes: If this is a breaking change, please push a draft PR on Reference Browser to address the breaking issues.

@jamienicol
Contributor Author

Request for data collection review form

All questions are mandatory. You must receive review from a data steward peer on your responses to these questions before shipping new data collection.

  1. What questions will you answer with this data?

Which type of process a native code crash occurred in.

  1. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements? Some example responses:

We currently track whether a crash is fatal or non-fatal. This change further
distinguishes whether a non-fatal crash occurred in a background or foreground process.

This will allow us to keep track of crash numbers even when users do
not report crashes and, importantly, to monitor whether a larger number of
crashes in background processes are going unreported, as they use a
different UI to ask the user to report the crash.

  1. What alternative methods did you consider to answer these questions? Why were they not sufficient?

There is no alternative.

  1. Can current instrumentation answer these questions?

No.

  1. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.

Note that the data steward reviewing your request will characterize your data collection based on the highest (and most sensitive) category.

| Measurement Description | Data Collection Category | Tracking Bug # |
| --- | --- | --- |
| Crash process type | Category 1 - technical data | #11876 |
  1. Please provide a link to the documentation for this data collection which describes the ultimate data set in a public, complete, and accurate way.
  • Often the Privacy Notice for your product will link to where the documentation is expected to be.
  • Common examples for Mozilla products/services:
    • If this collection is Telemetry you can state "This collection is documented in its definitions files Histograms.json, Scalars.yaml, and/or Events.yaml and in the Probe Dictionary at https://probes.telemetry.mozilla.org."
    • If this data is collected using the Glean SDK you can state "This collection is documented in the Glean Dictionary at https://dictionary.telemetry.mozilla.org/"
  • In some cases, documentation is included in the project’s repository.

This collection is documented in the Glean Dictionary at https://dictionary.telemetry.mozilla.org/

  1. How long will this data be collected? Choose one of the following:
  • This is scoped to a time-limited experiment/project until date MM-DD-YYYY.

  • I want this data to be collected for 6 months initially (potentially renewable).

  • I want to permanently monitor this data. (put someone’s name here)

I want to permanently monitor this data. Jamie Nicol

  1. What populations will you measure?
  • Which release channels?

  • Which countries?

  • Which locales?

  • Any other filters? Please describe in detail below.

All release channels, countries and locales. No other filters.

  1. If this data collection is default on, what is the opt-out mechanism for users?

Default telemetry opt-out.

  1. Please provide a general description of how you will analyze this data.

If a large number of crashes are detected, I will look at Socorro to
see whether any of those crashes have been reported there.

If a large discrepancy is found between the number of crash pings
counted here and the number of reports submitted to Socorro, we will
re-evaluate the crash-reporting UI for background child process crashes.

  1. Where do you intend to share the results of your analysis?

On GitHub and Bugzilla.

  1. Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection? If so:
  • Are you using that on the Mozilla backend? Or going directly to the third-party?

No third party tool.

@jamienicol
Contributor Author

I'll need to update the yaml file with the new data review link, once completed.

Contributor

@rocketsroger left a comment

Looks good! Some questions. Thanks

@@ -36,6 +36,12 @@ class GleanCrashReporterService(
// as the persisted crashes in the crash count file (see above comment)
const val UNCAUGHT_EXCEPTION_KEY = "uncaught_exception"
const val CAUGHT_EXCEPTION_KEY = "caught_exception"
const val MAIN_PROCESS_NATIVE_CODE_CRASH_KEY = "main_proc_native_code_crash"
const val FOREGROUND_CHILD_PROCESS_NATIVE_CODE_CRASH_KEY = "fg_proc_native_code_crash"
Contributor

nit: for me just using foreground / background (vs fg / bg) here is more readable

Contributor Author

The build fails if the keys are too long, unfortunately

Contributor

👍 That's unfortunate. Thanks!

Contributor Author

I think foreground_native_code_crash (without the proc) would fit. Would you prefer that?

Contributor

Sure! That sounds good too.

Contributor Author

Or perhaps remove the "code", i.e. foreground_proc_native_crash?

Contributor Author

Ah, no. Unfortunately none of those options work. I think we'll need to leave it as it is...

Contributor

Oh yeah, that's even better. Then it matches main_proc_native_crash.

Contributor

No worries, just a small nit. Thanks

@@ -132,6 +144,9 @@ class GleanCrashReporterService(
when (line) {
UNCAUGHT_EXCEPTION_KEY -> ++uncaughtExceptionCount
CAUGHT_EXCEPTION_KEY -> ++caughtExceptionCount
MAIN_PROCESS_NATIVE_CODE_CRASH_KEY -> ++mainProcessNativeCodeCrashCount
FOREGROUND_CHILD_PROCESS_NATIVE_CODE_CRASH_KEY -> ++foregroundChildProcessNativeCodeCrashCount
BACKGROUND_CHILD_PROCESS_NATIVE_CODE_CRASH_KEY -> ++backgroundChildProcessNativeCodeCrashCount
FATAL_NATIVE_CODE_CRASH_KEY -> ++fatalNativeCodeCrashCount
Contributor

Should these be removed?

Contributor Author

@Dexterp37 suggested here that we keep the deprecated keys around for a short while so that the pending data can still be submitted correctly.

Contributor

Understood. Please create an issue to track removing the deprecated keys. Thanks

Contributor Author

Will do
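
To make the reason for keeping the deprecated keys concrete, here is a minimal sketch of tallying persisted crash lines. It is not the PR's implementation, which uses a `when` block with per-key counters. The function name `tallyPersistedCrashes` and the `knownKeys` set are assumptions for illustration; the key strings are the ones shown in this PR.

```kotlin
// Keys shown in this PR. The last one is deprecated but still recognised so that
// crashes persisted by an older build are counted rather than silently dropped.
val knownKeys = setOf(
    "uncaught_exception",
    "caught_exception",
    "main_proc_native_code_crash",
    "fg_proc_native_code_crash",
    "bg_proc_native_code_crash",
    "fatal_native_code_crash" // deprecated
)

// Count how often each known key occurs in the persisted lines;
// unknown or blank lines are ignored.
fun tallyPersistedCrashes(lines: List<String>): Map<String, Int> =
    lines.map { it.trim() }
        .filter { it in knownKeys }
        .groupingBy { it }
        .eachCount()
```

If `"fatal_native_code_crash"` were dropped from `knownKeys` right away, lines written under that key before the update would be filtered out and never submitted, which is the situation the suggestion above is meant to avoid.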

var foregroundChildProcessNativeCodeCrashCount = 0
var backgroundChildProcessNativeCodeCrashCount = 0
// These keys are deprecated and should be removed after a period to allow for persisted
// crashes to be submitted.
var fatalNativeCodeCrashCount = 0
Contributor

Should these be removed?

const val BACKGROUND_CHILD_PROCESS_NATIVE_CODE_CRASH_KEY = "bg_proc_native_code_crash"

// These keys are deprecated and should be removed after a period to allow for persisted
// crashes to be submitted.
const val FATAL_NATIVE_CODE_CRASH_KEY = "fatal_native_code_crash"
Contributor

Should these be removed?

@rocketsroger
Contributor

Jamie Nicol

Data Review

  1. Is there or will there be documentation that describes the schema for the ultimate data set in a public, complete, and accurate way?

Yes, through the metrics.yaml file and the Glean Dictionary

  1. Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, through the "Send Usage Data" preference in the application settings

  1. If the request is for permanent data collection, is there someone who will monitor the data over time?

Jamie Nicol

  1. Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical data

  1. Is the data collection request for default-on or default-off?

default-on

  1. Does the instrumentation include the addition of any new identifiers?

No

  1. Is the data collection covered by the existing Firefox privacy notice?

Yes

  1. Does the data collection use a third-party collection tool?

No

Result

data-review+

@rocketsroger
Contributor

> I'll need to update the yaml file with the new data review link, once completed.

Please also add your email to the notification_emails: section. Thanks!

@rocketsroger added the "do not land" label (PRs that requires coordination before landing) on Mar 22, 2022
@rocketsroger
Contributor

Adding label till yaml is updated. Thanks

Contributor

@rocketsroger left a comment

🚢

@rocketsroger added the "🛬 needs landing" label (PRs that are ready to land) on Mar 24, 2022