Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(server): Add initial minidump scrubbing #682

Merged
merged 34 commits into from
Aug 24, 2020

Conversation

flub
Copy link
Contributor

@flub flub commented Jul 28, 2020

Implements initial data scrubbing of UTF-8 strings in heap and stack memory streams in minidumps. All sections are marked as pii = "maybe", which makes it effectively disabled for current PII rules.

@flub flub force-pushed the feat/minidump-edit-example branch from b38fbe4 to 3366e0f Compare July 28, 2020 14:56
@flub flub changed the title Extremely rough and inefficient version of zeroing out memory Minidump scrubbing experiment Jul 30, 2020
Comment on lines 140 to 144
changed |= pii_processor.scrub_attachment_bytes(
&filename,
data.desc_mut_slice(&raw_desc),
AttachmentBytesType::PlainAttachment,
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@untitaker this is extremely untested, it just compiles. The format of these is just a bunch of bytes, which are "unix string" (aka any corrupted encoding possible) separated by NULL bytes.

Copy link
Member

@untitaker untitaker Jul 30, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether we can get the locale settings out of the minidump to figure out how to decode paths, env, and whatever this is.

Possibly we can find locale settings in the env, so we could try to lossy-decode the env as ascii first, parse out locale, then parse everything (except the memory) as that encoding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, locale can normally be parsed out of the environ and used to parse out the encoding (my past self seems to think https://hg.sr.ht/~flub/syskit/browse/syskit/_utils.py?rev=tip#L109 is the way to get this, there's probably rust libs though).

Doing this gets back to how to reverse an encoding to modify the minidump though.

@flub
Copy link
Contributor Author

flub commented Aug 17, 2020

@untitaker @jan-auer would you mind another review mainly looking at:

  • how this is integrated in relay. I'm not sure this is the right place or done in the right way (also, the interface between the two can be done better)

  • where the code should live. I've created a new crate as that was suggested sometime long ago, but it's a bit of a pain to keep the shared deps in sync. Though in some way having all the processing-feature functionality in a crate seems appealing to me and I don't think that exists yet. Anyway, if a crate is the way to go i should obviously do this correctly with making it optional and correctly pulled in byt the processing feature.

comments on relay-minidump/src/lib.rs are also welcome, though less necessary currently. I've left the example untouched for now, but the idea is that it would also use this and becomes a thing cli wrapper only.

Floris Bruynooghe and others added 24 commits August 18, 2020 17:58
This also zeroes out the stack, which is entirely undesirable.
This also removes some of the manual work to read the minidump as the
existing API does actually expose enough to get to all the
information.
This also puts some indirection in place and documents things a bit
more.
This is pretty basic, but works.
Move regex mapping into separate module. We need to share this logic with attachment scrubbing once it's there.

Also change usage of `Option<BTreeSet<u8>>` into dedicated enum.
Move regex mapping into separate module. We need to share this logic with attachment scrubbing once it's there.

Also change usage of `Option<BTreeSet<u8>>` into dedicated enum.
Also refactor things a bit, use MINIDUMP_LOCATION_DESCRIPTOR instead
of our own custom types.  Add a trait to make it's slicing easy.
This needs access also to the raw data to do pointer arithmetic.  So
this refactors this a bit to have one struct which contains both the
Minidump struct and the raw data.
@flub flub force-pushed the feat/minidump-edit-example branch from 5baa975 to 0f1e034 Compare August 18, 2020 16:16
@flub
Copy link
Contributor Author

flub commented Aug 18, 2020

Ok, this is not as finished as I'd like. Current state:

  • Theres a dependency mistake somewhere. Invoking just cargo build fails but cargo build --workspaces --all-features does work (which is what I usually do.

  • There are some failing integration tests about the minidumps, I think this is an actual bug that should be found

  • (minor) The example doesn't use the new module yet

@jan-auer jan-auer changed the title Minidump scrubbing experiment feat(server): Add initial minidump scrubbing Aug 19, 2020
* master:
  fix(danger): Comment only on missing changelogs (#723)
  test: Require any python3 and support python3.8 (#722)
@jan-auer jan-auer marked this pull request as ready for review August 20, 2020 13:35
@jan-auer jan-auer requested a review from a team August 20, 2020 13:35
@jan-auer
Copy link
Member

@untitaker Can you confirm that this will only scrub minidumps if explicitly configured?

Copy link
Member

@untitaker untitaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine but we need to figure out how to deploy this without enabling. Right now this will match on selectors * and **, and the former cannot be solved with selector specificness at the moment. Also if somebody were to have configured a selector named 'UE4Minidump.dmp', this would (and should) apply as far as I understand.

I think the sanest way forward would be to have a featureflag inside of the projectconfig that is attached to a featureflag in Sentry.

@jan-auer jan-auer requested a review from untitaker August 24, 2020 07:32
@jan-auer jan-auer merged commit bbb063c into master Aug 24, 2020
@jan-auer jan-auer deleted the feat/minidump-edit-example branch August 24, 2020 07:41
jan-auer added a commit that referenced this pull request Aug 26, 2020
* master:
  build(workspace): Move relay binary to workspace crate (#733)
  fix: Fix data schemas github action (#730)
  fix(docs): Incorrect init command in README (#732)
  fix(getting-started): Fix documentation links to other pages (#729)
  feat(server): Add initial minidump scrubbing (#682)
  fix: Remove jsonschema docs (#727)
  fix(docs): Fix a broken link
  test(server): Add unit tests to tracked future (#725)
  feat(server): Support chunked formdata for Crashpad (#721)
  fix(danger): Comment only on missing changelogs (#723)
  test: Require any python3 and support python3.8 (#722)
  fix: Re-add pii=maybe to span databags (#720)
  fix(schema): Type out span data and tags (#713)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants