-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(server): Add initial minidump scrubbing #682
Conversation
b38fbe4
to
3366e0f
Compare
examples/edit_minidump.rs
Outdated
changed |= pii_processor.scrub_attachment_bytes( | ||
&filename, | ||
data.desc_mut_slice(&raw_desc), | ||
AttachmentBytesType::PlainAttachment, | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@untitaker this is extremely untested, it just compiles. The format of these is just a bunch of bytes, which are "unix string" (aka any corrupted encoding possible) separated by NULL bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether we can get the locale settings out of the minidump to figure out how to decode paths, env, and whatever this is.
Possibly we can find locale settings in the env, so we could try to lossy-decode the env as ascii first, parse out locale, then parse everything (except the memory) as that encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, locale can normally be parsed out of the environ and used to parse out the encoding (my past self seems to think https://hg.sr.ht/~flub/syskit/browse/syskit/_utils.py?rev=tip#L109 is the way to get this, there's probably rust libs though).
Doing this gets back to how to reverse an encoding to modify the minidump though.
@untitaker @jan-auer would you mind another review mainly looking at:
comments on relay-minidump/src/lib.rs are also welcome, though less necessary currently. I've left the example untouched for now, but the idea is that it would also use this and becomes a thing cli wrapper only. |
This also zeroes out the stack, which is entirely undesirable.
This also removes some of the manual work to read the minidump as the existing API does actually expose enough to get to all the information.
This also puts some indirection in place and documents things a bit more.
This is pretty basic, but works.
Move regex mapping into separate module. We need to share this logic with attachment scrubbing once it's there. Also change usage of `Option<BTreeSet<u8>>` into dedicated enum.
Move regex mapping into separate module. We need to share this logic with attachment scrubbing once it's there. Also change usage of `Option<BTreeSet<u8>>` into dedicated enum.
Also refactor things a bit, use MINIDUMP_LOCATION_DESCRIPTOR instead of our own custom types. Add a trait to make it's slicing easy.
This needs access also to the raw data to do pointer arithmetic. So this refactors this a bit to have one struct which contains both the Minidump struct and the raw data.
5baa975
to
0f1e034
Compare
Ok, this is not as finished as I'd like. Current state:
|
* master: feat(server): Support chunked formdata for Crashpad (#721)
@untitaker Can you confirm that this will only scrub minidumps if explicitly configured? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine but we need to figure out how to deploy this without enabling. Right now this will match on selectors *
and **
, and the former cannot be solved with selector specificness at the moment. Also if somebody were to have configured a selector named 'UE4Minidump.dmp'
, this would (and should) apply as far as I understand.
I think the sanest way forward would be to have a featureflag inside of the projectconfig that is attached to a featureflag in Sentry.
* master: build(workspace): Move relay binary to workspace crate (#733) fix: Fix data schemas github action (#730) fix(docs): Incorrect init command in README (#732) fix(getting-started): Fix documentation links to other pages (#729) feat(server): Add initial minidump scrubbing (#682) fix: Remove jsonschema docs (#727) fix(docs): Fix a broken link test(server): Add unit tests to tracked future (#725) feat(server): Support chunked formdata for Crashpad (#721) fix(danger): Comment only on missing changelogs (#723) test: Require any python3 and support python3.8 (#722) fix: Re-add pii=maybe to span databags (#720) fix(schema): Type out span data and tags (#713)
Implements initial data scrubbing of UTF-8 strings in heap and stack memory streams in minidumps. All sections are marked as
pii = "maybe"
, which makes it effectively disabled for current PII rules.