ref(replays): Restrict data scrubbing to sentry payloads #1825
Unconditional data scrubbing can break recording payloads because it also scrubs strings that should be excluded from scrubbing. Eventually, the data scrubber will have to recurse through rrweb's DOM nodes and skip safe subtrees or fields.
However, the client's default is to redact all strings from the DOM, which greatly reduces the chance of sensitive data ending up in the recording payloads. This leaves only Sentry's own payloads to be scrubbed, which are always top-level events marked with `type: 5`.

This PR updates the scrubber and introduces a visitor that streams through the top-level event list. When it encounters an item with type `5`, it uses the existing transforming deserializer to scrub all strings in the event. All other items are skipped. This allows us to continue scrubbing without a strict schema.

Alternatives
During implementation, two main alternatives were considered:

1. Deserialize into a struct with `type`, `timestamp`, and opaque `data`. Inspect the type, conditionally map it to another struct that wraps `data` in `ScrubbedValue`, and then serialize that struct.
2. Deserialize into a `RawValue` and parse it once with a helper struct to obtain the type. If the type matches, send the entire value through the transformer.

The second approach ultimately resulted in less code with equal or better performance, even though it requires parsing the raw value twice.
Safety
The `raw_value` feature of the `serde_json` crate is now enabled for all of Relay, since it is not possible to load the same crate a second time with a separate set of features. There are no documented caveats on the `raw_value` feature in `serde_json`. The implementation uses a special token (`"$serde_json::private::RawValue"`) in a way that makes it safe for both serialization and deserialization:

- Serialization: the raw-value path is only taken when `serialize_struct` is called with the token as name. Triggering this behavior requires a manual implementation of `Serialize` using this token, which our code does not contain.
- Deserialization: the raw-value path is only taken when `deserialize_newtype_struct` is called with the token as name. Again, triggering it accidentally would require a manual implementation of `Deserialize` using this token.

Closes #1806