Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(replays): Use Annotated struct definition for replay-event parsing #1582

Merged
merged 53 commits into from
Jan 12, 2023

Conversation

cmanallen
Copy link
Member

@cmanallen cmanallen commented Nov 15, 2022

Removes direct dependency on Serde and replaces it with Annotated. The motivation to move to Annotated was to maintain strong compatibility with wider Sentry ecosystem and to streamline the implementation of PII scrubbing.

closes: https://github.com/getsentry/replay-backend/issues/183

@cmanallen cmanallen marked this pull request as ready for review November 29, 2022 16:45
@cmanallen cmanallen requested a review from a team November 29, 2022 16:45
@github-actions
Copy link

Fails
🚫 Please consider adding a changelog entry for the next release.
Instructions and example for changelog

For changes exposed to the Python package, please add an entry to py/CHANGELOG.md. This includes, but is not limited to event normalization, PII scrubbing, and the protocol.

For changes to the Relay server, please add an entry to CHANGELOG.md under the following heading:

  1. Features: For new user-visible functionality.
  2. Bug Fixes: For user-visible bug fixes.
  3. Internal: For features and bug fixes in internal operation, especially processing mode.

To the changelog entry, please add a link to this PR (consider a more descriptive message):

- Use Annotated struct definition for replay-event parsing. ([#1582](https://github.com/getsentry/relay/pull/1582))

If none of the above apply, you can opt out by adding #skip-changelog to the PR description.

Generated by 🚫 dangerJS against 92c49ea

@cmanallen
Copy link
Member Author

@jan-auer or @jjbayer Having a tough time finishing this one. Trying to figure out the best place to put things and also how all these things fit together.

Goal: normalize the replay event payload and provide data scrubbing functionality that is configurable from sentry.io.

I have a process_replay method defined on the impl for NormalizeProcessor. I'm doing platform normalization there. In actors/processor.rs I'm normalizing the user-agent and the user's ip-address. The ip-address comes from envelope.meta().client_addr();. I'm doing this normalization process in two locations and I'm thinking they should be merged into a single method.

Questions: How should NormalizeProcessor.process_replay be called and should I call it in the process_replays method in actors/processors.rs? How do I access client_addr() inside the normalization processor? Should I even be using the normalization processor?

@jjbayer
Copy link
Member

jjbayer commented Nov 30, 2022

Should I even be using the normalization processor?

@cmanallen haven't reviewed in detail, but I think it might make more sense to write a separate ReplayProcessor (or similar) that implements Processor and normalizes Replay events. It might be possible to reuse the existing PiiProcessor for Pii processing though, by adding a process_replay method to it.

To pass the client IP, you could give your processor a config similar to StoreConfig, which contains client information and is passed into the StoreProcessor:

pub struct StoreConfig {
pub project_id: Option<u64>,
pub client_ip: Option<IpAddr>,

@cmanallen
Copy link
Member Author

cmanallen commented Dec 14, 2022

@jjbayer Updated with PII scrubbing. Question: the PiiProcessor is removing the user's ip-address (as I expect) but its not removing a credit-card in the tags. Is this unsupported or have I failed to define the correct behavior for this?

Edit: also I did not follow this advice:

It might be possible to reuse the existing PiiProcessor for Pii processing though, by adding a process_replay method to it.

I did not know how process_replay would be called and wasn't really sure what to populate it with. Thankfully process_value with the unmodified PiiProcessor seems to partially work. I wonder what else I need to tweak to get to 100%?

@cmanallen
Copy link
Member Author

@jjbayer Also the linter is freaking out because it says my imports are unnecessary but if I remove them the tests break.

Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmanallen is this PR ready for final review? Or are you still making changes?

relay-general/src/protocol/replay.rs Outdated Show resolved Hide resolved
relay-general/src/protocol/replay.rs Outdated Show resolved Hide resolved
relay-general/src/protocol/replay.rs Show resolved Hide resolved
relay-general/src/protocol/replay.rs Outdated Show resolved Hide resolved
@cmanallen
Copy link
Member Author

@jjbayer Yes ready for final review! Question: the PII scrubber removes values and replaces them with [email] (for example). On the Snuba side I assume I would want to nullify these values? Do you know what pattern Sentry uses for managing these scrubbed values? I looked at some of the snuba processors and it seems like they don't acknowledge these sentinel values exist. Are there additional steps I need to take to normalize this payload?

@jjbayer
Copy link
Member

jjbayer commented Dec 20, 2022

Do you know what pattern Sentry uses for managing these scrubbed values?

As far as I know, Sentry & Snuba do not treat scrubbed values differently from regular values, i.e. they are still strings, just with a different value. If you'd rather have these values set to null instead of being overwritten, you could look into using the Remove redaction instead of Replace. E.g. by configuring @ip:remove instead of @ip:replace (which is part of @common):

"@ip:remove" => RuleSpec {
ty: RuleType::Ip,
redaction: Redaction::Remove,
};

@cmanallen
Copy link
Member Author

@jjbayer Currently I'm getting the PiiConfig from ProjectConfig? Would using ip@remove require implementing a static PII config?

@jjbayer
Copy link
Member

jjbayer commented Dec 21, 2022

@jjbayer Currently I'm getting the PiiConfig from ProjectConfig? Would using ip@remove require implementing a static PII config?

@cmanallen the short answer is yes.

Do I understand correctly that in this version, replay events are scrubbed with PiiConfig from project config but replay recordings are still scrubbed with a static config? Was it a deliberate product decision to use the one from project configs here? If not, I would apply a static config to both, especially when we need to erase fields to null rather than doing the default scrubbing.

@cmanallen
Copy link
Member Author

@jjbayer We're happy with the configured behavior ([ip] instead of null). Is this ready to be approved?

relay-general/src/protocol/replay.rs Outdated Show resolved Hide resolved
relay-general/src/protocol/replay.rs Outdated Show resolved Hide resolved
relay-general/src/protocol/replay.rs Outdated Show resolved Hide resolved
relay-general/src/protocol/replay.rs Outdated Show resolved Hide resolved
relay-general/src/user_agent.rs Outdated Show resolved Hide resolved
relay-server/src/actors/processor.rs Outdated Show resolved Hide resolved
relay-server/src/actors/processor.rs Outdated Show resolved Hide resolved
fn test_scrub_pii_from_annotated_replay() {
let mut pii_config = PiiConfig::default();
pii_config.applications =
BTreeMap::from([(SelectorSpec::And(vec![]), vec!["@common".to_string()])]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we fixed this test, I did not realize you were planning to read PII config from project configs instead of using a hard-coded one. So it might make sense to change this test back to a pii config which looks like the most common datascrubbing settings coming from sentry:

fn simple_enabled_config() -> DataScrubbingConfig {
DataScrubbingConfig {
scrub_data: true,
scrub_ip_addresses: true,
scrub_defaults: true,
..Default::default()
}
}

But with that, even the credit card scrubbing seems to fail and I'm not sure why. I can investigate later this week.

Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Could you please add a PR description including a motivation for this change?

@cmanallen
Copy link
Member Author

@jjbayer Updated description. Thanks for helping me through this marathon of a PR! :D

@cmanallen cmanallen merged commit 6ebaf35 into master Jan 12, 2023
@cmanallen cmanallen deleted the replays-annotated-parse branch January 12, 2023 15:22
jan-auer added a commit that referenced this pull request Jan 18, 2023
* master: (35 commits)
  ref(actix): Migrate ProjectUpstream to `relay_system::Service` (#1727)
  feat(general): Add unknown SessionStatus variant (#1736)
  ref: Convert integration tests about dropping transactions to unit tests (#1720)
  release: 0.8.16
  ci: Skip redundant self-hosted E2E on library release (#1755)
  doc(changelog): Add relevant changes to python changelog (#1753)
  feat(profiling): Add profile context (#1748)
  release: 23.1.0
  profiling(fix): use an unpadded base64 encoding (#1749)
  Revert "feat(replays): Enable PII scrubbing for all organizations" (#1747)
  feat: Switch from base64 to data-encoding (#1743)
  instr(replays): Add timer metric to recording processing (#1742)
  feat(replays): Use Annotated struct definition for replay-event parsing (#1582)
  feat(sessions): Retire session duration metric (#1739)
  feat(general): Scrub all fields with IP address (#1725)
  feat(replays): Enable PII scrubbing for all organizations (#1678)
  chore(project): Add backoff mechanism for fetching projects (#1726)
  feat(profiling): Add new measurement units for profiling (#1732)
  chore(toolchain): update rust to 1.66.1 (#1735)
  ref(actix): Migrate server actor to the "service" arch (#1723)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants