fix(metrics): Light normalize before extracting metrics [INGEST-1517 INGEST-1424] #1366

iker-barriocanal · 2022-07-29T14:46:40Z

Context

Currently, all Relay modes extract metrics from transactions. However, only processing Relays run the normalization operations, in the store normalization step, that sample out events and modify the values Relay extracts metrics from. Because events are not sampled out early enough, Relay also extracts metrics from transactions that are discarded shortly afterwards.

What this PR does

Starting from envelopes.rs, a new function light_normalize_event performs the minimal event normalization needed to filter events later. That minimal normalization has been extracted from the processors.

The changes in the processors are mostly refactors extracting functions to make them callable from the outside (from light_normalize_event, in this case). The normalization processor doesn't re-do the normalization work that was previously done in the light processing step.

An alternative to all this extraction into functions is to have another processor. This was discarded because of its complexity. No additional design choices have been considered.
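The ordering described above can be sketched roughly as follows. This is a minimal Python illustration, not Relay's code: `light_normalize`, `should_drop`, `process`, the `"production"` default, and the metric name are all hypothetical stand-ins chosen for the example.

```python
from typing import Optional

# Every name in this block is a hypothetical stand-in, not Relay's API.


def light_normalize(event: dict) -> dict:
    """Apply only the normalization that filtering and metrics depend on."""
    normalized = dict(event)
    # Assumed default, so filters and metric tags always see a value.
    normalized.setdefault("environment", "production")
    # Trim the release so later comparisons use the cleaned value.
    if isinstance(normalized.get("release"), str):
        normalized["release"] = normalized["release"].strip()
    return normalized


def should_drop(event: dict, blocked_releases: set) -> bool:
    """Stand-in for a filter that runs on normalized values."""
    return event.get("release") in blocked_releases


def process(event: dict, blocked_releases: set) -> Optional[dict]:
    """Normalize lightly, filter, and only then extract a metric."""
    event = light_normalize(event)
    if should_drop(event, blocked_releases):
        return None  # no metric for an event that is discarded anyway
    return {
        "name": "transaction.duration",
        "value": event["duration_ms"],
        "tags": {
            "release": event.get("release"),
            "environment": event["environment"],
        },
    }
```

In this sketch, an event with release `" 1.0 "` is only recognized as blocked because the release is trimmed before the filter runs, and no metric is produced for it.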

Other considerations

Performing light normalization in non-processing Relays affects any test that depends on how long Relay takes to process an envelope. The cause is the long time Relay takes to lazily initialize the regex that parses user agents. Since that work is only done once, it doesn't affect Relay instances in production.
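To illustrate why only the first envelope pays this cost, here is a small Python sketch of lazy, one-time initialization. It is an analogy, not Relay's implementation: the pattern is a toy, and `user_agent_regex` and `parse_user_agent` are hypothetical names.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=1)
def user_agent_regex() -> re.Pattern:
    """Build the pattern on first use; later calls reuse the cached object."""
    # Toy pattern. A real user-agent parser compiles many large expressions,
    # which is what makes the first call slow.
    return re.compile(r"(?P<family>Firefox|Chrome|Safari)/(?P<major>\d+)")


def parse_user_agent(user_agent: str):
    """Return (family, major version) or None if nothing matches."""
    match = user_agent_regex().search(user_agent)
    if match is None:
        return None
    return match.group("family"), int(match.group("major"))
```

After any number of calls to `parse_user_agent`, `user_agent_regex.cache_info().misses` stays at 1: the setup runs once. A test that only sends a single envelope measures exactly that first, slow call.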

jjbayer

This looks good overall! I did not yet verify whether everything that we need for inbound filters and / or metrics extraction is now actually part of light_normalize_event.

relay-general/src/store/normalize.rs

jjbayer · 2022-08-01T10:12:51Z

relay-general/src/store/normalize.rs

        }
    }
 }

+pub fn light_normalize_event(


nit: We should probably come up with a better name for this function, but I don't have one yet. Maybe pre_normalize_event?

I strongly agree that "light normalization" isn't descriptive enough. I don't like pre_normalize_* because it suggests the operation happens before normalization, while this function actually performs some normalization. I think that "some" is what makes the naming difficult. I'd also like to avoid referring to any filtering in the name, if possible -- the function may be used in another context in the future and shouldn't be tied to current behavior.

What about something like minimal_normalize_event? That indicates there's some normalization happening, it doesn't perform the whole normalization, and the explanation of what it does and the motivation behind it could be added to the docstrings. What do you think?

relay-server/src/actors/envelopes.rs

jjbayer · 2022-08-01T10:15:44Z

relay-server/src/actors/envelopes.rs

@@ -1882,6 +1912,8 @@ impl EnvelopeProcessor {

            self.finalize_event(state)?;

+            self.light_normalize_event(state)?;
+


With this, we can remove a lot of ad-hoc normalization from extract_transaction_metrics. But that can be a separate PR.

Are we sure that double-normalizing (such as computing the breakdowns) doesn't break anything now?


untitaker

please check out sentry, do pip install -e ~/projects/relay/py/ and run its tests. I think you may break some tests by moving normalization out of the store normalizer, which we use in Python to fully normalize events.

It may be easier to introduce a flag on StoreNormalizer to opt into "light normalization", but it all depends on how many tests you're going to break.

jjbayer · 2022-08-02T07:51:49Z

I think you may break some tests by moving normalization out of the store normalizer, which we use in Python to fully normalize events.

@untitaker This is a good point. Should we just re-do everything in store normalization and, for the time being, accept the additional cost for the sake of correctness?


iker-barriocanal

There's a decent amount of stuff added to the output of the security report integration tests. Looking at the code, all the changes seem to make sense to me, so I assume they weren't running before.

iker-barriocanal · 2022-08-03T23:51:11Z

tests/integration/test_security_report.py

+    if "received" in event:
+        event.pop("received")
+    if "timestamp" in event:
+        event.pop("timestamp")
+


At least in the expect_ct case, the event has the received and timestamp fields. I haven't thought about whether we want them, but I don't think this is that important right now, so I took the short and hacky path of deleting these fields here. We may want to revisit this.

I think this is fine, just a result of timestamp normalization.
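As a side note, the two guarded pops in the diff above can be written more compactly with the default argument of `dict.pop`. A sketch, where `strip_volatile_fields` is a hypothetical helper name and not part of the test suite:

```python
def strip_volatile_fields(event: dict, fields=("received", "timestamp")) -> dict:
    """Remove fields whose values differ between runs; missing keys are ignored."""
    for field in fields:
        # The default value avoids a KeyError when the key is absent.
        event.pop(field, None)
    return event
```

This mutates the event in place, like the original code, and behaves the same whether or not the fields are present.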

jjbayer · 2022-08-04T07:07:56Z

tests/integration/test_security_report.py

@@ -46,7 +46,7 @@ def test_uses_origins(mini_sentry, relay, json_fixture_provider, allowed_origins
    )

    if should_be_allowed:
-        mini_sentry.captured_events.get(timeout=1).get_event()
+        mini_sentry.captured_events.get(timeout=10).get_event()


Note: This is necessary because the non-processing relay now does normalization, which includes UA parsing, which apparently has slow initialization.


jan-auer

I believe we need to run the SchemaProcessor at the very least, since it enforces things like max_chars, required, trim_whitespace, and nonempty. In store normalization, the schema processor runs before the normalizer, which means that it can change values in such a way that the normalizer behaves differently.
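A toy Python sketch of this ordering effect, under stated assumptions: `schema_step` and `normalize_step` are hypothetical stand-ins for a schema pass (trimming whitespace, dropping empty values) and a normalizer that defaults a missing field. Neither mirrors Relay's actual behavior in detail.

```python
def schema_step(event: dict) -> dict:
    """Hypothetical schema pass: trim strings and drop values left empty."""
    cleaned = {}
    for key, value in event.items():
        if isinstance(value, str):
            value = value.strip()  # stands in for trim_whitespace
            if not value:          # stands in for nonempty
                continue
        cleaned[key] = value
    return cleaned


def normalize_step(event: dict) -> dict:
    """Hypothetical normalizer: default the environment only when it is missing."""
    normalized = dict(event)
    normalized.setdefault("environment", "production")
    return normalized
```

For the input `{"environment": "   "}`, running `schema_step` first removes the blank value, so `normalize_step` fills in the default. Running `normalize_step` alone keeps the blank string, because the key is present.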


jan-auer · 2022-08-04T09:13:02Z

relay-server/src/actors/envelopes.rs

@@ -1882,6 +1912,8 @@ impl EnvelopeProcessor {

            self.finalize_event(state)?;

+            self.light_normalize_event(state)?;
+


Are we sure that double-normalizing (such as computing the breakdowns) doesn't break anything now?
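One way to probe this concern is an idempotency check: normalize once, normalize the result again, and compare. The sketch below uses hypothetical functions (`compute_breakdowns`, `accumulating_breakdowns`, `is_idempotent`) and a made-up `ops.db` breakdown. It only illustrates the property and is not Relay's breakdown logic.

```python
def _db_total(event: dict) -> int:
    """Sum the durations of spans with op == "db"."""
    return sum(
        span["duration_ms"]
        for span in event.get("spans", [])
        if span.get("op") == "db"
    )


def compute_breakdowns(event: dict) -> dict:
    """Overwrites the breakdown, so repeated runs give the same result."""
    normalized = dict(event)
    normalized["breakdowns"] = {"ops.db": _db_total(normalized)}
    return normalized


def accumulating_breakdowns(event: dict) -> dict:
    """Counterexample: adds to the previous value, so a second run changes it."""
    normalized = dict(event)
    previous = normalized.get("breakdowns", {}).get("ops.db", 0)
    normalized["breakdowns"] = {"ops.db": previous + _db_total(normalized)}
    return normalized


def is_idempotent(normalize, event: dict) -> bool:
    """True if normalizing twice equals normalizing once."""
    once = normalize(event)
    return normalize(once) == once
```

A step that recomputes and overwrites derived values passes the check. A step that accumulates into existing values fails it, which is the kind of breakage double normalization could cause.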

jan-auer · 2022-08-04T09:15:51Z

relay-general/src/store/normalize.rs

+        })?;
+
+        // Default required attributes, even if they have errors
+        normalize_release_dist(event); // dist is a tag extracted along with other metrics from transactions


nit: Do we need all these comments here?

I asked for these comments because I figured it might be helpful to know in the future why we decided to pull those into "light" normalization.

jjbayer

Feel free to ignore nitpick.


Follow-up to #1366. By applying `InvalidTransaction`, we do not report these dropped envelopes to Sentry and apply a dedicated reason code. The store normalizer returns an error for invalid transactions, which has to be special cased. Other than that, the store normalizer is infallible.

github-actions · 2022-08-10T10:33:12Z

	Fails
🚫	Please consider adding a changelog entry for the next release.

Instructions and example for changelog

For changes exposed to the Python package, please add an entry to py/CHANGELOG.md. This includes, but is not limited to event normalization, PII scrubbing, and the protocol.

For changes to the Relay server, please add an entry to CHANGELOG.md under the following heading:

Features: For new user-visible functionality.
Bug Fixes: For user-visible bug fixes.
Internal: For features and bug fixes in internal operation, especially processing mode.

To the changelog entry, please add a link to this PR (consider a more descriptive message):

- Light normalize before extracting metrics. ([#1366](https://github.com/getsentry/relay/pull/1366))

If none of the above apply, you can opt out by adding #skip-changelog to the PR description.

Generated by 🚫 dangerJS against c646b5a

The `normalize` function in the Python package and C-ABI receives a structure with config arguments for the various normalizers. The `renormalize` flag is more central, as it is supposed to disable most of the normalizers. It is used in Sentry after the event has passed normalization to ensure that the event is still structurally valid, even though it may now contain fields that are prohibited during ingestion. In #1366, large parts of normalization were split into a dedicated `light_normalize` function which did not honor the `renormalize` flag. This PR restores correct behavior and disables the entire renormalize call. Co-authored-by: Jan Michael Auer
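The intent of the flag can be sketched in Python as follows. This is an illustration under assumptions, not the actual fix: `NormalizeConfig`, `check_structure`, `light_normalize`, `normalize`, and the `internal_only` field are hypothetical, and the sketch assumes that setting the flag skips the light pass entirely while the structural check still runs.

```python
from dataclasses import dataclass


@dataclass
class NormalizeConfig:
    """Hypothetical config structure passed to the normalizer."""
    renormalize: bool = False


def check_structure(event: dict) -> None:
    """Stand-in for the structural validation that always runs."""
    if not isinstance(event.get("event_id"), str):
        raise ValueError("event_id must be a string")


def light_normalize(event: dict) -> dict:
    """Stand-in for the light pass: strips a field prohibited during ingestion."""
    normalized = dict(event)
    normalized.pop("internal_only", None)
    return normalized


def normalize(event: dict, config: NormalizeConfig) -> dict:
    check_structure(event)
    if config.renormalize:
        # The event was already normalized once: keep its fields untouched.
        return dict(event)
    return light_normalize(event)
```

With `renormalize=True`, an event that carries `internal_only` comes back unchanged. With the default config, the field is removed.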

iker-barriocanal added 18 commits July 29, 2022 16:03

extract security reports to light normalization

c5af255

extract ip addr normalization to light normalization

5c0b5fa

move environment verification to light normalization

95c8dd7

add _ to unused var

4bbe7c3

move release verification to light normalization

b386967

resort release <> environment check

1d10884

tmp: update reqs comment

72123ba

move release normalization to light normalization

fad125d

move timestamp normalization to light normalization

75c7308

move tag normalization to light normalization

b9fdaf8

move exception normalization to light normalization

a44a1ff

move user agent normalization to light normalization

3920660

move measurement normalization to light normalization

3632ea1

move breakdown normalization to light normalization

8942e61

extract transaction validation to fn from processor

2e0ab34

validate transaction before light normalization

85251a6

delete not needed comments

506b2b8

add temporary return types in main light normalization fn

00bcfea

iker-barriocanal self-assigned this Jul 29, 2022

iker-barriocanal added 2 commits August 1, 2022 10:58

return processing errors on light normalization

c6a6f6a

remove empty lines

1d28375

jjbayer reviewed Aug 1, 2022


untitaker reviewed Aug 1, 2022


add comments why each light normalization step is needed

0f9d85c

jjbayer reviewed Aug 2, 2022


add light normalization step to unit tests

e730e41

jan-auer mentioned this pull request Aug 2, 2022

feat(transactions): Extract computed measurements [INGEST-1530] #1373

Merged

iker-barriocanal added 2 commits August 2, 2022 16:35

add light normalization to the c abi

d9c961c

fix? cabi relay_pii_strip_event

25a9800

iker-barriocanal added 2 commits August 4, 2022 01:07

move test import to test section

c805598

also makes compiler happy

fix linting

5f73f5d

iker-barriocanal commented Aug 3, 2022


iker-barriocanal marked this pull request as ready for review August 4, 2022 00:03

iker-barriocanal requested review from a team, untitaker and jjbayer August 4, 2022 00:03

jjbayer approved these changes Aug 4, 2022


improve changelog message

b975e2d

jan-auer reviewed Aug 4, 2022


iker-barriocanal added 2 commits August 4, 2022 11:37

qualify some functions at use site

1cc4a7f

move schema processor to store normalization

6444578

jjbayer mentioned this pull request Aug 4, 2022

feat(filters): Filter events in external Relays [INGEST-1517] #1379

Merged

iker-barriocanal added 3 commits August 4, 2022 13:52

Move light normalization params into config struct

bb65d92

add light normalization idempotency test

5701eba

fix lint

c646b5a

jjbayer approved these changes Aug 4, 2022


untitaker approved these changes Aug 4, 2022


iker-barriocanal merged commit f7a577e into master Aug 4, 2022

iker-barriocanal deleted the iker/fix/filter-before-mep-extract branch August 4, 2022 13:13

jan-auer added a commit that referenced this pull request Aug 4, 2022

Merge branch 'master' into ref/separate-arbiters

b6c1534

* master: fix(metrics): Light normalize before extracting metrics (#1366)

jan-auer mentioned this pull request Aug 4, 2022

fix(store): Apply the correct outcome for light normalization #1382

Merged

jan-auer mentioned this pull request Oct 25, 2022

fix(py): Respect the renormalize flag #1548

Merged
