fix(events): Skip tags with None keys gracefully #1419

jan-auer · 2020-10-13T20:33:41Z

In getsentry/relay#808, Relay will start to emit more detailed normalization meta data for tags pointing to the keys and values causing problems. This, in turn, will create events with tag entries [None, "value"], which Snuba cannot handle at the moment.

It is generally valid that Relay emits pair list entries with a None-key. This PR therefor adds validation to skip such entries.

Also see getsentry/sentry#21281 for an example on how the UI shows this meta data.

fpacifici · 2020-10-13T21:04:52Z

snuba/processor.py

@@ -55,7 +55,7 @@ def _as_dict_safe(value):
        return value
    rv = {}
    for item in value:
-        if item is not None:
+        if item is not None and item[0] is not None:


How is this stored inn Nodestore ? If there is no equivalent sanitization there we would have an inconsistency between the two.

Sanitizing data in snuba is always a problem because snuba is almost the last element of the pipeline. This means that there can be logic that processed or stored the event including the tag with None key. This creates an inconsistency between our different storages.
What is the reason for not getting rid of these tag keys at validation time while this was done before, what use do we plan to make of them?

Said that tags values are already inconsistent since we remove NULL values so ... well ... whatever ...

Validation in Relay is what creates these None values in the first place. Node store saves the resulting Nones along with meta data so that the UI can anchor error messages with stable indexes.

Overall, Snuba and the UI must expect that None can occur at any point in the payload where it is legal JSON. When Relay normalizes values and finds an invalid item, it replaces it with None and leaves an entry in the _meta key. This entry points to the exact path where the erroneous field resides. This is also why tags are given as a list of pairs rather than a dictionary.

An example for this is the following erroneous payload:

{ "tags": { "a": "", "": "b", "??": "c", "normal": "tag" } }

Relay normalizes this to the following payload, which is ultimately stored in nodestore and passed to Snuba:

{ "tags": [ [null, "b"], [null, "c"], ["a", null], ["normal", "tag"] ], "_meta": { "tags": { "0": { "0": { "": {"err": [["invalid_data", {"reason": "expected a non-empty value"}]]} } }, "1": { "0": { "": {"err": [["invalid_data", {"reason": "invalid characters in string"}]], "val": "??"} } }, "2": { "1": { "": {"err": [["invalid_data", {"reason": "expected a non-empty value"}]]} } } } } }

This, in turn can then be rendered by the UI similar to this screenshot (derived from a slightly different payload, still incomplete error rendering):

A few more detailed answers to some of your questions:

What is the reason for not getting rid of these tag keys at validation time while this was done before, what use do we plan to make of them?

We did not do this before due to a bug in normalization, which is now being fixed. There was an unwanted side effect between our schema validation and actual tag normalization. Also note that this is not specific to tags. Pair lists also occur in request cookies and request headers.

This creates an inconsistency between our different storages.

This is the important point, indeed. Since the None values indicate invalid tags, it is safe to argue that Snuba does not need to index them. That is, unless we plan to also make errors searchable.

Well, I guess any possible direction we take here there is some data inconsistency between nodestore, snuba and relay (which shows up to the user). And the fundamental issue is that snuba tags are exposed (though not stored that way) as a map and not as a list of pairs while for relay they are a list of pairs.

fpacifici

I don't think we have an alternative at this point since the data structure we use to represent tags in snuba is not the same as the one we use to represent tags outside of snuba.

fix(events): Skip tags with None keys gracefully

e3aa491

jan-auer requested a review from a team as a code owner October 13, 2020 20:33

jan-auer self-assigned this Oct 13, 2020

Merge branch 'master' into fix/tag-none-keys

e037e12

jan-auer mentioned this pull request Oct 13, 2020

ref(tags): Fix custom tags issue getsentry/sentry#21281

Merged

fpacifici reviewed Oct 13, 2020

View reviewed changes

jan-auer mentioned this pull request Oct 14, 2020

fix(store): Improve tag normalization getsentry/relay#808

Merged

fpacifici approved these changes Oct 14, 2020

View reviewed changes

Merge branch 'master' into fix/tag-none-keys

c54f29b

jan-auer merged commit d723f2e into master Oct 15, 2020

jan-auer deleted the fix/tag-none-keys branch October 15, 2020 06:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(events): Skip tags with None keys gracefully #1419

fix(events): Skip tags with None keys gracefully #1419

jan-auer commented Oct 13, 2020

fpacifici Oct 13, 2020

jan-auer Oct 14, 2020 •

edited

Loading

jan-auer Oct 14, 2020 •

edited

Loading

fpacifici Oct 14, 2020

fpacifici left a comment

fix(events): Skip tags with None keys gracefully #1419

fix(events): Skip tags with None keys gracefully #1419

Conversation

jan-auer commented Oct 13, 2020

fpacifici Oct 13, 2020

Choose a reason for hiding this comment

jan-auer Oct 14, 2020 • edited Loading

Choose a reason for hiding this comment

jan-auer Oct 14, 2020 • edited Loading

Choose a reason for hiding this comment

fpacifici Oct 14, 2020

Choose a reason for hiding this comment

fpacifici left a comment

Choose a reason for hiding this comment

jan-auer Oct 14, 2020 •

edited

Loading

jan-auer Oct 14, 2020 •

edited

Loading