Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(events): Skip tags with None keys gracefully #1419

Merged
merged 3 commits into from
Oct 15, 2020
Merged

Conversation

jan-auer
Copy link
Member

In getsentry/relay#808, Relay will start to emit more detailed normalization meta data for tags pointing to the keys and values causing problems. This, in turn, will create events with tag entries [None, "value"], which Snuba cannot handle at the moment.

It is generally valid that Relay emits pair list entries with a None-key. This PR therefor adds validation to skip such entries.

Also see getsentry/sentry#21281 for an example on how the UI shows this meta data.

@jan-auer jan-auer requested a review from a team as a code owner October 13, 2020 20:33
@jan-auer jan-auer self-assigned this Oct 13, 2020
@@ -55,7 +55,7 @@ def _as_dict_safe(value):
return value
rv = {}
for item in value:
if item is not None:
if item is not None and item[0] is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this stored inn Nodestore ? If there is no equivalent sanitization there we would have an inconsistency between the two.

Sanitizing data in snuba is always a problem because snuba is almost the last element of the pipeline. This means that there can be logic that processed or stored the event including the tag with None key. This creates an inconsistency between our different storages.
What is the reason for not getting rid of these tag keys at validation time while this was done before, what use do we plan to make of them?

Said that tags values are already inconsistent since we remove NULL values so ... well ... whatever ...

Copy link
Member Author

@jan-auer jan-auer Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validation in Relay is what creates these None values in the first place. Node store saves the resulting Nones along with meta data so that the UI can anchor error messages with stable indexes.

Overall, Snuba and the UI must expect that None can occur at any point in the payload where it is legal JSON. When Relay normalizes values and finds an invalid item, it replaces it with None and leaves an entry in the _meta key. This entry points to the exact path where the erroneous field resides. This is also why tags are given as a list of pairs rather than a dictionary.

An example for this is the following erroneous payload:

{
  "tags": {
    "a": "",
    "": "b", 
    "??": "c", 
    "normal": "tag"
  }
}

Relay normalizes this to the following payload, which is ultimately stored in nodestore and passed to Snuba:

{
  "tags": [
    [null, "b"],
    [null, "c"],
    ["a", null],
    ["normal", "tag"]
  ],
  "_meta": {
    "tags": {
      "0": {
        "0": {
          "": {"err": [["invalid_data", {"reason": "expected a non-empty value"}]]}
        }
      },
      "1": {
        "0": {
          "": {"err": [["invalid_data", {"reason": "invalid characters in string"}]], "val": "??"}
        }
      },
      "2": {
        "1": {
          "": {"err": [["invalid_data", {"reason": "expected a non-empty value"}]]}
        }
      }
    }
  }
}

This, in turn can then be rendered by the UI similar to this screenshot (derived from a slightly different payload, still incomplete error rendering):

image

Copy link
Member Author

@jan-auer jan-auer Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more detailed answers to some of your questions:

What is the reason for not getting rid of these tag keys at validation time while this was done before, what use do we plan to make of them?

We did not do this before due to a bug in normalization, which is now being fixed. There was an unwanted side effect between our schema validation and actual tag normalization. Also note that this is not specific to tags. Pair lists also occur in request cookies and request headers.

This creates an inconsistency between our different storages.

This is the important point, indeed. Since the None values indicate invalid tags, it is safe to argue that Snuba does not need to index them. That is, unless we plan to also make errors searchable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I guess any possible direction we take here there is some data inconsistency between nodestore, snuba and relay (which shows up to the user). And the fundamental issue is that snuba tags are exposed (though not stored that way) as a map and not as a list of pairs while for relay they are a list of pairs.

Copy link
Contributor

@fpacifici fpacifici left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have an alternative at this point since the data structure we use to represent tags in snuba is not the same as the one we use to represent tags outside of snuba.

@jan-auer jan-auer merged commit d723f2e into master Oct 15, 2020
@jan-auer jan-auer deleted the fix/tag-none-keys branch October 15, 2020 06:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants