
feat(spans): initial MongoDB description scrubbing support #3912

Merged
merged 10 commits into master from mjq-ash-mongodb-queries on Sep 4, 2024

Conversation

mjq
Member

@mjq mjq commented Aug 8, 2024

We would like to support MongoDB queries in the Queries insight module (which is currently SQL-only).

Metrics are shown on the Queries insights page when they match the query has:span.description span.module:db. Without the changes in this PR, MongoDB spans lacked either a span.description or a span.module, so they were hidden from the module.

In addition, fully supporting the Queries page also requires the span.action and span.domain tags. Without this PR, a MongoDB span's span.action was set when it was present as db.operation in the span's data, but span.domain was never set.

This PR:

  • Allows MongoDB spans to be tagged with span.module: db
  • Adds MongoDB query scrubbing and writes the scrubbed query to span.description, conditional on the organizations:performance-queries-mongodb-extraction feature flag (added here)
  • Writes the collection name to span.domain for MongoDB spans

Because the new query scrubbing is behind a feature flag, spans from orgs without the flag will continue to have no span.description and therefore will still not show up in the Queries module.

The short-term plan is to turn this on for a single test org to iterate on the frontend experience, and also begin to test the cardinality of the query scrubbing.

The scrubbing keeps the operation/collection key/value pair but otherwise replaces leaf values in the query JSON with "?". This might need to be tuned further if there are cases where people are likely to use high-cardinality keys.
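
For example (illustrative only, not an actual test case from this PR), a find query like

{"find": "documents", "filter": {"email": "someone@example.com"}, "limit": 1}

would be scrubbed to roughly

{"find": "documents", "filter": {"email": "?"}, "limit": "?"}

keeping the "find": "documents" operation/collection pair and replacing every other leaf value.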

Valid MongoDB spans must:

  • contain a JSON-encoded query in their description
  • have a db.system of "mongodb"
  • have either db.collection.name or db.mongodb.collection set
  • have db.operation set

Our JavaScript v8 SDK fulfills these criteria for MongoDB spans, and I believe the latest Python does too (but haven't personally checked).

Co-authored-by: Ash <[email protected]>
@mjq mjq requested a review from a team as a code owner August 8, 2024 16:44
    Disabled,
    /// Enable scrubbing of MongoDB span descriptions.
    Enabled,
}
Member Author

Because this value has to be passed through so many different functions, it seemed clearer to use an explicit enum argument instead of a dangling boolean.
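
A minimal sketch of the pattern (the enum mirrors the snippet above; the call sites below are hypothetical and only illustrate the readability difference):

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ScrubMongoDescription {
    /// Disable scrubbing of MongoDB span descriptions.
    Disabled,
    /// Enable scrubbing of MongoDB span descriptions.
    Enabled,
}

// At a call site the intent is explicit:
//     normalize_spans(event, ScrubMongoDescription::Enabled);
// whereas a bare boolean gives no hint about what `true` toggles:
//     normalize_spans(event, true);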

///
/// Serialized as `organizations:performance-queries-mongodb-extraction`.
#[serde(rename = "organizations:performance-queries-mongodb-extraction")]
ScrubMongoDBDescriptions,
Member

Does this feature make sense the way it is right now?

It seems like MongoDB is unconditionally enabled with this PR; only the scrubbing is conditional. To me this seems like we should have a feature flag to enable/disable MongoDB altogether, not just the scrubbing, which is supposed to help us with cardinality.

Member Author

Here's my understanding.

There are two places that Relay currently special-cases MongoDB. The first is disabled databases.

const DISABLED_DATABASES: &[&str] = &[
    "*clickhouse*",
    "*compile*",
    "*mongodb*",
    "*redis*",
    "db.orm",
];

let is_db = RuleCondition::eq("span.sentry_tags.category", "db")
    & !(RuleCondition::eq("span.system", "mongodb")
        | RuleCondition::glob("span.op", DISABLED_DATABASES)
        | RuleCondition::glob("span.description", MONGODB_QUERIES));

is_db is used to control the presence of certain metrics and tags in hardcoded_span_metrics.

This PR removes that check, which means we will produce more metrics. However, note that:

  • a number of SDKs (including v8 of the JS SDK) don't actually have mongodb as part of the span op, and were never subject to this check
  • high cardinality of the resulting metrics comes from the span.description tag, which is protected against separately (see following)

The second existing special case is in the span description scrubbing.

("db", sub) => {
if sub.contains("clickhouse")
|| sub.contains("mongodb")
|| sub.contains("redis")
|| is_legacy_activerecord(sub, db_system)
|| is_sql_mongodb(description, db_system)
{
None

We check again for mongodb in the op (which, as I mentioned, is often already bypassed), but the is_sql_mongodb function catches them all at this point by checking for a db.system of mongodb or a JSON-looking description. This is the handling we fall through to when the feature is off.


So:

  • This PR will change the situation from "some MongoDB spans are counted as is_db" to "all MongoDB spans are counted as is_db", but
  • There will be no increase in cardinality, as all MongoDB span descriptions will continue to be None in cases where the flag is not set.

In an ideal world, I would have kept the behaviour exactly the same and only conditionally changed DISABLED_DATABASES. But, looking at how hardcoded_span_metrics is built, I can't see any way to get feature flags down to it. So I reasoned that the outcomes above were probably acceptable, but I could definitely be wrong! Please let me know what you think. Thanks so much 🙏

Member

a number of SDKs (including v8 of the JS SDK) don't actually have mongodb as part of the span op, and were never subject to this check

I'd be worried about the Otel integrations here, not the SDKs necessarily.

high cardinality of the resulting metrics comes from the span.description tag, which is protected against separately (see following)

If we have some confidence in the scrubbing (which seems pretty sound to me, we can even start with a lower 'recursion limit'), I'd not use the feature flag at all. You will only find the outliers after it's too late anyways. And the feature flag just adds more conditionals and boilerplate while only toggling scrubbing not the full extraction.
That's the same approach that was taken for Redis.

- replaced copies with in-place modification
- added a recursion limit to the query scrubbing
- used for loops instead of `for_each`
- `to_owned` instead of `to_string`
@mjq
Member Author

mjq commented Aug 9, 2024

Thanks for the review @Dav1dde! I appreciate the corrections to my first Rust. I think I addressed all your lower level comments, and left a longer explanation for the high level concern re: the feature flag if you get a chance to check that out. Thanks again! 🙏

@mjq mjq requested a review from Dav1dde August 9, 2024 15:50
@Dav1dde
Member

Dav1dde commented Aug 12, 2024

@mjq I took the liberty of updating the scrubbing code to be clone-free and mutate only in place.

Few things:

  • for_each: this is almost a code smell and can just be expressed with a for loop
  • collections have a bunch of utility methods, like values_mut, which gives you mutable access to only the values (just what you need here); this also allows you to modify in place
  • you can match on the mutable Values and modify them in place as well; no need to create new values, just pass &mut references into the visitor
  • there is a small optimization: when you already have a Value::String, we don't need to allocate a new string; we can clear the string and push a ? into it, which saves an allocation
  • you can early-return with the question mark (?) operator if the function returns Result or Option (used on serde_json::from_str)
  • there is an additional optimization we could take: not actually parsing the JSON into a proper data structure, but visiting it and emitting new JSON immediately (SAX-like); another would be to not parse Strings at all and just borrow from the original description (since we re-serialize later anyway), but that requires our own Value enum, so I decided not to do it for simplicity's sake
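
A rough sketch of the shape these points add up to (illustrative, not the exact PR code; the function names are placeholders, and the real code also re-inserts the operation/collection pair after scrubbing):

use serde_json::Value;

/// Replaces leaf values with "?" in place, collapsing anything deeper than `depth`.
fn scrub_visit_node(value: &mut Value, depth: usize) {
    if depth == 0 {
        *value = Value::String("?".to_owned());
        return;
    }
    match value {
        // `values_mut` gives mutable access to just the values, so we edit in place.
        Value::Object(map) => {
            for child in map.values_mut() {
                scrub_visit_node(child, depth - 1);
            }
        }
        Value::Array(items) => {
            for child in items.iter_mut() {
                scrub_visit_node(child, depth - 1);
            }
        }
        // Reuse the existing String allocation instead of creating a new one.
        Value::String(s) => {
            s.clear();
            s.push('?');
        }
        other => *other = Value::String("?".to_owned()),
    }
}

fn scrub_query(description: &str) -> Option<String> {
    // `ok()?` early-returns None if the description is not valid JSON.
    let mut root: Value = serde_json::from_str(description).ok()?;
    if let Value::Object(map) = &mut root {
        for child in map.values_mut() {
            scrub_visit_node(child, 3);
        }
    }
    serde_json::to_string(&root).ok()
}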

Do you think a depth/recursion limit of 3 is enough? Having no experience with MongoDB, it does feel quite small.

@Dav1dde Dav1dde force-pushed the mjq-ash-mongodb-queries branch from a7cb870 to 9759b15 Compare August 12, 2024 10:58
@Dav1dde Dav1dde force-pushed the mjq-ash-mongodb-queries branch from 9759b15 to 3bb717b Compare August 12, 2024 10:59
Dav1dde
Dav1dde previously approved these changes Aug 12, 2024
Member

@Dav1dde Dav1dde left a comment

As noted, I'd consider dropping the feature flag altogether, especially since you expect almost no MongoDB data in the first place. It just removes a few more conditionals and overhead. Up to you.

Conditionally changing is_db would be preferable and possible, but that requires some deeper changes in Relay that we can always do later, and I don't want to block you on this.


@mjq
Member Author

mjq commented Aug 26, 2024

@Dav1dde Thanks for the review (and the help with the in-place data transformations!). With hack week over, I'd like to get this merged. I've re-merged master to get a new test run.

Even though I'm confident in the scrubbing, personally I would feel better about having the feature flag for the initial release. I can loop back and remove it within a day or two afterwards once we're sure things are looking okay. How does that sound?

Do you think a depth/recursion limit of 3 is enough? Having no experience with mongodb it does feel quite small.

My thinking was to start conservative and expand it if we find we need it in real use.

@mjq mjq requested review from Dav1dde and jjbayer August 26, 2024 16:57
@mjq
Member Author

mjq commented Aug 29, 2024

@jjbayer added you as a reviewer (along with a re-request to @Dav1dde) if you wouldn't mind helping, since we're blocked on this now. Thanks!

@jjbayer jjbayer self-assigned this Aug 30, 2024
Member

@jjbayer jjbayer left a comment

See comment about scrubbing collection names. The rest of the PR looks good to me!


Comment on lines +119 to +120
    & (RuleCondition::eq("span.system", "mongodb")
        | !RuleCondition::glob("span.description", MONGODB_QUERIES));
Member

Took me a while to parse this condition, might benefit from a comment along the lines of "we disallow mongodb, unless span.system is set to mongodb".
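
Something along these lines, for example (a sketch of how such a comment could read, not the committed wording):

// Descriptions that look like MongoDB queries only count as `db` spans when
// `span.system` is explicitly set to "mongodb"; otherwise they stay disallowed.
let is_db = RuleCondition::eq("span.sentry_tags.category", "db")
    & (RuleCondition::eq("span.system", "mongodb")
        | !RuleCondition::glob("span.description", MONGODB_QUERIES));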

@@ -332,6 +337,7 @@ fn normalize(event: &mut Event, meta: &mut Meta, config: &NormalizationConfig) {
        event,
        config.max_tag_value_length,
        config.span_allowed_hosts,
        config.scrub_mongo_description.clone(),
Member

You can derive the Copy trait on ScrubMongoDescription to make the .clone() unnecessary.

@@ -33,12 +34,22 @@ const MAX_EXTENSION_LENGTH: usize = 10;
/// Domain names that are preserved during scrubbing
const DOMAIN_ALLOW_LIST: &[&str] = &["localhost"];

#[derive(PartialEq, Clone, Debug)]
Member

Suggested change
#[derive(PartialEq, Clone, Debug)]
#[derive(PartialEq, Clone, Copy, Debug)]

.and_then(|data| data.db_collection_name.value())
.and_then(|collection| collection.as_str());

command.zip(collection).and_then(|(command, collection)| {
Member

Suggested change
command.zip(collection).and_then(|(command, collection)| {
if let (Some(command), Some(collection)) = (command, collection) {

    for value in root.values_mut() {
        scrub_mongodb_visit_node(value, 3);
    }
    root.insert(command, Value::String(collection));
Member

nit: query might not parse into a valid object, so I would pass &str into this function and only convert command and collection to String in this line.


mongodb_scrubbing_test!(
    mongodb_basic_query,
    "{\"find\": \"documents\", \"showRecordId\": true}",
Member

Suggested change
"{\"find\": \"documents\", \"showRecordId\": true}",
r#"{"find": "documents", "showRecordId": true}"#,

etc,

.and_then(|command| command.as_str());

let collection = data
.and_then(|data| data.db_collection_name.value())
Member

With SQL, we learned the hard way that collections / tables can contain all kinds of identifiers (see test cases here). IMO we should pass the MongoDB collection name through the same scrubbing as SQL table names to catch those identifiers, i.e. apply the TABLE_NAME_REGEX.

mjq added 2 commits September 3, 2024 11:27
- Rename `ScrubMongoDBDescriptions` to `ScrubMongoDbDescriptions`
- comment complicated is_db conditional
- add `Copy` trait to `ScrubMongoDescription`
- clarify option checking (zip -> conditional)
- move `String` conversion after failure cases
- use r# strings to make embedded JSON easier to read
- scrub collection names from descriptions and tags
@mjq
Member Author

mjq commented Sep 3, 2024

@jjbayer Thanks so much for the thorough review 🙏 I've made all of your suggested changes. Please take a look at the collection name scrubbing changes in particular. I've applied it in both places the collection is used in a tag (the description and the domain), since either could affect metric cardinality. To reuse the TABLE_NAME_REGEX, I moved it up to the closest ancestor crate of all its users - if you have any suggestions for a better spot for it (or a better design), please let me know.

Thanks again!

@mjq mjq requested a review from jjbayer September 3, 2024 18:05
Comment on lines +574 to +579
    let scrubbed_collection_name =
        if let Cow::Owned(s) = TABLE_NAME_REGEX.replace_all(collection, "{%s}") {
            s
        } else {
            collection.to_owned()
        };
Member

nit: I think this is the same

Suggested change
    let scrubbed_collection_name =
        if let Cow::Owned(s) = TABLE_NAME_REGEX.replace_all(collection, "{%s}") {
            s
        } else {
            collection.to_owned()
        };
    let scrubbed_collection_name = TABLE_NAME_REGEX.replace_all(collection, "{%s}").to_owned();

Member Author

The above returns the type error:

   --> relay-event-normalization/src/normalize/span/description/mod.rs:575:51
    |
575 |     root.insert(command.to_owned(), Value::String(scrubbed_collection_name));
    |                                     ------------- ^^^^^^^^^^^^^^^^^^^^^^^^- help: try using a conversion method: `.to_string()`
    |                                     |             |
    |                                     |             expected `String`, found `Cow<'_, str>`
    |                                     arguments to this enum variant are incorrect
    |
    = note: expected struct `std::string::String`
                 found enum `std::borrow::Cow<'_, str>`

But

let scrubbed_collection_name = TABLE_NAME_REGEX.replace_all(collection, "{%s}").to_string();

compiles, if that seems reasonable?

Either way I'm going to merge without this change just in case, and loop back afterwards.

Member

Yep, that works!
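
For anyone following along, a small self-contained illustration of the difference (not the PR code):

use std::borrow::Cow;

fn main() {
    let cow: Cow<'_, str> = Cow::Borrowed("documents");

    // `.to_owned()` resolves to the blanket `ToOwned` impl for `Clone` types,
    // so it just clones the value and the type stays `Cow<'_, str>`.
    let _still_cow: Cow<'_, str> = cow.to_owned();

    // `.to_string()` goes through `Display` and always yields a `String`.
    let _string: String = cow.to_string();

    // `.into_owned()` also yields a `String`, reusing the buffer when the Cow is owned.
    let _owned: String = cow.into_owned();
}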

Comment on lines +648 to +653
        if let Cow::Owned(s) = TABLE_NAME_REGEX.replace_all(db_collection, "{%s}") {
            s
        } else {
            db_collection.to_owned()
        }
    })
Member

Same comment about to_owned.

It's a bit unfortunate that we scrub the collection name again here, but that's a basic flaw of extract_tags (it works on each tag separately instead of collecting all tags for a given span type at once).

@mjq mjq merged commit 9c5f180 into master Sep 4, 2024
23 checks passed
@mjq mjq deleted the mjq-ash-mongodb-queries branch September 4, 2024 15:04
jan-auer added a commit that referenced this pull request Sep 6, 2024
* master: (27 commits)
  build: Update dialoguer and hostname (#4009)
  build: Update opentelemetry-proto to 0.7.0 (#4000)
  build: Update lru to 0.12.4 (#4008)
  build: Update cookie to 0.18.1 (#4007)
  feat(spans): Extract standalone CLS span metrics and performance score (#3988)
  build: Update cadence to 1.4.0 and statsdproxy to 0.2.0 (#4005)
  build: Update maxminddb to 0.24.0 (#4003)
  build: Update multer to 3.1.0 (#4002)
  build: Update regex and aho-corasick (#4001)
  build: Update sentry-kafka-schemas to 1.0.107 (#3999)
  build: Update dev-dependencies (#3998)
  build: Update itertools to 0.13.0 (#3993)
  build: Update brotli, zstd, flate2 (#3996)
  build: Update rdkafka to 0.36.2 (#3995)
  build: Update tikv-jemallocator to 0.6.0 (#3994)
  build: Update minidump to 0.22.0 (#3992)
  build: Update bindgen to 0.70.1 (#3991)
  build: Update chrono to 0.4.38 (#3990)
  feat(spans): initial MongoDB description scrubbing support (#3912)
  fix(spooler): Reduce number of disk reads (#3983)
  ...