Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spans): De-duplicate SQL conditions #2929

Merged
merged 8 commits into from
Jan 11, 2024
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,10 @@

## Unreleased

**Features**:

- Group db spans with repeating logical conditions together. ([#2929](https://github.com/getsentry/relay/pull/2929))

**Bug Fixes**:

- Normalize event timestamps before validating them, fixing cases where Relay would drop valid events with reason "invalid_transaction". ([#2878](https://github.com/getsentry/relay/pull/2878))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -211,8 +211,8 @@ mod tests {

scrub_sql_test!(
various_parameterized_ins_percentage,
"SELECT count() FROM table1 WHERE id IN (%s, %s) AND id IN (%s, %s, %s)",
"SELECT count() FROM table1 WHERE id IN (%s) AND id IN (%s)"
Comment on lines -214 to -215
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this raises an edge case how queries like this one where we have the same condition twice joined by a AND operator.

Likely an uncommon use case, but the first list may be a subset of the second so it may be valid to transform it into just id IN (%s).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK to pretend that they are the same for grouping purposes, with the same reasoning as for the OR clause: If the left operand (id) and the operator IN repeat multiple times, it is likely a dynamically generated query that we can simplify.

"SELECT count() FROM table1 WHERE id1 IN (%s, %s) AND id2 IN (%s, %s, %s)",
"SELECT count() FROM table1 WHERE id1 IN (%s) AND id2 IN (%s)"
);

scrub_sql_test!(
Expand Down Expand Up @@ -698,6 +698,46 @@ mod tests {
"UPDATE tbl SET foo = CASE WHEN .. THEN .. END"
);

scrub_sql_test!(
duplicate_conditions,
r#"SELECT *
FROM a
WHERE a.status = %s
AND (
(a.id = %s AND a.org = %s)
OR (a.id = %s AND a.org = %s)
OR (a.id = %s AND a.org = %s)
OR (a.id = %s AND a.org = %s)
)"#,
"SELECT * FROM a WHERE status = %s AND ((id = %s AND org = %s))"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the double parentheses (( left on purpose ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not intend to leave them initially, but they might help indicate to the user that something was scrubbed here. They should be easy to remove though, if we want to @gggritso any opinion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally remove them. I think it's easy to notice queries are getting scrubbed so it's no surprise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion here 👍🏻

);

scrub_sql_test!(
duplicate_conditions_or,
r#"SELECT *
FROM a
WHERE a.status = %s
OR (
(a.id = %s OR a.org = %s)
OR (a.id = %s OR a.org = %s)
OR (a.id = %s OR a.org = %s)
OR (a.id = %s OR a.org = %s)
)"#,
"SELECT * FROM a WHERE status = %s OR ((id = %s OR org = %s))"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gggritso this is not exactly the placeholder you suggested, but it does keep the implementation simple. Let me know if it's good enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me 👍🏻 going to defer to @Zylphrex on this, since he found the original issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me as it covers the case I originally raised.

);

scrub_sql_test!(
non_duplicate_conditions,
r#"SELECT *
FROM a
WHERE a.status = %s
AND (
(a.id = %s AND a.org2 = %s)
OR (a.id = %s AND a.org = %s)
)"#,
"SELECT * FROM a WHERE status = %s AND ((id = %s AND org2 = %s) OR (id = %s AND org = %s))"
);

scrub_sql_test!(
unique_alias,
"SELECT pg_advisory_unlock(%s, %s) AS t0123456789abcdef",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,9 @@ use itertools::Itertools;
use once_cell::sync::Lazy;
use regex::Regex;
use sqlparser::ast::{
AlterTableOperation, Assignment, CloseCursor, ColumnDef, Expr, Ident, ObjectName, Query,
Select, SelectItem, SetExpr, Statement, TableAlias, TableConstraint, TableFactor,
UnaryOperator, Value, VisitMut, VisitorMut,
AlterTableOperation, Assignment, BinaryOperator, CloseCursor, ColumnDef, Expr, Ident,
ObjectName, Query, Select, SelectItem, SetExpr, Statement, TableAlias, TableConstraint,
TableFactor, UnaryOperator, Value, VisitMut, VisitorMut,
};
use sqlparser::dialect::{Dialect, GenericDialect};

Expand Down Expand Up @@ -268,10 +268,25 @@ impl VisitorMut for NormalizeVisitor {
fn post_visit_expr(&mut self, expr: &mut Expr) -> ControlFlow<Self::Break> {
// Casts are omitted for simplification. Because we replace the entire expression,
// the replacement has to occur *after* visiting its children.
if let Expr::Cast { expr: inner, .. } = expr {
let mut swapped = Expr::Value(Value::Null);
std::mem::swap(&mut swapped, inner);
*expr = swapped;
match expr {
Expr::Cast { expr: inner, .. } => {
let mut swapped = Expr::Value(Value::Null);
std::mem::swap(&mut swapped, inner);
*expr = swapped;
}
Expr::BinaryOp {
ref mut left,
op: BinaryOperator::Or | BinaryOperator::And,
ref right,
} => {
let is_equal = left == right;
if is_equal {
let mut swapped = Expr::Value(Value::Null);
std::mem::swap(&mut swapped, left.as_mut());
*expr = swapped;
}
}
_ => (),
}

self.current_expr_depth = self.current_expr_depth.saturating_sub(1);
Expand Down
Loading