Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spans): De-duplicate SQL conditions #2929

Merged
merged 8 commits into from
Jan 11, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,10 @@

## Unreleased

**Features**:

- Group db spans with repeating logical conditions together. ([#2929](https://github.com/getsentry/relay/pull/2929))

**Bug Fixes**:

- Normalize event timestamps before validating them, fixing cases where Relay would drop valid events with reason "invalid_transaction". ([#2878](https://github.com/getsentry/relay/pull/2878))
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -211,8 +211,8 @@ mod tests {

scrub_sql_test!(
various_parameterized_ins_percentage,
"SELECT count() FROM table1 WHERE id IN (%s, %s) AND id IN (%s, %s, %s)",
"SELECT count() FROM table1 WHERE id IN (%s) AND id IN (%s)"
Comment on lines -214 to -215
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this raises an edge case how queries like this one where we have the same condition twice joined by a AND operator.

Likely an uncommon use case, but the first list may be a subset of the second so it may be valid to transform it into just id IN (%s).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK to pretend that they are the same for grouping purposes, with the same reasoning as for the OR clause: If the left operand (id) and the operator IN repeat multiple times, it is likely a dynamically generated query that we can simplify.

"SELECT count() FROM table1 WHERE id1 IN (%s, %s) AND id2 IN (%s, %s, %s)",
"SELECT count() FROM table1 WHERE id1 IN (%s) AND id2 IN (%s)"
);

scrub_sql_test!(
Expand Down Expand Up @@ -698,6 +698,46 @@ mod tests {
"UPDATE tbl SET foo = CASE WHEN .. THEN .. END"
);

scrub_sql_test!(
duplicate_conditions,
r#"SELECT *
FROM a
WHERE a.status = %s
AND (
(a.id = %s AND a.org = %s)
OR (a.id = %s AND a.org = %s)
OR (a.id = %s AND a.org = %s)
OR (a.id = %s AND a.org = %s)
)"#,
"SELECT * FROM a WHERE status = %s AND (id = %s AND org = %s)"
);

scrub_sql_test!(
duplicate_conditions_or,
r#"SELECT *
FROM a
WHERE a.status = %s
OR (
(a.id = %s OR a.org = %s)
OR (a.id = %s OR a.org = %s)
OR (a.id = %s OR a.org = %s)
OR (a.id = %s OR a.org = %s)
)"#,
"SELECT * FROM a WHERE status = %s OR (id = %s OR org = %s)"
);

scrub_sql_test!(
non_duplicate_conditions,
r#"SELECT *
FROM a
WHERE a.status = %s
AND (
(a.id = %s AND a.org2 = %s)
OR (a.id = %s AND a.org = %s)
)"#,
"SELECT * FROM a WHERE status = %s AND ((id = %s AND org2 = %s) OR (id = %s AND org = %s))"
);

scrub_sql_test!(
unique_alias,
"SELECT pg_advisory_unlock(%s, %s) AS t0123456789abcdef",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,9 @@ use itertools::Itertools;
use once_cell::sync::Lazy;
use regex::Regex;
use sqlparser::ast::{
AlterTableOperation, Assignment, CloseCursor, ColumnDef, Expr, Ident, ObjectName, Query,
Select, SelectItem, SetExpr, Statement, TableAlias, TableConstraint, TableFactor,
UnaryOperator, Value, VisitMut, VisitorMut,
AlterTableOperation, Assignment, BinaryOperator, CloseCursor, ColumnDef, Expr, Ident,
ObjectName, Query, Select, SelectItem, SetExpr, Statement, TableAlias, TableConstraint,
TableFactor, UnaryOperator, Value, VisitMut, VisitorMut,
};
use sqlparser::dialect::{Dialect, GenericDialect};

Expand Down Expand Up @@ -268,10 +268,26 @@ impl VisitorMut for NormalizeVisitor {
fn post_visit_expr(&mut self, expr: &mut Expr) -> ControlFlow<Self::Break> {
// Casts are omitted for simplification. Because we replace the entire expression,
// the replacement has to occur *after* visiting its children.
if let Expr::Cast { expr: inner, .. } = expr {
let mut swapped = Expr::Value(Value::Null);
std::mem::swap(&mut swapped, inner);
*expr = swapped;
match expr {
Expr::Cast { expr: inner, .. } => {
*expr = take_expr(inner);
}
Expr::BinaryOp {
ref mut left,
op: BinaryOperator::Or | BinaryOperator::And,
ref right,
} => {
let is_equal = left == right;
if is_equal {
*expr = take_expr(left);
}
}
Expr::Nested(inner) if matches!(inner.as_ref(), &Expr::Nested(_)) => {
// Remove multiple levels of parentheses.
// These can occur because of the binary op reduction above.
*expr = take_expr(inner);
}
_ => (),
}

self.current_expr_depth = self.current_expr_depth.saturating_sub(1);
Expand Down Expand Up @@ -420,6 +436,15 @@ impl VisitorMut for NormalizeVisitor {
}
}

/// Get ownership of an `Expr` and leave a NULL value in its place.
///
/// We cannot use [`std::mem::take`], because [`Expr`] does not implement [`Default`].
fn take_expr(expr: &mut Expr) -> Expr {
let mut swapped = Expr::Value(Value::Null);
std::mem::swap(&mut swapped, expr);
swapped
}

/// An extension of an SQL dialect that accepts `?`, `%s`, `:c0` as valid input.
#[derive(Debug)]
struct DialectWithParameters(Box<dyn Dialect>);
Expand Down