Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spans): De-duplicate SQL conditions #2929

Merged
merged 8 commits into from
Jan 11, 2024
Merged

Conversation

jjbayer
Copy link
Member

@jjbayer jjbayer commented Jan 10, 2024

SQL queries are sometimes generated with dynamic conditions (.. OR .. OR .. OR ..). After parameter scrubbing, these operands look the same, for example id = %s OR id = %s OR id = %s. In this case, simplify the condition to id = %s.

ref: internal issue

OR (a.id = %s OR a.org = %s)
OR (a.id = %s OR a.org = %s)
)"#,
"SELECT * FROM a WHERE status = %s OR ((id = %s OR org = %s))"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gggritso this is not exactly the placeholder you suggested, but it does keep the implementation simple. Let me know if it's good enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me 👍🏻 going to defer to @Zylphrex on this, since he found the original issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me as it covers the case I originally raised.

@jjbayer jjbayer changed the title feat(spans): De-duplicate conditions feat(spans): De-duplicate SQL conditions Jan 10, 2024
@jjbayer jjbayer marked this pull request as ready for review January 10, 2024 17:37
@jjbayer jjbayer requested a review from a team as a code owner January 10, 2024 17:37
@jjbayer jjbayer requested a review from gggritso January 10, 2024 17:38
Copy link
Contributor

@olksdr olksdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one question, otherwise lgtm

OR (a.id = %s AND a.org = %s)
OR (a.id = %s AND a.org = %s)
)"#,
"SELECT * FROM a WHERE status = %s AND ((id = %s AND org = %s))"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are the double parentheses (( left on purpose ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not intend to leave them initially, but they might help indicate to the user that something was scrubbed here. They should be easy to remove though, if we want to @gggritso any opinion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd personally remove them. I think it's easy to notice queries are getting scrubbed so it's no surprise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong opinion here 👍🏻

@gggritso
Copy link
Member

@Zylphrex what do you think?

OR (a.id = %s OR a.org = %s)
OR (a.id = %s OR a.org = %s)
)"#,
"SELECT * FROM a WHERE status = %s OR ((id = %s OR org = %s))"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me as it covers the case I originally raised.

Comment on lines -214 to -215
"SELECT count() FROM table1 WHERE id IN (%s, %s) AND id IN (%s, %s, %s)",
"SELECT count() FROM table1 WHERE id IN (%s) AND id IN (%s)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this raises an edge case how queries like this one where we have the same condition twice joined by a AND operator.

Likely an uncommon use case, but the first list may be a subset of the second so it may be valid to transform it into just id IN (%s).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's OK to pretend that they are the same for grouping purposes, with the same reasoning as for the OR clause: If the left operand (id) and the operator IN repeat multiple times, it is likely a dynamically generated query that we can simplify.

@jjbayer jjbayer merged commit da081f4 into master Jan 11, 2024
20 checks passed
@jjbayer jjbayer deleted the feat/spans-dup-condition branch January 11, 2024 14:19
jjbayer added a commit that referenced this pull request Jan 18, 2024
#2929 partially scrubbed
repeated conditions in SQL queries, but it did not consider the case
where the repeated operands of an `OR` or `AND` operation are not
siblings (see tree in code comments).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants