This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Fix LIKE escape issues #702

Closed
Dandandan opened this issue Dec 23, 2021 · 5 comments · Fixed by #1204
Labels
bug (Something isn't working), help wanted (Extra attention is needed), no-changelog (Issues whose changes are covered by a PR and thus should not be shown in the changelog)

Comments

@Dandandan
Collaborator

Dandandan commented Dec 23, 2021

Because the `regex` crate is used directly, LIKE patterns that contain regex metacharacters are interpreted as regex syntax rather than matched literally. One solution is to use the `escape` function in `regex`: https://docs.rs/regex/1.5.4/regex/fn.escape.html

Next, there should be a possibility to escape the % and _ characters.

Arrow-rs issues/PRs:

apache/arrow-rs#1085
apache/arrow-rs#1042
apache/arrow-rs#415

@Dandandan added the bug label Dec 23, 2021
@jorgecarleitao added the help wanted label Dec 23, 2021
@jorgecarleitao
Owner

I'm struggling a bit to implement this because the rules are quite specific to each dialect. E.g. Databend uses MySQL, DataFusion uses PostgreSQL, Polars uses Python's regex.

Is there an escape that covers them all? Any ideas how we accommodate these differences, @Dandandan @houqp @ritchie46 @sundy-li ?

@Dandandan
Collaborator Author

I think it's best to split this into two:

  • (Bug) regex metacharacters such as ., *, parentheses, etc. are not escaped. That should be easy to fix, as the regex crate offers the escape function.
  • (Feature) supporting escaping % and _, e.g. with something like \% and \_, or making it configurable.
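
The first item (the bug) could look roughly like the sketch below. It is only an illustration under assumptions: `like_to_regex` and `escape_regex_char` are hypothetical names, and the hand-written escaper is a dependency-free stand-in for `regex::escape` whose metacharacter set is my assumption. Escape handling for % and _ (the second item) is deliberately left out.

```rust
/// Stand-in for `regex::escape`, applied one character at a time so the
/// sketch has no dependencies. The metacharacter set is an assumption.
fn escape_regex_char(c: char, out: &mut String) {
    if matches!(
        c,
        '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']' | '{' | '}'
            | '^' | '$' | '#' | '&' | '-' | '~'
    ) {
        out.push('\\');
    }
    out.push(c);
}

/// Translate a LIKE pattern into an anchored regex string.
/// `%` becomes `.*`, `_` becomes `.`, everything else is matched literally.
fn like_to_regex(pattern: &str) -> String {
    let mut out = String::from("^");
    for c in pattern.chars() {
        match c {
            '%' => out.push_str(".*"),
            '_' => out.push('.'),
            other => escape_regex_char(other, &mut out),
        }
    }
    out.push('$');
    out
}
```

With this, a pattern such as `a.b%` would produce the regex `^a\.b.*$`, so the dot no longer matches arbitrary characters. Note that in the regex crate `.` does not match a newline by default, which a real implementation may need to account for.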

@sundy-li
Collaborator

Databend builds the regex pattern from the LIKE pattern using escape:
https://github.com/datafuselabs/databend/blob/44bab57d297c0f05440014a5270318bdb4485d8e/common/datavalues/src/arrays/ops/like.rs

Maybe we can introduce LIKE_OPTIONS to cover possible dialects in arrow2.
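
As a rough idea of what such options might contain, here is a minimal sketch. Everything in it is hypothetical: `LikeOptions`, `LikeToken` and `tokenize_like` are illustrative names and not an existing arrow2 API. It assumes the only dialect difference being modelled is the escape character, and that a trailing escape character is kept as a literal.

```rust
/// Hypothetical options; `escape: None` disables escaping altogether.
#[derive(Clone, Copy, Debug, Default)]
struct LikeOptions {
    /// Character that makes the following character literal.
    escape: Option<char>,
}

/// Dialect-independent view of a LIKE pattern.
#[derive(Debug, PartialEq)]
enum LikeToken {
    Literal(char),
    AnySequence, // `%`
    AnyChar,     // `_`
}

/// Split a LIKE pattern into tokens according to the given options.
fn tokenize_like(pattern: &str, options: LikeOptions) -> Vec<LikeToken> {
    let mut tokens = Vec::new();
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        let token = if Some(c) == options.escape {
            // The next character is literal; a trailing escape character
            // is kept as a literal itself (an assumption of this sketch).
            LikeToken::Literal(chars.next().unwrap_or(c))
        } else if c == '%' {
            LikeToken::AnySequence
        } else if c == '_' {
            LikeToken::AnyChar
        } else {
            LikeToken::Literal(c)
        };
        tokens.push(token);
    }
    tokens
}
```

A caller wanting backslash escaping would pass `LikeOptions { escape: Some('\\') }`, while `LikeOptions::default()` treats every % and _ as a wildcard. The tokens could then be turned into a regex or matched directly.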

@daniel-martinez-maqueda-sap
Contributor

I have a proposal I've been working on that could work:

  • Escape regex metacharacters by calling escape from the regex crate, as is done at the moment
  • If a LIKE wildcard is escaped, remove the escape characters (so that it is matched literally when running the regex)
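
A minimal sketch of how the second step might work, not necessarily how the proposal implements it. The hypothetical function `escaped_like_to_regex` assumes its input is already the output of `regex::escape`, so a backslash written in the LIKE pattern arrives as the two-character sequence `\\`, and it assumes backslash is the escape character.

```rust
/// Post-process a LIKE pattern that has already been passed through
/// `regex::escape` (assumed), producing an anchored regex string.
fn escaped_like_to_regex(escaped: &str) -> String {
    let mut out = String::from("^");
    let mut chars = escaped.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '\\' => match chars.next() {
                // `\\` is an escaped literal backslash from the LIKE pattern.
                Some('\\') => match chars.peek().copied() {
                    // It escaped a wildcard: drop it, keep the wildcard literal.
                    Some(w) if w == '%' || w == '_' => {
                        out.push(w);
                        chars.next();
                    }
                    // Otherwise it stays a literal backslash.
                    _ => out.push_str("\\\\"),
                },
                // Any other escaped metacharacter is copied through.
                Some(other) => {
                    out.push('\\');
                    out.push(other);
                }
                // A lone trailing backslash should not occur in escaped
                // input; emit an escaped backslash to keep the regex valid.
                None => out.push_str("\\\\"),
            },
            '%' => out.push_str(".*"),
            '_' => out.push('.'),
            other => out.push(other),
        }
    }
    out.push('$');
    out
}
```

For example, the LIKE pattern `a\%b_c.d` is escaped to `a\\%b_c\.d`, which this sketch turns into `^a%b.c\.d$`. A doubled backslash in front of a wildcard is not treated specially here, which a real implementation would have to decide on per dialect.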

@jorgecarleitao
Owner

Really sorry for the late reply! :(

Those make a lot of sense.

@jorgecarleitao added the no-changelog label Aug 4, 2022
4 participants