Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL SELECT not suport Unicode #13440

Closed
hongbangsoft opened this issue Mar 19, 2023 · 29 comments
Closed

SQL SELECT not suport Unicode #13440

hongbangsoft opened this issue Mar 19, 2023 · 29 comments
Labels
Milestone

Comments

@hongbangsoft
Copy link

Describe the bug

The current SQL parser does not allow "N'" for Unicode

To Reproduce

Run the following SQL Query
SELECT * FROM ContentItemIndex Where ContentType Like N'%á%'

Result
*Syntax error, expected: ., (, +, -, , /, %, &, |, ^, =, >, <, >=, <=, <>, !=, !<, !>, AND, OR, LIKE, NOT LIKE, NOT, BETWEEN, IN, ,, ), GROUP, HAVING, ORDER, LIMIT, OFFSET, UNION, ;, WITH, SELECT, INNER, LEFT, RIGHT, JOIN, WHERE

@hishamco
Copy link
Member

This is weird, I will look to this

@lampersky
Copy link
Contributor

@hongbangsoft

"N" prefix on string literals is a SQL Server thing. Not sure if any other db engine is using it.
Sql parser is supporting SQL92 standard with some additional features from SQL2003 standard.

Have you tried to perform query without N prefix? I can search for string with unicode characters with no problems:

image

@hishamco
Copy link
Member

@lampersky I remembered N is required especially if you are using nvarchar datatype

@lampersky
Copy link
Contributor

@hishamco
I don't remember what type is used for ContentType column, but for DisplayText it is nvarchar, that's why my example is working.

@hishamco
Copy link
Member

hishamco commented Mar 19, 2023

Anyhow many things will be easier to adapt if we use Parlot :)

@hongbangsoft
Copy link
Author

hongbangsoft commented Mar 20, 2023

@hishamco
Without N prefix the query can run, it work with "á" but it doesn't work with all unicode characters such as "ả".

@hishamco
Copy link
Member

I'm sure it won't work with Arabic letter unless you supplied N, we might need to modify the parser to support such thing

@hishamco
Copy link
Member

@sebastienros can we modify the SqlGrammar to support such thing, or do you have another thought?

@sebastienros
Copy link
Member

I need to understand when SQL Server requires the N prefix first.

@sebastienros
Copy link
Member

From what I am reading, a solution might be to inject N for every string literal in the SQLServer dialect.

@hishamco
Copy link
Member

hishamco commented Mar 21, 2023

I need to understand when SQL Server requires the N prefix first.

This is required one you are using nchar or nvarchar to support unicode characters

From what I am reading, a solution might be to inject N for every string literal in the SQLServer dialect.

I don't think that's enough coz the parser will assume the string literal starts with a single quote, which is wrong in this case. May be adapt the SqlGrammar parser will fix the issue

@sebastienros
Copy link
Member

coz the parser will assume the string literal starts with a single quote

My suggestion is to change the SQL generated for SQL SERVER -> The SQL SERVER DIALECT. So SQL SERVER will have to use unicode for all string literals it parses.

@hishamco
Copy link
Member

Got it but how the SqlParser in OC will deal with N without changing the grammar?

@sebastienros
Copy link
Member

in the SQLServer dialect.

My suggestion is to change the SQL generated for SQL SERVER -> The SQL SERVER DIALECT

Example, these are "SQL SERVER" queries

before: ... WHERE [a] = 'b'
after: ... WHERE [a] = N'b'

There is no change in the grammar or the input SQL, this is "GENERATED SQL SERVER" SQL

@sebastienros
Copy link
Member

Note that I am not sure it will work, but this is my suggestion for now.

@hishamco
Copy link
Member

It's fine at YeSQL sebastienros/yessql#482 but the problem should occur here

var string_literal = new StringLiteral("string", "'", StringOptions.AllowsDoubledQuote);

@sebastienros
Copy link
Member

The parser will parse 'foo', but the "generated" sql query should be N'foo'. We just need to ensure that parsed string literals are rendered using Quote(string) in the dialect, which is the method I changed in the PR.

@hishamco
Copy link
Member

That means the SqlGrammar here is thightly coupled on what YesSQL generated, which I didn't expect when I saw the issue

@sebastienros sebastienros added this to the 1.x milestone Apr 20, 2023
@lahma
Copy link
Contributor

lahma commented May 29, 2023

Also note that if queried column is nvarchar, passing varchar parameter (or the other way around) will cause a conversion operator call for each equality check.

https://stackoverflow.com/questions/4459425/sql-server-query-fast-with-literal-but-slow-with-variable

@sebastienros
Copy link
Member

@lahma isn't that an orthogonal issue that would already apply today if we are not matching the types (haven't checked)? I believe that what I am suggesting is only to let SQL SERVER parse a query expecting unicode chars in the string literal.

@lahma
Copy link
Contributor

lahma commented Jun 8, 2023

It's generally a problem yes if not accounted for. I've tripped on this one with Quartz years ago which caused indexes/primary keys become less efficient as each comparison required conversion function. This was just a general note.

@sebastienros
Copy link
Member

@hishamco
Copy link
Member

hishamco commented Oct 12, 2023

Seb the issue is not with the actual type, rather than supporting the parsing of the N in the value, @hongbangsoft correct me if I'm wrong.

@sebastienros
Copy link
Member

Seb the issue is not with the actual type
I was responding to marko's comment.

@sebastienros
Copy link
Member

@hishamco please try to repro the issue before and after updating yessql. Hopefully the yessql update should fix it. (irrespective of the tests which will need to be updated as you mentioned in your PR)

@hishamco
Copy link
Member

I already reproduced the issue before, I might need to test it in the PR #14491

@hishamco
Copy link
Member

@sebastienros the issue still exists after updating the YesSQL

@sebastienros
Copy link
Member

Thanks, I will check then. Might need another yessql version :/

@sebastienros
Copy link
Member

@lampersky 's explanation is correct. There is no issue. The query syntax is db agnostic, the N prefix is a SQL SERVER concept. Don't use that in this query editor.

However the resulting SQL SERVER query that OC generates might be better with using N because we know that YesSQL creates NVARCHAR columns. This is handled by a new version.

@MikeAlhayek MikeAlhayek modified the milestones: 1.x, 1.8 Jan 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants