Charset conversion mysql-connector-j-8.0.31.jar #26
Comments
The column contains German letters like "Ü". The problem seems to be that the virtual schema adapter reads latin1 from the column definition and therefore expects latin1, although the MySQL JDBC driver already converts the data to UTF-8. |
We have a similar error message, however it is independent of the mysql driver version. However, going back to the previous virtual schema version ( |
Jep switching back to 9.0.5 of the adapter script solved the issue for me as well |
Many thanks for reporting this issue. Please excuse that I answer only now. We will have a look at it. |
To summarize: version Hi @djamadeus could you please add the result of describing the table? (at least for one of the columns in question of being either ASCII / latin1 / or UTF-8) In Exasol DB SQL you get this from After that I would like to exclude potential effects due to caching the schema description. We can do so by refreshing the virtual schema with Based on that I will add more information. |
@ckunki yes that only happens with version 10.0.1
|
do you want me to test the refresh-flag with the 10 or 9.x adapter? |
Thank you very much for the additional information. Please apply the refresh for the new version 10. |
Command Could you help me to get insight into the perspective of the virtual schema adapter (another component different from ExaLoader)? You can do so by using
|
Select * from virtual_jointDB_office_test."erstattungen_positionen";
ALTER VIRTUAL SCHEMA virtual_jointDB_office_test REFRESH;
Select * from virtual_jointDB_office_test."erstattungen_positionen";
explain virtual Select * from virtual_jointDB_office_test."erstattungen_positionen"; |
when I run the pushdown command: |
If I counted right, the problem is |
descriptiontable with vs-10.0.1-adapter:
|
Exasol version is 7.1.15 btw |
Thank you very much for all the information. I am confident that this will enable us to find the correct answer and will come back in short. |
My pleasure, Good Luck! :) |
Thanks for this information. I counted the same and agree to your suspicion. As If this holds true then the virtual schema adapter cannot influence this behavior. The only thing that still makes me frown is that the current ticket reports a change of behavior from version |
jep. |
Thanks for mentioning that as this constantly goes through my mind, too. What I am still wondering about: How could the information be different from what we get from Would it be OK to ask for two additional log samples?
For (1) the relevant log files have names like
For (2) see https://docs.exasol.com/db/latest/database_concepts/virtual_schema/logging.htm |
How would I access the file from (1) ? running a community version instance here. no root access to that blackbox as far as I know |
Sorry for assuming you might be able to access these logs. In the meantime I ran some deeper experiments and am close to proving our current assumption, which matches your initial diagnosis. VSMYSQL version 3.x inspected only the data to detect the data type and got UTF-8, which Exasol DB accepts when importing UTF-8 encoded values. VSMYSQL version 4.x inspects the metadata of the JDBC connection to detect the data type of each database column. In our case this is probably CHAR with octet length 1, i.e. 1 byte per character, which is more or less correct for latin1 encoding. VSMYSQL maps this to Exasol data type ASCII. When retrieving actual values in a select statement, the JDBC driver converts these characters to UTF-8, and Exasol database reports an error because it cannot convert these data to data type ASCII. The question now is how to overcome this.
I will ponder these options and discuss with my colleagues. |
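The mismatch described in the comment above can be illustrated outside the database. The following is only a minimal sketch using Python's built-in codecs; it does not call the adapter, Exasol, or the MySQL JDBC driver, and the helper name `fits_ascii` is hypothetical. It shows that "Ü" occupies one byte in latin1 (consistent with an octet length of 1 per character), two bytes in UTF-8, and is not valid ASCII in either form.

```python
# Illustration only: plain Python codecs, not the virtual schema adapter
# or the JDBC driver. The helper name `fits_ascii` is hypothetical.

text = "Ü"

# One byte per character, as stored in a latin1 column.
latin1_bytes = text.encode("latin-1")

# Two bytes, the form in which the value arrives after conversion to UTF-8.
utf8_bytes = text.encode("utf-8")


def fits_ascii(data: bytes) -> bool:
    """Return True if every byte in `data` is a valid ASCII character."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


print(len(latin1_bytes), len(utf8_bytes))  # byte lengths in latin1 and UTF-8
print(fits_ascii(utf8_bytes))              # whether an ASCII column could hold it
```

A column typed as ASCII can therefore hold "A" but not "Ü", regardless of whether the value arrives as latin1 or UTF-8, which is in line with the ETL-3009 conversion error reported in this issue.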
Since detecting the type has worked flawlessly in the past, maybe having a
parameter to fall back to the 3.x behaviour is best.
I don't know the background why this was changed. Were there other issues
with the old way, or is it "just" performance related?
Thx for your efforts btw!
|
Thanks for the contact information and feedback. There were a number of reasons for the change in VSMYSQL:
Also, guessing the data types from the values in the result set appeared to be inappropriate and risky. Although we must admit that in your special case the old behavior seems to be preferable. 😃 In the meantime we did some experiments with connection settings, which did not change the behavior. The bug filed for MySQL Connector J (https://bugs.mysql.com/bug.php?id=69328, submitted: 27 May 2013, status: Verified, i.e. still open) further narrows down the options. In summary, we are currently focusing on option (O4) as proposed by you. |
To ensure correct understanding of the problem I investigated the behavior of the MySQL JDBC driver:
CREATE TABLE T (c1 CHAR(1), c2 ENUM('A', 'Ü')) CHARACTER SET latin1;
INSERT INTO T VALUES ('Ü', 'Ü');
The following results have been retrieved by using the MySQL JDBC driver
Column Metadata for MySQL table 'T' provided by JDBC driver:
Column descriptions from query
I will also enhance the integration tests of VSMYSQL to reproduce this output. |
* updated version, pk fix, artifact references and added documentation
* #26: Enabled to use MySQL database with character set `latin1` and characters not strictly ASCII.
* updated to latest PR of VSCJDBC
* removed repository maven.exasol.com
* upgraded to exasol-testcontainers 6.4.0
* excluded vulnerablity
Co-authored-by: Christoph Pirkl <[email protected]>
Glad to see this fixed. Thanks a lot! |
btw the same problem occurs with the postgres adapter... |
When reading data from the virtual schema I get a conversion error from UTF-8 to ASCII. Reading directly from the JDBC connection works though.
EXA_DB.VIRTUAL_JOINTDB_OFFICE> Select * from(
IMPORT FROM JDBC
AT MYSQL_JOINTDB_JDBC_CONNECTION
STATEMENT 'SELECT * from office.erstattungen_positionen limit 100')
[2022-11-16 10:13:59] 100 rows retrieved starting from 1 in 1 s 85 ms (execution: 1 s 53 ms, fetching: 32 ms)
EXA_DB.VIRTUAL_JOINTDB_OFFICE> Select * from virtual_jointDB_office."erstattungen_positionen" limit 100
[2022-11-16 10:21:58] [42636] ETL-3009: [Column=23 Row=0] [Charset conversion from 'UTF-8' to 'ASCII' failed [Invalid or incomplete multibyte or wide character]] (Session: 1749641517751205888).
On the Source-DB the charset is latin1 (yes i know :(, not my fault^^).