Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[StaplesDataTest] fail with exasol_odbc because of wrong sort order #871

Closed
kaklakariada opened this issue Oct 8, 2021 · 3 comments
Closed

Comments

@kaklakariada
Copy link

About You:
Name: Christoph Pirkl
Company: @exasol

Environment:
Tableau Desktop for Windows, version 2021.3
Windows 2019 Server

Your question:
When working on the Exasol JDBC Connector (exasol/tableau-connector#23) I also fixed some issues with the ODBC Connector and activated additional TDVT tests, especially [StaplesDataTest]. Two of the added tests are failing:

  • staples_data.employee_name
  • staples_data.product_name

Both fail because the sort order expected by the tests differs from the actual results. The same tests run successfully for the JDBC Connector against the same database.

The diff for the first 100 results of staples_data.employee_name suggests that the issues could be caused by special characters, in this case the í in Brígida Campos, because the names before where in the correct order:

$ diff employee_names_actual.txt employee_names_expected.txt
94a95
> ""Brígida Campos""
99,100c100
< ""Brogren, Stacy""
< ""Bross, Carroll""
---
> ""Brogren, Stacy"""
$ cat actual.setup.staples_data.employee_name_diff.txt
Diff of [C:\Users\ADMINI~1\AppData\Local\Temp\2\567f65b2273b789cc047911c579e9bcd01426baa8bb820f51bfcf03bk6vk957u\actual.setup.staples_data.employee_name.txt] and [C:\Users\Administrator\AppData\Local\Programs\Python\Python39\lib\site-packages\tdvt\exprtests/staples\expected.setup.staples_data.employee_name.txt].

Tuples - [Employee Name]
	 <<<< >>>> 
	actual: "Brij Raj Mathur"
	expected: "BrÌgida Campos"
	 <<<< >>>> 
	actual: "Brining, Julie"
	expected: "Brij Raj Mathur"
	 <<<< >>>> 
	actual: "Briskman, Hollace"
	expected: "Brining, Julie"
	 <<<< >>>> 
	actual: "Brittain, Geraldine"
	expected: "Briskman, Hollace"
(truncated)

The staples_data.product_name test fails with the same issue of wrong sort order as it also contains special chars like " and #.

See complete TDVT test results: tdvt_results_exasol_odbc.zip

I checked the list of capabilities but didn't find one that is related to sort order or encoding.

Please find the exasol_odbc connector at https://github.com/exasol/tableau-connector/tree/feature/22-switch-to-jdbc-driver/src/exasol_odbc and the TDVT configuration at https://github.com/exasol/tableau-connector/tree/feature/22-switch-to-jdbc-driver/tdvt_odbc

Please advise us how to find the root cause of the failing tests.

@jie-d
Copy link
Contributor

jie-d commented Oct 8, 2021

Internal tracking id: 1327205

@sxlin
Copy link
Contributor

sxlin commented Oct 20, 2021

@kaklakariada This could be related to collation setting of the columns. Here is the table definition in our test environment.

[COLUMN_NAME]	[SQL_TYPE]	[NULLABLE]	[DISTRIBUTION_KEY]	[PARTITION_KEY]
Item Count	DECIMAL(18,0)	FALSE	FALSE	FALSE
Ship Priority	CHAR(14) UTF8	FALSE	FALSE	FALSE
Order Priority	CHAR(15) UTF8	FALSE	FALSE	FALSE
Order Status	CHAR(13) UTF8	FALSE	FALSE	FALSE
Order Quantity	DOUBLE	FALSE	FALSE	FALSE
Sales Total	DOUBLE	FALSE	FALSE	FALSE
Discount	DOUBLE	FALSE	FALSE	FALSE
Tax Rate	DOUBLE	FALSE	FALSE	FALSE
Ship Mode	CHAR(25) UTF8	FALSE	FALSE	FALSE
Fill Time	DOUBLE	FALSE	FALSE	FALSE
Gross Profit	DOUBLE	FALSE	FALSE	FALSE
Price	DOUBLE	FALSE	FALSE	FALSE
Ship Handle Cost	DOUBLE	FALSE	FALSE	FALSE
Employee Name	CHAR(50) UTF8	FALSE	FALSE	FALSE
Employee Dept	CHAR(4) UTF8	FALSE	FALSE	FALSE
Manager Name	VARCHAR(255) UTF8	FALSE	FALSE	FALSE
Employee Yrs Exp	DOUBLE	FALSE	FALSE	FALSE
Employee Salary	DOUBLE	FALSE	FALSE	FALSE
Customer Name	CHAR(50) UTF8	FALSE	FALSE	FALSE
Customer State	CHAR(50) UTF8	FALSE	FALSE	FALSE
Call Center Region	CHAR(25) UTF8	FALSE	FALSE	FALSE
Customer Balance	DOUBLE	FALSE	FALSE	FALSE
Customer Segment	CHAR(25) UTF8	FALSE	FALSE	FALSE
Prod Type1	CHAR(50) UTF8	FALSE	FALSE	FALSE
Prod Type2	CHAR(50) UTF8	FALSE	FALSE	FALSE
Prod Type3	CHAR(50) UTF8	FALSE	FALSE	FALSE
Prod Type4	CHAR(50) UTF8	FALSE	FALSE	FALSE
Product Name	CHAR(100) UTF8	FALSE	FALSE	FALSE
Product Container	CHAR(25) UTF8	FALSE	FALSE	FALSE
Ship Promo	CHAR(25) UTF8	FALSE	FALSE	FALSE
Supplier Name	CHAR(25) UTF8	FALSE	FALSE	FALSE
Supplier Balance	DOUBLE	FALSE	FALSE	FALSE
Supplier Region	CHAR(25) UTF8	FALSE	FALSE	FALSE
Supplier State	CHAR(50) UTF8	FALSE	FALSE	FALSE
Order ID	CHAR(10) UTF8	FALSE	FALSE	FALSE
Order Year	DECIMAL(18,0)	FALSE	FALSE	FALSE
Order Month	DECIMAL(18,0)	FALSE	FALSE	FALSE
Order Day	DECIMAL(18,0)	FALSE	FALSE	FALSE
Order Date	TIMESTAMP	FALSE	FALSE	FALSE
Order Quarter	CHAR(2) UTF8	FALSE	FALSE	FALSE
Product Base Margin	DOUBLE	FALSE	FALSE	FALSE
Product ID	CHAR(5) UTF8	FALSE	FALSE	FALSE
Receive Time	DOUBLE	FALSE	FALSE	FALSE
Received Date	TIMESTAMP	FALSE	FALSE	FALSE
Ship Date	TIMESTAMP	FALSE	FALSE	FALSE
Ship Charge	DOUBLE	FALSE	FALSE	FALSE
Total Cycle Time	DOUBLE	FALSE	FALSE	FALSE
Product In Stock	CHAR(3) UTF8	FALSE	FALSE	FALSE
PID	DECIMAL(18,0)	FALSE	FALSE	FALSE
Market Segment	CHAR(25) UTF8	FALSE	FALSE	FALSE

@kaklakariada
Copy link
Author

@sxlin Thank you for your reply! I checked the data type, and it is VARCHAR(50) UTF8 for column Employee Name:

describe TESTV1."Staples";
COLUMN_NAME          SQL_TYPE           NULLABLE  DISTRIBUTION_KEY  PARTITION_KEY  
-------------------  -----------------  --------  ----------------  -------------  
Item Count           DECIMAL(18,0)      FALSE     FALSE             FALSE          
Ship Priority        VARCHAR(14) UTF8   FALSE     FALSE             FALSE          
Order Priority       VARCHAR(15) UTF8   FALSE     FALSE             FALSE          
Order Status         VARCHAR(13) UTF8   FALSE     FALSE             FALSE          
Order Quantity       DOUBLE             FALSE     FALSE             FALSE          
Sales Total          DOUBLE             FALSE     FALSE             FALSE          
Discount             DOUBLE             FALSE     FALSE             FALSE          
Tax Rate             DOUBLE             FALSE     FALSE             FALSE          
Ship Mode            VARCHAR(25) UTF8   FALSE     FALSE             FALSE          
Fill Time            DOUBLE             FALSE     FALSE             FALSE          
Gross Profit         DOUBLE             FALSE     FALSE             FALSE          
Price                DECIMAL(18,4)      FALSE     FALSE             FALSE          
Ship Handle Cost     DECIMAL(18,4)      FALSE     FALSE             FALSE          
Employee Name        VARCHAR(50) UTF8   FALSE     FALSE             FALSE          
Employee Dept        VARCHAR(4) UTF8    FALSE     FALSE             FALSE          
Manager Name         VARCHAR(255) UTF8  FALSE     FALSE             FALSE          
Employee Yrs Exp     DOUBLE             FALSE     FALSE             FALSE          
Employee Salary      DECIMAL(18,4)      FALSE     FALSE             FALSE          
Customer Name        VARCHAR(50) UTF8   FALSE     FALSE             FALSE          
Customer State       VARCHAR(50) UTF8   FALSE     FALSE             FALSE          
Call Center Region   VARCHAR(25) UTF8   FALSE     FALSE             FALSE          
Customer Balance     DOUBLE             FALSE     FALSE             FALSE          
Customer Segment     VARCHAR(25) UTF8   FALSE     FALSE             FALSE          
Prod Type1           VARCHAR(50) UTF8   FALSE     FALSE             FALSE          
Prod Type2           VARCHAR(50) UTF8   FALSE     FALSE             FALSE          
Prod Type3           VARCHAR(50) UTF8   FALSE     FALSE             FALSE          
Prod Type4           VARCHAR(50) UTF8   FALSE     FALSE             FALSE          
Product Name         VARCHAR(100) UTF8  FALSE     FALSE             FALSE          
Product Container    VARCHAR(25) UTF8   FALSE     FALSE             FALSE          
Ship Promo           VARCHAR(25) UTF8   FALSE     FALSE             FALSE          
Supplier Name        VARCHAR(25) UTF8   FALSE     FALSE             FALSE          
Supplier Balance     DOUBLE             FALSE     FALSE             FALSE          
Supplier Region      VARCHAR(25) UTF8   FALSE     FALSE             FALSE          
Supplier State       VARCHAR(50) UTF8   FALSE     FALSE             FALSE          
Order ID             VARCHAR(10) UTF8   FALSE     FALSE             FALSE          
Order Year           DECIMAL(18,0)      FALSE     FALSE             FALSE          
Order Month          DECIMAL(18,0)      FALSE     FALSE             FALSE          
Order Day            DECIMAL(18,0)      FALSE     FALSE             FALSE          
Order Date           TIMESTAMP          FALSE     FALSE             FALSE          
Order Quarter        VARCHAR(2) UTF8    FALSE     FALSE             FALSE          
Product Base Margin  DOUBLE             FALSE     FALSE             FALSE          
Product ID           VARCHAR(5) UTF8    FALSE     FALSE             FALSE          
Receive Time         DOUBLE             FALSE     FALSE             FALSE          
Received Date        TIMESTAMP          FALSE     FALSE             FALSE          
Ship Date            TIMESTAMP          FALSE     FALSE             FALSE          
Ship Charge          DECIMAL(18,4)      FALSE     FALSE             FALSE          
Total Cycle Time     DOUBLE             FALSE     FALSE             FALSE          
Product In Stock     VARCHAR(3) UTF8    FALSE     FALSE             FALSE          
PID                  DECIMAL(18,0)      FALSE     FALSE             FALSE          
Market Segment       VARCHAR(25) UTF8   FALSE     FALSE             FALSE          

The *.tds file contained <collation flag='0' name='binary' /> for columns Employee Name and Product Name. After removing this line for both columns, the two failing tests succeeded.

Thank you very much for your help!

kaklakariada added a commit to exasol/tableau-connector that referenced this issue Oct 22, 2021
kaklakariada added a commit to exasol/tableau-connector that referenced this issue Oct 27, 2021
* Update name of ODBC driver

* Update ODBC manifest

* Add JDBC connector

* Add JDBC capabilities

* Describe installation of jdbc driver

* Adapt capabilities for jdbc driver

* Add fingerprint

* Configure tdvt

* Add data source tests

* Add results workbook

* Update data source

* Update test result workbook

* Configure string trim for jdbc

* Add test with bad password

* Skip string.split test: function not supported by Exasol

* Rename column zzz2 to zzz

* Remove trailing spaces when extracting parts of a date

* Remove duplicate space

* Add rationale for skipping the 'split' function tests

* Change test user name to be equal to odbc tests

* Skip test for time data type

TIME not supported by Exasol

* Rename tdvt folder to odbc

* Adapt hostname for jdbc tests

* Adapt odbc tests to run

* Unify test configuration for jdbc & odbc

* Fix odbc issues with spaces

Remove trimming capabilities that hide issue with trailing spaces in TO_CHAR()

* Configure max number of threads to 1

* Remvove argument for number of threads

This is now configured in the ini file

* Add comment about failing tests

* Ignore Eclipse config files

* Update instructions for building the dockerfile

* Update dependencies

* Improve formatting of readme

* Add HtmlUnit driver

* Use Exasol 7.1.1, improve error handling

* Improve error handling in shell script

* Cleanup

* Adapt configuration to Eclipse 2021-09

* Validate & package jdbc connector

* Improve pom setup

* Cleanup tests: remove unused ports

* Read properties file in Java code to avoid having to set environment
variables when running tests from the IDE.

* Upgrade testcontainers to fix uploading files to the container

user id 'xxx' is too big ( > 2097151 ) when using commons-compress 1.21
with large user ID

testcontainers/testcontainers-java#4384

* Adapt UI-Tests to new Connector name / Tableau version

* Run tests in headless mode

* Document ui-test setup

* Build JDBC url with fingerprint and certificate validation arguments

* Update fingerprint in connection files

* Add developer guide

* Move test instructions to developer guide

* Add connection dialog screenshot

* Configure Eclipse formatter to never join already wrapped lines

* Extract base class for JDBC & ODBC connector tests

* Add UI tests for JDBC connector

* Fix failing StaplesDataTest

See tableau/connector-plugin-sdk#871 for details.

* Add changelog, improve documentation

* #22 Implement review findings by @AnastasiiaSergienko

* Apply suggestions from code review

Co-authored-by: Anastasiia Sergienko <[email protected]>

* Implement review findings by @AnastasiiaSergienko

Co-authored-by: Anastasiia Sergienko <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants