Describe the bug
Inferring the schema does not work with a compressed file, in some cases.
To Reproduce
Create a file like this one, and gzip it:
❯ cat file1.csv
Region,Units
United States,10
Canada,20
United States,10
Canada,20
United States,10
Canada,20
Canada,20
❯ gzip file1.csv
I couldn't duplicate this error with a gzipped file of exact repeating rows. It requires some variation in the rows (or there's something else that triggers it).
Then attempt to make a table from it:
❯ create external table file_csv stored as csv with header row compression type gzip location './file1.csv.gz';
ArrowError(CsvError("corrupt deflate stream"))
Expected behavior
It shouldn't throw an error and should be able to create the external table.
Additional context
It works fine with the same file if you specify the table schema:
❯ create external table file_csv(a string,b int) stored as csv with header row compression type gzip location './file1.csv.gz';
0 rows in set. Query took 0.000 seconds.
❯ select * from file_csv;
+---------------+----+
| a | b |
+---------------+----+
| United States | 10 |
| Canada | 20 |
| United States | 10 |
| Canada | 20 |
| United States | 10 |
| Canada | 20 |
| Canada | 20 |
+---------------+----+
7 rows in set. Query took 0.003 seconds.
It looks like the schema inference logic applies newline-delimited chunking before applying decompression; the fix should be a case of reversing the order of those two steps. In particular, it should use FileCompressionType::convert_stream instead of FileCompressionType::convert_read.
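The ordering problem described above can be illustrated outside DataFusion. This is a minimal Python sketch, not DataFusion code: it uses hypothetical sample data to show why operating on the compressed bytes before decompressing leaves an invalid deflate stream, while decompressing first and then splitting on newlines works as expected.

```python
import gzip

# Sample CSV bytes with some variation between rows, then gzip them.
data = b"Region,Units\nUnited States,10\nCanada,20\nUnited States,10\n"
compressed = gzip.compress(data)

# Correct order: decompress the whole stream first, then split on newlines.
lines = gzip.decompress(compressed).split(b"\n")
print(lines[0])  # b'Region,Units'

# Buggy order: slicing the *compressed* bytes (which is effectively what
# chunking before decompression does) leaves a truncated deflate stream,
# and decompressing it fails, mirroring the "corrupt deflate stream" error.
try:
    gzip.decompress(compressed[: len(compressed) // 2])
except (EOFError, OSError) as exc:
    print("decompression failed:", exc)
```

Note that a file of exactly repeating rows may compress into a stream where the naive chunking happens not to break anything, which would be consistent with the report that some row variation is needed to trigger the error.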