how to deal with varchar(max) columns in mssql #56

Open
TheDataScientistNL opened this issue Oct 5, 2023 · 4 comments

TheDataScientistNL commented Oct 5, 2023

Hi, I am using polars==0.19.7, which now includes ODBC support through arrow-odbc-py (arrow-odbc==1.2.8).

When running the code (see the example below), arrow-odbc raises an error.

import polars as pl

USERNM = ''
PWD = ''
DBNAME = ''
HOST = ''
PORT = ''

CONN = f"Driver={{ODBC Driver 17 for SQL Server}};Server={HOST};Port={PORT};Database={DBNAME};Uid={USERNM};Pwd={PWD}"

df = pl.read_database(
    connection=CONN,
    query="SELECT varchar_max_col FROM [dbo].[tablname]",
)

with the error being:

arrow_odbc.error.Error: There is a problem with the SQL type of the column with name: varchar_max_col and index 0:
ODBC reported a size of '0' for the column. This might indicate that the driver cannot specify a sensible upper bound for the column. E.g. for cases like VARCHAR(max). Try casting the column into a type with a sensible upper bound. The type of the column causing this error is Varchar { length: 0 }.

I can easily resolve this by editing the query to

df = pl.read_database(
    connection=CONN,
    query="SELECT CAST(varchar_max_col AS VARCHAR(100)) AS varchar_max_col FROM [dbo].[tablname]",
)

which resolves the issue (or by changing the column type in the database, though that is not something you always want to, or can, do).

However, since varchar(max) columns still occur frequently in databases, I was wondering if arrow-odbc could support them natively? In other words, could it detect varchar(max) columns and fetch them without throwing an error?

I hope this is the right place to ask, because I am not sure whether this is arrow-odbc related or ODBC driver related...

pacman82 (Owner) commented Oct 5, 2023

Hello @TheDataScientistNL,

The best way to deal with VARCHAR(max) is to set the max_text_size parameter. See the documentation here: https://arrow-odbc.readthedocs.io/en/latest/arrow_odbc.html#arrow_odbc.read_arrow_batches_from_odbc

You are not using read_arrow_batches_from_odbc directly but via polars, where I think this integration was added yesterday. Please ask the maintainers of polars how to forward this parameter, or use arrow-odbc directly.
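Roughly along these lines, reusing the CONN string and query from your example (the 4096-byte cap is just a placeholder; size it for your data):

from arrow_odbc import read_arrow_batches_from_odbc

# max_text_size puts an upper bound on the transit buffer bound to each
# text column, so the driver no longer needs a sensible length report
# for VARCHAR(max). Values longer than the cap may be truncated.
reader = read_arrow_batches_from_odbc(
    query="SELECT varchar_max_col FROM [dbo].[tablname]",
    connection_string=CONN,
    max_text_size=4096,
)

for batch in reader:
    ...  # each batch is a pyarrow.RecordBatch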

Best, Markus

pacman82 (Owner) commented Oct 5, 2023

I hope this is the right place to ask, because I am not sure whether this is arrow-odbc related or ODBC driver related...

Neither; it is inherent to the ODBC standard itself, a limitation of the API. Avoid VARCHAR(max), TEXT, or similar unbounded types in schema declarations if you want fast bulk fetches. I take back what I said earlier: the best way to deal with this is to fix the schema, if possible.
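If you can change the schema, something along these lines would do it (illustrative T-SQL only; pick a length that actually fits your data, and you may need to restate NULL/NOT NULL and rebuild dependent indexes):

-- Replace the unbounded type with a sensible upper bound.
ALTER TABLE [dbo].[tablname] ALTER COLUMN varchar_max_col VARCHAR(4000);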


alexander-beedie commented Oct 6, 2023

And I was so hoping to avoid a mystery-meat **kwargs pass-through for all the different connection flavours we now support 🤣 I'll think about the cleanest thing we can expose.

pacman82 (Owner) commented Oct 6, 2023

Just typing on my phone right now, so I will keep it short. I can sympathise with that. I wouldn't recommend a passthrough at all.
