Hi,
it works but is far from ideal. I tried to use arrow-odbc, but so far I haven't figured out whether and how I can decode Latin-1 to UTF-8 while reading into an Arrow batch.
Is it somehow possible to give arrow-odbc a conversion function/closure for such cases?
Hi @sportfloh, ODBC offers two ways to fetch string data: narrow and wide. Wide strings are always guaranteed to be UTF-16, but the encoding of narrow strings depends on the client's system encoding according to the ODBC standard, or on whatever your driver thinks is appropriate. So you could try out a bunch of things.
Best, Markus
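To make the narrow/wide distinction concrete, here is a minimal sketch using only the Rust standard library. It does not touch ODBC or arrow-odbc at all; the function names `decode_wide` and `decode_narrow_as_utf8` are made up for illustration, and the byte values stand in for what a driver might hand back for the single character "ä".

```rust
/// Decode a buffer that was fetched as a wide string (UTF-16 code units).
/// Returns None if the buffer contains an unpaired surrogate.
fn decode_wide(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}

/// Try to interpret a buffer that was fetched as a narrow string as UTF-8.
/// Returns None if the bytes are not valid UTF-8, which is what happens
/// when the driver actually delivered Latin-1 (or another 8-bit encoding).
fn decode_narrow_as_utf8(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}
```

With these helpers, the UTF-16 unit `0x00E4` decodes to "ä", the UTF-8 bytes `0xC3 0xA4` decode to "ä" as well, but the single Latin-1 byte `0xE4` is rejected as invalid UTF-8. That rejection is the symptom to look for when a narrow fetch does not match the assumed encoding.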
Thanks for your quick response!
Turns out there is some broken encoding of some kind in the table -.-
I'm on macOS and my locale is "en_US.UTF-8", so that should be fine.
My current solution looks like this:

    let mut cursor = db2_connection
        .execute(sql_statement, ())?
        .expect("SELECT statement must produce a cursor");

    // Start from the schema arrow-odbc infers, then override some column types.
    let mut schema = arrow_schema_from(&mut cursor)?;
    schema.fields = schema
        .fields
        .iter()
        .map(|f| match *f.data_type() {
            // DataType::Utf8 => (**f).clone().with_data_type(DataType::Binary),
            DataType::Decimal128(1, 0) => (**f).clone().with_data_type(DataType::Boolean),
            DataType::Decimal128(_, 0) => (**f).clone().with_data_type(DataType::Int64),
            DataType::Decimal128(_, s) if 0 < s => (**f).clone().with_data_type(DataType::Float64),
            _ => (**f).clone(),
        })
        .collect();
    // dbg!(&schema);

    let arrow_record_batches = OdbcReaderBuilder::new()
        .with_schema(schema.into())
        .build(cursor)?;

    // Convert one Arrow record batch into a polars DataFrame, column by column.
    fn record_batch_to_dataframe(batch: &RecordBatch) -> Result<DataFrame, PolarsError> {
        let schema = batch.schema();
        let mut columns = Vec::with_capacity(batch.num_columns());
        for (i, column) in batch.columns().iter().enumerate() {
            let arrow = Box::<dyn polars_arrow::array::Array>::from(&**column);
            columns.push(Series::from_arrow(
                schema.fields().get(i).unwrap().name(),
                arrow,
            )?);
        }
        Ok(DataFrame::from_iter(columns))
    }

    for batch in arrow_record_batches {
        dbg!(record_batch_to_dataframe(&batch?)?);
    }

I found the conversion from RecordBatch to polars::DataFrame here: https://stackoverflow.com/questions/78084066/arrow-recordbatch-as-polars-dataframe
Thanks and cheers!
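For the Latin-1 question that started this thread, the decoding step itself can be sketched with the standard library alone, assuming the affected columns are read as raw bytes (for example by mapping them to a binary type, as the commented-out `DataType::Utf8` line hints at). This is not an arrow-odbc feature; `latin1_to_string` and `decode_text` are hypothetical helper names, and the fallback strategy is only one plausible way to cope with a table that mixes encodings.

```rust
/// Convert ISO-8859-1 (Latin-1) bytes to a UTF-8 `String`.
/// Every Latin-1 byte maps to the Unicode code point with the same value,
/// so a byte-to-char conversion is sufficient.
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Decode a value of unknown encoding: accept it as-is if it is valid UTF-8,
/// otherwise assume (hypothetically) that it is Latin-1.
fn decode_text(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(s) => s.to_owned(),
        Err(_) => latin1_to_string(bytes),
    }
}
```

Calling `decode_text` on each raw value yields valid UTF-8 in both cases. Note that this treats the data as strict ISO-8859-1; if the table really holds Windows-1252, bytes in the range 0x80 to 0x9F would need a proper lookup table instead.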