Generic DeltaTable error: External error: Arrow error: Invalid argument error: arguments need to have the same data type - while merging data into a Delta table #2423
Comments
Spark uses deprecated parquet dtypes; you should set this to resolve it: SparkSession.config("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")
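A minimal sketch of applying that setting when building the Spark session (PySpark; the app name is illustrative):

from pyspark.sql import SparkSession

# Sketch only: have Spark write timestamps as TIMESTAMP_MICROS instead of the
# deprecated INT96 physical type, so delta-rs reads them as plain Arrow timestamps.
spark = (
    SparkSession.builder
    .appName("delta-write")  # illustrative app name
    .config("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")
    .getOrCreate()
)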
@ion-elgreco thanks,
@murughu you should rewrite the table with that Spark config setting.
@ion-elgreco yes, I did the same. I created a new Delta table with this config and then tried to merge into that newly created table using delta-rs.
@murughu
@ion-elgreco, yes, I tried that; I am still getting the same error.
@murughu then I'm not sure what the issue is.
Just tested the code with @ion-elgreco's suggested PySpark config.
Also, you can check the data type of your parquet column; I use parquet-tools for that.
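If parquet-tools is not at hand, a rough pyarrow equivalent for inspecting the timestamp type of a data file (the file path is a placeholder):

import pyarrow.parquet as pq

# Sketch only: dump the schema of one data file from the Delta table. The
# timestamp column's logical type (including isAdjustedToUTC) and any
# deprecated INT96/legacy converted type show up here.
pf = pq.ParquetFile("path/to/table/part-00000.parquet")  # hypothetical file
print(pf.schema)        # parquet-level schema: physical + logical types
print(pf.schema_arrow)  # how pyarrow maps it to Arrow types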
@sherlockbeard thanks for confirming! Will close it then :)
@ion-elgreco @sherlockbeard Thanks, but I am still facing the same issue. Below are my observations: the Spark-written table has isAdjustedToUTC=true, while delta-rs seems to expect isAdjustedToUTC=false. To get isAdjustedToUTC=false, I converted the timestamp data type to TIMESTAMP_NTZ in my Spark code. Now isAdjustedToUTC=false, but converted_type (legacy) is None. When I try to merge this table using delta-rs I still get the error below (I think it is raised while creating the DeltaTable object): dt = DeltaTable(delta_table_path, storage_options=storage_options)
Weird, I tried with isAdjustedToUTC=true and delta-rs was able to merge it, with df1 = df.withColumn("datalake_updated_timestampUtc", current_timestamp().cast('TIMESTAMP_NTZ')), on delta-rs 0.16.4.
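For context, a rough sketch of that kind of write (assuming df is an existing Spark DataFrame, Spark 3.4+ for TIMESTAMP_NTZ support, the Delta Lake connector on the session, and an illustrative table path):

from pyspark.sql.functions import current_timestamp

# Sketch only: the cast from the snippet above, shown in a full write so the
# column actually lands in the Delta table's parquet files.
df1 = df.withColumn(
    "datalake_updated_timestampUtc",
    current_timestamp().cast("TIMESTAMP_NTZ"),
)
df1.write.format("delta").mode("overwrite").save("/path/to/delta_table")  # hypothetical path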
@sherlockbeard @ion-elgreco thanks,
Environment
Delta-rs version:
v0.16.0 up to main
Binding:
Environment:
Bug
When merging data into an existing Delta table (which was created using Spark), I get this error: "Generic DeltaTable error: External error: Arrow error: Invalid argument error: arguments need to have the same data type".
What happened:
What you expected to happen:
Should be able to merge the data into the existing Delta table using the delta-rs library.
How to reproduce it:
Spark code:
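The original snippet is not preserved on this page; a minimal sketch of the kind of job described in the thread (a Spark-written Delta table with a timestamp column; all names and paths are illustrative):

from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp

# Sketch only: assumes the Delta Lake Spark connector is configured on the session.
spark = (
    SparkSession.builder
    .appName("repro")  # illustrative
    .config("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")
    .getOrCreate()
)

# Write a small Delta table that includes a timestamp column.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df = df.withColumn("updated_at", current_timestamp())  # hypothetical column
df.write.format("delta").mode("overwrite").save("/tmp/delta_table")  # hypothetical path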
delta-rs/pyarrow code:
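This snippet is also not preserved; a minimal sketch of the merge call described in the issue (deltalake Python binding; the table path, predicate, and column names are assumptions):

from datetime import datetime

import pyarrow as pa
from deltalake import DeltaTable

# Sketch only: merge a small pyarrow batch into the Spark-created Delta table.
dt = DeltaTable("/tmp/delta_table")  # hypothetical path, matching the Spark sketch above
source = pa.table({
    "id": pa.array([1], type=pa.int64()),
    "value": pa.array(["updated"]),
    "updated_at": pa.array([datetime.now()], type=pa.timestamp("us")),  # hypothetical column
})

(
    dt.merge(
        source=source,
        predicate="target.id = source.id",
        source_alias="source",
        target_alias="target",
    )
    .when_matched_update_all()
    .when_not_matched_insert_all()
    .execute()
)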
More Detail:
We get this error when we add a timestamp column.