Update to arrow/parquet 53.0.0, tonic, prost, object_store, pyo3 #12032

Merged
alamb merged 50 commits into apache:main from alamb/update_arrow_53
Sep 5, 2024

Conversation

alamb
Contributor

@alamb alamb commented Aug 16, 2024

Draft as this is waiting on the actual arrow 53.0.0 release: apache/arrow-rs#6016

Which issue does this PR close?

N/A

Rationale for this change

We need to update the DataFusion ecosystem

What changes are included in this PR?

Are these changes tested?

Yes, by CI

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr, core, substrait, common, proto, and functions labels Aug 16, 2024
Some(1),
Some(10),
None,
Some(0),
Contributor Author

This change is due to apache/arrow-rs#6216, where the null count is now an `Option`.
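
To make the effect of an optional null count concrete, here is a minimal, self-contained sketch. It uses a hypothetical `ColumnStats` struct and a hypothetical `total_null_count` helper, not the real parquet or DataFusion types. The point it illustrates is that an unrecorded null count (`None`) has to stay distinct from a recorded count of zero (`Some(0)`), which is why the expected values in the test above are wrapped in `Some` or written as `None`.

```rust
/// Hypothetical stand-in for per-column statistics.
/// This is NOT the parquet crate's type; it only models the idea.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    /// `None` means the writer did not record a null count.
    null_count: Option<u64>,
}

/// Sum the null counts of several row groups.
/// The total is unknown (`None`) as soon as any single count is unknown.
fn total_null_count(stats: &[ColumnStats]) -> Option<u64> {
    // Summing an iterator of `Option<u64>` into `Option<u64>`
    // short-circuits to `None` on the first `None`.
    stats.iter().map(|s| s.null_count).sum()
}

fn main() {
    let all_known = [
        ColumnStats { null_count: Some(1) },
        ColumnStats { null_count: Some(10) },
        ColumnStats { null_count: Some(0) },
    ];
    let one_unknown = [
        ColumnStats { null_count: Some(1) },
        ColumnStats { null_count: None },
    ];

    println!("all known:   {:?}", total_null_count(&all_known));
    println!("one unknown: {:?}", total_null_count(&one_unknown));
}
```

Treating `None` as zero instead would silently under-report nulls, so the sketch propagates the unknown state to the caller.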

@alamb alamb changed the title from "Update to arrow/parquet `53.0.0" to "Update to arrow/parquet 53.0.0, tonic, prost, etc" Aug 16, 2024
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Aug 16, 2024
async-trait = "0.1.73"
aws-config = "1.5.5"
# begin pin aws-sdk crates otherwise CI MSRV check fails
Contributor Author

Here is an example failure https://github.com/apache/datafusion/actions/runs/10690110122/job/29633769353?pr=12032

The same thing happens on main if you run `cargo update` in the `datafusion-cli` directory; no one had hit it yet.

Contributor

That is an unfortunate choice by AWS.

@alamb
Contributor Author

alamb commented Sep 4, 2024

This PR is now ready for review 🎉 (finally, 53.0.0 is released)

@@ -401,8 +401,7 @@ fn _regexp_replace_static_pattern_replace<T: OffsetSizeTrait>(
         DataType::Utf8View => {
             let string_view_array = as_string_view_array(&args[0])?;

-            let mut builder = StringViewBuilder::with_capacity(string_view_array.len())
-                .with_block_size(1024 * 1024 * 2);
+            let mut builder = StringViewBuilder::with_capacity(string_view_array.len());
Contributor

Does that mean the block size is no longer set on the builder in this case?

Contributor

Yes, we no longer need it now that apache/arrow-rs#6136 is merged.
I ran a local benchmark and it showed no performance difference.
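
The reply does not spell out why the explicit block size became unnecessary. One plausible reading, stated here as an assumption and not as a description of the arrow-rs implementation, is that the builder now chooses its own block sizes, starting small and growing as data arrives. The standalone model below compares that strategy with a fixed 2 MiB block (the value removed in the diff). The function names, the 8 KiB starting size, and the doubling rule are all hypothetical.

```rust
/// Model of a fixed block size: returns (blocks allocated, bytes allocated).
/// Hypothetical helper; it does not call into arrow.
fn fixed_blocks(total_bytes: usize, block_size: usize) -> (usize, usize) {
    assert!(block_size > 0, "block size must be non-zero");
    // Round up to whole blocks.
    let blocks = (total_bytes + block_size - 1) / block_size;
    (blocks, blocks * block_size)
}

/// Model of a growing block size: each new block is twice the previous one,
/// capped at `max_block`. Returns (blocks allocated, bytes allocated).
/// The doubling rule is an assumption made for illustration.
fn growing_blocks(total_bytes: usize, first_block: usize, max_block: usize) -> (usize, usize) {
    assert!(first_block > 0 && first_block <= max_block);
    let (mut blocks, mut allocated, mut size) = (0, 0, first_block);
    while allocated < total_bytes {
        allocated += size;
        blocks += 1;
        size = (size * 2).min(max_block);
    }
    (blocks, allocated)
}

fn main() {
    const KIB: usize = 1024;
    const MIB: usize = 1024 * KIB;

    // A small array holding 10 KiB of string data.
    let small = 10 * KIB;
    println!("fixed 2 MiB:   {:?}", fixed_blocks(small, 2 * MIB));
    println!("growing 8 KiB: {:?}", growing_blocks(small, 8 * KIB, 2 * MIB));
}
```

Under these assumptions a small array reserves far less memory with growing blocks, while a large array still ends up using blocks of the maximum size, so a hand-tuned block size at the call site adds little.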

Contributor

@comphead comphead left a comment

lgtm thanks @alamb

@alamb
Contributor Author

alamb commented Sep 5, 2024

Thanks for the review @comphead and all the help @XiangpengHao -- let's get this merged!

@alamb alamb merged commit 6034be4 into apache:main Sep 5, 2024
26 checks passed
@alamb alamb deleted the alamb/update_arrow_53 branch September 5, 2024 11:08
wiedld pushed a commit to influxdata/arrow-datafusion that referenced this pull request Oct 16, 2024
…`, `pyo3` (apache#12032)

* Update prost, prost-derive, pbjson

* update more

* Update datafusion/substrait/Cargo.toml

Co-authored-by: tison <[email protected]>

* Update vendored code

* revert upgrade in datafusion-examples until arrow-flight is updated

* Pin to pre-release arrow-rs

* update pyo3

* Update to use new arrow apis

* update for new api

* Update tonic in examples

* update prost

* update datafusion-cli/cargo

* update test output

* update

* updates

* updates

* update math

* update more

* fix scalar tests

* Port statistics to use new API

* factor into a function

* update generated files

* Update test

* add new test

* update tests

* taplo format

* Update other tests

* Update datafusion pin

* Update for API change

* Update to arrow 53.0.0 sha

* Update cli deps

* update cargo.lock

* Update expected output

* Remove patch

* update datafusion-cli cargo

* Pin some aws sdks whose update caused CI failures

---------

Co-authored-by: tison <[email protected]>
Labels
common Related to common crate core Core DataFusion crate functions physical-expr Physical Expressions proto Related to proto crate sql SQL Planner sqllogictest SQL Logic Tests (.slt)
4 participants