Update Logical Types Branch #14241

Merged 770 commits on Jan 23, 2025

Conversation

tobixdev
Contributor

Which issue does this PR close?

Updates the logical-types branch to main for #12622.
In the last PR (#14202) there was a problem which caused unintended changes when diffing logical-types with main.

This diff between upstream main and my logical-types branch shows only the intended changes. Hopefully it works this time so we can start addressing the other tasks in #12622.

Rationale for this change

Updates the logical-types branch to main for #12622.

What changes are included in this PR?

Apply Scalar type to new code and resolve issues between the logical-types branch and main.

Are these changes tested?

Are there any user-facing changes?

cc @jayzhan211

Eason0729 and others added 30 commits December 11, 2024 07:21
* fix: Fix parse_sql_expr not handling alias

* cargo fmt

* fix parse_sql_expr example(remove alias)

* add testing

* add SUM udaf to TestContextProvider and modify test_sql_to_expr_with_alias for function

* revert change on example `parse_sql_expr`
apache#13730)

Debug trait is useful for understanding what something is and how it's
configured, especially if the implementation is behind dyn trait.
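
To illustrate the point about `dyn` trait objects, here is a minimal sketch, assuming a hypothetical `Provider` trait and `CsvProvider` struct (neither name comes from DataFusion). Making `Debug` a supertrait lets callers print a trait object's configuration without knowing the concrete type:

```rust
use std::fmt::Debug;

// Hypothetical trait for illustration only. Requiring `Debug` as a
// supertrait means every implementation can be printed, even behind `dyn`.
pub trait Provider: Debug {
    fn name(&self) -> &str;
}

// Hypothetical implementation; `derive(Debug)` exposes its configuration.
#[derive(Debug)]
pub struct CsvProvider {
    pub delimiter: char,
    pub has_header: bool,
}

impl Provider for CsvProvider {
    fn name(&self) -> &str {
        "csv"
    }
}

/// Formats a trait object without knowing its concrete type.
pub fn describe(provider: &dyn Provider) -> String {
    format!("{provider:?}")
}
```

Calling `describe` on a `&CsvProvider` yields the derived `Debug` rendering of the struct, including its field values.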
…13660)

* add `unnest_as_table_factor` and `UnnestRelationBuilder`

* unparse unnest as table factor

* fix typo

* add tests for the default configs

* add a static const for unnest_placeholder

* fix tests

* fix tests
…apache#13727)

* Update apache-avro requirement from 0.16 to 0.17

---
updated-dependencies:
- dependency-name: apache-avro
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* Fix compatibility changes schema handling apache-avro 0.17

- Handle ArraySchema struct
- Handle MapSchema struct
- Map BigDecimal => LargeBinary
- Map TimestampNanos => Timestamp(TimeUnit::Nanosecond, None)
- Map LocalTimestampNanos => todo!()
- Add Default to FixedSchema test
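
The type mappings listed above can be sketched as a single match expression. This is only an illustration: the enums below are simplified stand-ins defined for this example, not the real apache-avro or Arrow types, and the unimplemented case returns `None` here where the commit uses `todo!()`.

```rust
// Stand-in for the Avro schema variants named in the commit message.
#[derive(Debug, PartialEq)]
pub enum AvroLogical {
    BigDecimal,
    TimestampNanos,
    LocalTimestampNanos,
}

// Stand-in for the Arrow time unit.
#[derive(Debug, PartialEq)]
pub enum TimeUnit {
    Nanosecond,
}

// Stand-in for the Arrow data types named in the commit message.
#[derive(Debug, PartialEq)]
pub enum ArrowType {
    LargeBinary,
    // Second field is the optional timezone.
    Timestamp(TimeUnit, Option<String>),
}

/// Maps an Avro logical type to its Arrow counterpart, or `None` when the
/// mapping is not implemented yet.
pub fn map_type(t: &AvroLogical) -> Option<ArrowType> {
    match t {
        AvroLogical::BigDecimal => Some(ArrowType::LargeBinary),
        AvroLogical::TimestampNanos => {
            Some(ArrowType::Timestamp(TimeUnit::Nanosecond, None))
        }
        // The commit leaves this case as `todo!()`; this sketch avoids the panic.
        AvroLogical::LocalTimestampNanos => None,
    }
}
```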

* Update Cargo.lock file for apache-avro 0.17

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marc Droogh <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
* Minor: Add doc example to RecordBatchStreamAdapter

* Update datafusion/physical-plan/src/stream.rs

Co-authored-by: Berkay Şahin <[email protected]>

---------

Co-authored-by: Berkay Şahin <[email protected]>
…13581)

* Implement GroupsAccumulator for corr(x,y)

* feedbacks

* fix CI MSRV

* review

* avoid collect in accumulation

* add back cast
* fix union serialisation order in proto

* clippy

* address comments
…apache#13733)

* Minor: make unsupported `nanosecond` part a real (not internal) error

* fmt

* Improve wording to refer to date part
…nes (apache#13732)

* Add tests for date_part on columns + timestamps with / without timezones

* Add tests from apache#13372

* remove trailing whitespace
* Optimize performance of initcap (~2x faster)

Signed-off-by: Tai Le Manh <[email protected]>

* format

---------

Signed-off-by: Tai Le Manh <[email protected]>
Before the change, the request to use PostgreSQL was simply ignored when
`--complete` flag was present.
…pache#13739)

* doc-gen: migrate window functions documentation

Signed-off-by: zjregee <[email protected]>

* fix: update Cargo.lock

---------

Signed-off-by: zjregee <[email protected]>
…pache#13751)

* Refactor JoinLeftData structure by removing unused memory reservation field in hash join implementation

* Add Debug and Clone derives for HashJoinStreamState and ProcessProbeBatchState enums

This commit enhances the HashJoinStreamState and ProcessProbeBatchState structures by implementing the Debug and Clone traits, allowing for easier debugging and cloning of these state representations in the hash join implementation.
* Add big decimal formatting test cases with potential trailing zeros

* Rename and simplify decimal rendering functions

- add `decimal` to function name
- drop `precision` parameter as it is not supposed to affect the result

* Update to bigdecimal 0.4.7

Utilize new `to_plain_string` function
* CI: Warn on unused crates

* CI: Warn on unused crates

* CI: Warn on unused crates

* CI: Warn on unused crates

* CI: Clean up dependencies

* CI: Clean up dependencies
* plan implicit lateral if table factor is UNNEST

* check for outer references in `create_relation_subquery`

* add sqllogictest

* fix lateral constant test to not expect a subquery node

* replace sqllogictest in favor of logical plan test

* update lateral join sqllogictests

* add sqllogictests

* fix logical plan test
* Minor: improve the Deprecation / API health policy

* prettier

* Update docs/source/library-user-guide/api-health.md

Co-authored-by: Jonah Gao <[email protected]>

* Add version guidance and make more copy/paste friendly

* prettier

* better

* rename to guidelines

---------

Co-authored-by: Jonah Gao <[email protected]>
* fix: specify roottype in fieldreference

Signed-off-by: MBWhite <[email protected]>

* Fix formatting

Signed-off-by: MBWhite <[email protected]>

* review suggestion

Signed-off-by: MBWhite <[email protected]>

---------

Signed-off-by: MBWhite <[email protected]>
…nction signature (apache#13372)

* add type sig class

Signed-off-by: jayzhan211 <[email protected]>

* timestamp

Signed-off-by: jayzhan211 <[email protected]>

* date part

Signed-off-by: jayzhan211 <[email protected]>

* fmt

Signed-off-by: jayzhan211 <[email protected]>

* taplo format

Signed-off-by: jayzhan211 <[email protected]>

* tpch test

Signed-off-by: jayzhan211 <[email protected]>

* msrc issue

Signed-off-by: jayzhan211 <[email protected]>

* msrc issue

Signed-off-by: jayzhan211 <[email protected]>

* explicit hash

Signed-off-by: jayzhan211 <[email protected]>

* Enhance type coercion and function signatures

- Added logic to prevent unnecessary casting of string types in `native.rs`.
- Introduced `Comparable` variant in `TypeSignature` to define coercion rules for comparisons.
- Updated imports in `functions.rs` and `signature.rs` for better organization.
- Modified `date_part.rs` to improve handling of timestamp extraction and fixed query tests in `expr.slt`.
- Added `datafusion-macros` dependency in `Cargo.toml` and `Cargo.lock`.

These changes improve type handling and ensure more accurate function behavior in SQL expressions.

* fix comment

Signed-off-by: Jay Zhan <[email protected]>

* fix signature

Signed-off-by: Jay Zhan <[email protected]>

* fix test

Signed-off-by: Jay Zhan <[email protected]>

* Enhance type coercion for timestamps to allow implicit casting from strings. Update SQL logic tests to reflect changes in timestamp handling, including expected outputs for queries involving nanoseconds and seconds.

* Refactor type coercion logic for timestamps to improve readability and maintainability. Update the `TypeSignatureClass` documentation to clarify its purpose in function signatures, particularly regarding coercible types. This change enhances the handling of implicit casting from strings to timestamps.

* Fix SQL logic tests to correct query error handling for timestamp functions. Updated expected outputs for `date_part` and `extract` functions to reflect proper behavior with nanoseconds and seconds. This change improves the accuracy of test cases in the `expr.slt` file.

* Enhance timestamp handling in TypeSignature to support timezone specification. Updated the logic to include an additional DataType for timestamps with a timezone wildcard, improving flexibility in timestamp operations.

* Refactor date_part function: remove redundant imports and add missing not_impl_err import for better error handling

---------

Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: Jay Zhan <[email protected]>
* Minor: Add some more blog posts to the readings page

* prettier

* prettier

* Update docs/source/user-guide/concepts-readings-events.md

---------

Co-authored-by: Oleks V <[email protected]>
)

Fixing `GroupsAccumulator` trait name in its docs
* Improve deprecation guidelines more

* prettier
…ingArrayBuilder` (apache#13758)

* fix: add `null_buffer` check for `LargeStringArray`

Add a safety check to ensure that the alignment of buffers cannot be
overflowed. This introduces a panic if they are not aligned through a
runtime assertion.

* fix: remove value_buffer assertion

These buffers can be misaligned and it is not problematic, it is the
`null_buffer` which we care about being of the same length.

* feat: add `null_buffer` check to `StringArray`

This is in a similar vein to `LargeStringArray`, as the code is the
same, except for `i32`'s instead of `i64`.

* feat: use `row_count` var to avoid drift
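
The check described in this commit can be sketched roughly as follows. `StringColumn` is a hypothetical, simplified type written for this example (a `Vec<bool>` stands in for the null bitmap); it is not the Arrow `StringArray` or the DataFusion builder. The point is that the validity buffer must have exactly one entry per row, and a single `row_count` variable is used for the comparison:

```rust
/// Hypothetical, simplified string column: offsets into a byte buffer plus
/// an optional validity vector (true = value present, false = NULL).
pub struct StringColumn {
    offsets: Vec<i32>,
    values: Vec<u8>,
    validity: Option<Vec<bool>>,
}

impl StringColumn {
    /// Builds the column, panicking if the validity vector does not have
    /// exactly one entry per row.
    pub fn new(offsets: Vec<i32>, values: Vec<u8>, validity: Option<Vec<bool>>) -> Self {
        // N rows need N + 1 offsets; compute the row count once and reuse it.
        let row_count = offsets.len().saturating_sub(1);
        if let Some(v) = &validity {
            assert_eq!(v.len(), row_count, "null buffer length must match row count");
        }
        Self { offsets, values, validity }
    }

    /// Returns the string at row `i`, or `None` if the row is NULL.
    pub fn value(&self, i: usize) -> Option<&str> {
        if let Some(v) = &self.validity {
            if !v[i] {
                return None;
            }
        }
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        std::str::from_utf8(&self.values[start..end]).ok()
    }
}
```

A `LargeStringArray`-style variant would differ only in using `i64` offsets, which matches the commit's remark that the two code paths are otherwise the same.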
* fix: restore memory reservation in JoinLeftData for accurate memory accounting in HashJoin

This commit reintroduces the `_reservation` field in the `JoinLeftData` structure to ensure proper tracking of memory resources during join operations. The absence of this field could lead to inconsistent memory usage reporting and potential out-of-memory issues as upstream operators increase their memory consumption.

* fmt

Signed-off-by: Jay Zhan <[email protected]>

---------

Signed-off-by: Jay Zhan <[email protected]>
* Update documentation guidelines for contribution content

* Apply suggestions from code review

Co-authored-by: Piotr Findeisen <[email protected]>
Co-authored-by: Oleks V <[email protected]>

* clarify discussions and remove requirements note

* prettier

* Update docs/source/contributor-guide/index.md

Co-authored-by: Piotr Findeisen <[email protected]>

---------

Co-authored-by: Piotr Findeisen <[email protected]>
Co-authored-by: Oleks V <[email protected]>
* Add Round trip tests for Array <--> ScalarValue

* String dictionary test

* remove unnecessary value

* Improve comments
cj-zhukov and others added 14 commits January 18, 2025 12:35
apache#14168)

* Add a hint about expected extension in error message in register_csv, register_parquet, register_json, register_avro (apache#14144)

* Add tests for error

* fix test

* fmt

* Fix issues causing GitHub checks to fail

* revert datafusion-testing change

---------

Co-authored-by: Sergey Zhukov <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
…#14142)

* External memory limit validation for sort

* add bug tracker

* cleanup

* Update submodule

* reviews

* fix CI

* move feature to module level
…nction (apache#14183)

Co-authored-by: Cheng-Yuan-Lai <a186235@g,ail.com>
…tion function (apache#14181)

Co-authored-by: Cheng-Yuan-Lai <a186235@g,ail.com>
* Added job board as a separate header in the documentation

* Update docs/source/contributor-guide/communication.md

Co-authored-by: Andrew Lamb <[email protected]>

* Update docs/source/contributor-guide/communication.md

Co-authored-by: Andrew Lamb <[email protected]>

* prettier

---------

Co-authored-by: Andrew Lamb <[email protected]>
…T FROM` (apache#14187)

* Mapped the Spaceship operator with IsNotDistinctFrom

* Added tests for Spaceship Operator <=>

* Added sanity test for Spaceship Operator <=>
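
For readers unfamiliar with the semantics, `<=>` is the null-safe equality operator, which behaves like `IS NOT DISTINCT FROM`. The sketch below models SQL values as `Option<i64>` (with `None` standing for NULL); the function names are illustrative and are not DataFusion APIs.

```rust
/// Ordinary SQL `=`: if either side is NULL, the result is NULL (`None`).
pub fn sql_eq(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// `IS NOT DISTINCT FROM` (spelled `<=>` in some dialects): two NULLs compare
/// as equal, NULL versus a value compares as not equal, and the result is
/// never NULL.
pub fn is_not_distinct_from(a: Option<i64>, b: Option<i64>) -> bool {
    // `Option`'s own equality already treats `None == None` as true.
    a == b
}
```

With these definitions, comparing two NULLs gives NULL under `sql_eq` but `true` under `is_not_distinct_from`, which is the difference the operator exists to express.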
* feat: Use `SchemaRef` in `JoinFilter`

* Update datafusion/core/src/physical_optimizer/projection_pushdown.rs

Co-authored-by: Andrew Lamb <[email protected]>

* Update datafusion/physical-plan/src/joins/join_filter.rs

Co-authored-by: Andrew Lamb <[email protected]>

* Update datafusion/physical-plan/src/joins/join_filter.rs

Co-authored-by: Andrew Lamb <[email protected]>

* Update datafusion/physical-plan/src/joins/join_filter.rs

Co-authored-by: Andrew Lamb <[email protected]>

* fix

---------

Co-authored-by: Andrew Lamb <[email protected]>
…ns-nested functions (apache#14201)

Co-authored-by: Cheng-Yuan-Lai <a186235@g,ail.com>
@github-actions github-actions bot added documentation Improvements or additions to documentation sql SQL Planner development-process Related to development process of DataFusion logical-expr Logical plan and expressions physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) substrait catalog Related to the catalog crate common Related to common crate execution Related to the execution crate proto Related to proto crate functions labels Jan 22, 2025
@jayzhan211 jayzhan211 merged commit 25f02a7 into apache:logical-types Jan 23, 2025
27 checks passed
@jayzhan211
Contributor

jayzhan211 commented Jan 23, 2025

I guess something is wrong with GitHub: the files changed shown in Compare include additional unexpected changes. I forked logical-types to another branch, logical-types-v2, and the comparison in the PR looks correct 🤔

Anyway, I think we can keep working on the other tasks on the logical-types branch.
