Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: querying on date partitions (fixes #1445) #1594

Merged
merged 17 commits into from
Sep 5, 2023

Conversation

watfordkcf
Copy link
Contributor

Description

Fixes queries on date partitioned data. Trying to get the previous PR up to date with main.

Related Issue(s)

Documentation

@watfordkcf
Copy link
Contributor Author

watfordkcf commented Aug 18, 2023

Clearly the TestCase setup for test_files_scanned is causing the issue in the integration test, but I'm just too far removed to figure out what exactly is up with the test (new to Rust, bizarre test setup, etc).

Future me to past me, if you're using VS Code to debug Rust, the following is a nice snippet for settings.json:

{
    "rust-analyzer.cargo.features": [
        "datafusion",
        "integration_test"
    ]
}

@watfordkcf
Copy link
Contributor Author

Some printf debugging and it looks like it isn't what I thought it was:

[Unwrapped] Test Column: utf8 value: Utf8("1")
[Unwrapped] Test Column: int64 value: Int64(1)
[Unwrapped] Test Column: int32 value: Int32(1)
[Unwrapped] Test Column: int16 value: Int16(1)
[Unwrapped] Test Column: int8 value: Int8(1)
[Unwrapped] Test Column: float64 value: Float64(1)
[Unwrapped] Test Column: float32 value: Float32(1)
[Unwrapped] Test Column: timestamp value: TimestampMicrosecond(1000000, None)
[Wrapped] Test Column: utf8 value: Dictionary(UInt16, Utf8("1"))
[Wrapped] Test Column: int64 value: Dictionary(UInt16, Int64(1))
thread 'local::test_files_scanned' panicked at 'called `Result::unwrap()` on an `Err` value: Internal("The type of Int64 Eq Dictionary(UInt16, Int64) of binary physical should be same")', rust/src/delta_datafusion.rs:844:70
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1089:23
   4: deltalake::delta_datafusion::logical_expr_to_physical_expr
             at ./src/delta_datafusion.rs:844:5
   5: deltalake::delta_datafusion::<impl datafusion::datasource::provider::TableProvider for deltalake::delta::DeltaTable>::scan::{{closure}}::{{closure}}
             at ./src/delta_datafusion.rs:463:25
   6: core::option::Option<T>::map
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/option.rs:1099:29
   7: deltalake::delta_datafusion::<impl datafusion::datasource::provider::TableProvider for deltalake::delta::DeltaTable>::scan::{{closure}}
             at ./src/delta_datafusion.rs:462:27
   8: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/future/future.rs:125:9
   9: integration_datafusion::local::get_scan_metrics::{{closure}}
             at ./tests/integration_datafusion.rs:356:52
  10: integration_datafusion::local::test_files_scanned::{{closure}}
             at ./tests/integration_datafusion.rs:678:65
  11: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/future/future.rs:125:9
  12: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/future/future.rs:125:9
  13: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
             at /Users/watford/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:541:57
  14: tokio::runtime::coop::with_budget
             at /Users/watford/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/coop.rs:107:5
  15: tokio::runtime::coop::budget
             at /Users/watford/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/coop.rs:73:5
  16: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
             at /Users/watford/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:541:25
  17: tokio::runtime::scheduler::current_thread::Context::enter
             at /Users/watford/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:350:19
  18: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
             at /Users/watford/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:540:36
  19: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
             at /Users/watford/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:615:57
  20: tokio::macros::scoped_tls::ScopedKey<T>::set
             at /Users/watford/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/macros/scoped_tls.rs:61:9
  21: tokio::runtime::scheduler::current_thread::CoreGuard::enter
             at /Users/watford/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:615:27
  22: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
             at /Users/watford/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:530:19
  23: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
             at /Users/watford/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/scheduler/current_thread.rs:154:24
  24: tokio::runtime::runtime::Runtime::block_on
             at /Users/watford/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.28.2/src/runtime/runtime.rs:302:47
  25: integration_datafusion::local::test_files_scanned
             at ./tests/integration_datafusion.rs:756:9
  26: integration_datafusion::local::test_files_scanned::{{closure}}
             at ./tests/integration_datafusion.rs:512:38
  27: core::ops::function::FnOnce::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5
  28: core::ops::function::FnOnce::call_once
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
test local::test_files_scanned ... FAILED

@watfordkcf
Copy link
Contributor Author

watfordkcf commented Aug 21, 2023

So if I just drop the "wrapped" test block things are fine...I just don't understand why. Previously we only left booleans unwrapped, and now its everything but large types. Is there utility in testing the wrapped logic for most of these columns? I think the only ones left to test would be utf8 and binary.

EDIT: real question is "what on earth does it mean to wrap a scalar in a dictionary for a partition"?

Only thing left is utf8, separate it into its own test block like boolean?
@watfordkcf
Copy link
Contributor Author

My last day at KCF is 2023-09-13, I've subscribed to this PR from my personal GitHub and will attempt to follow-up there as needed.

@rtyler rtyler enabled auto-merge (rebase) August 31, 2023 19:37
auto-merge was automatically disabled August 31, 2023 20:05

Rebase failed

@wjones127 wjones127 merged commit 7cefe94 into delta-io:main Sep 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate rust
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants