WriteBuilder::with_input_execution_plan does not apply the schema to the log's metadata fields #2105

Closed
universalmind303 opened this issue Jan 22, 2024 · 0 comments · Fixed by #2106
Labels
binding/rust Issues for the Rust crate bug Something isn't working

Comments

@universalmind303
Contributor

Environment

Delta-rs version:
0.17.0 (61ca275)

Binding:

Environment:

  • Cloud provider:
  • OS:
  • Other:

Bug

WriteBuilder::with_input_execution_plan does not apply the schema to the table's log metadata the way with_input_batches does.

When with_input_execution_plan is used, the schema is dropped entirely and the table becomes unusable.

What you expected to happen:

The schema is preserved, just as it is when using with_input_batches.

How to reproduce it:

[package]
name = "delta-mre"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
datafusion = "34.0.0"
tempfile = "3.9.0"
tokio = { version = "1.35.1", features = ["full"] }
url = "2.5.0"

[dependencies.deltalake]
git = "https://github.com/delta-io/delta-rs.git"
rev = "61ca275b576d8a8577fc3520aad28d3e365d42b1"
features = ["datafusion", "arrow"]

use std::sync::Arc;

use datafusion::{
    arrow::datatypes::{DataType, Field, Schema},
    datasource::TableProvider,
    execution::context::SessionContext,
};
use deltalake::{
    kernel::{DataType as DeltaDataType, PrimitiveType},
    logstore::default_logstore,
    operations::{create::CreateBuilder, write::WriteBuilder},
    protocol::SaveMode,
    storage::object_store::local::LocalFileSystem,
};
use url::Url;

#[tokio::main]
async fn main() {
    let path = tempfile::tempdir().unwrap();
    let path = path.into_path();

    let file_store = LocalFileSystem::new_with_prefix(path.clone()).unwrap();
    let log_store = default_logstore(
        Arc::new(file_store),
        &Url::from_file_path(path.clone()).unwrap(),
        &Default::default(),
    );

    let tbl = CreateBuilder::new()
        .with_log_store(log_store.clone())
        .with_save_mode(SaveMode::Overwrite)
        .with_table_name("test")
        .with_column(
            "id",
            DeltaDataType::Primitive(PrimitiveType::Integer),
            true,
            None,
        );
    let tbl = tbl.await.unwrap();

    // Build a physical plan that produces a single `id` column.
    let ctx = SessionContext::new();
    let plan = ctx
        .sql("SELECT 1 as id")
        .await
        .unwrap()
        .create_physical_plan()
        .await
        .unwrap();

    // Write through the execution plan; this is the path that loses the schema.
    let write_builder = WriteBuilder::new(log_store, tbl.state);
    let _ = write_builder
        .with_input_execution_plan(plan)
        .with_save_mode(SaveMode::Overwrite)
        .await
        .unwrap();

    // Reopen the table from disk and query it through DataFusion.
    let table = deltalake::open_table(path.to_str().unwrap()).await.unwrap();
    let prov: Arc<dyn TableProvider> = Arc::new(table);
    ctx.register_table("test", prov).unwrap();
    let mut batches = ctx
        .sql("SELECT * FROM test")
        .await
        .unwrap()
        .collect()
        .await
        .unwrap();
    let batch = batches.pop().unwrap();

    let expected_schema = Schema::new(vec![Field::new("id", DataType::Int32, true)]);
    assert_eq!(batch.schema().as_ref(), &expected_schema);
}

More details:

If I collect the batches manually and pass them to with_input_batches instead of using with_input_execution_plan, everything works as expected:

    let batches = ctx
        .sql("SELECT 1 as id")
        .await
        .unwrap()
        .collect()
        .await
        .unwrap();
    let write_builder = WriteBuilder::new(log_store, tbl.state);
    let _ = write_builder
        .with_input_batches(batches)
        .with_save_mode(SaveMode::Overwrite)
        .await
        .unwrap();
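
One way to confirm what each write path actually committed is to read the table's transaction log directly. The sketch below uses only the Rust standard library and relies on the Delta log layout, where each commit is a newline-delimited JSON file under _delta_log and the table metadata, including schemaString, is carried in a metaData action. The function name metadata_actions is hypothetical, and the substring match is a heuristic rather than a real JSON parse, so treat this as a debugging aid under those assumptions.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns (commit file name, raw JSON line) for every `metaData` action
/// found in the JSON commit files under `<table_root>/_delta_log`.
/// `metadata_actions` is a hypothetical helper, not part of delta-rs.
fn metadata_actions(table_root: &Path) -> io::Result<Vec<(String, String)>> {
    let log_dir = table_root.join("_delta_log");
    let mut found = Vec::new();
    for entry in fs::read_dir(&log_dir)? {
        let path = entry?.path();
        // Only look at `.json` commit files; skip checkpoints and other files.
        if path.extension().and_then(|e| e.to_str()) != Some("json") {
            continue;
        }
        let name = match path.file_name() {
            Some(n) => n.to_string_lossy().into_owned(),
            None => continue,
        };
        for line in fs::read_to_string(&path)?.lines() {
            // Heuristic substring match, not a JSON parser.
            if line.contains("\"metaData\"") {
                found.push((name.clone(), line.to_string()));
            }
        }
    }
    // Zero-padded version numbers sort lexicographically in commit order.
    found.sort();
    Ok(found)
}

fn main() {
    // Expects the table root directory as the first argument.
    let Some(root) = std::env::args().nth(1) else {
        eprintln!("usage: <program> <table_root>");
        return;
    };
    match metadata_actions(Path::new(&root)) {
        Ok(actions) => {
            for (file, action) in actions {
                println!("{file}: {action}");
            }
        }
        Err(err) => eprintln!("could not read the log under {root}: {err}"),
    }
}
```

Running this against the temporary table directory after each variant of the write should make it straightforward to compare the schemaString recorded by with_input_execution_plan with the one recorded by with_input_batches.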
@universalmind303 universalmind303 added the bug Something isn't working label Jan 22, 2024
@ion-elgreco ion-elgreco added the binding/rust Issues for the Rust crate label Jan 25, 2024
rtyler pushed a commit that referenced this issue Jan 25, 2024
# Description
The schema when using `with_input_execution_plan` wasn't being applied.

# Related Issue(s)
closes #2105

RobinLin666 pushed a commit to RobinLin666/delta-rs that referenced this issue Feb 2, 2024