Appending to an IPC file creates a file that fails with an "InvalidOffset" error. #13761

Closed
2 tasks done
Jonarod opened this issue Jan 16, 2024 · 5 comments
Labels: bug (Something isn't working), rust (Related to Rust Polars)

Comments

@Jonarod

Jonarod commented Jan 16, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

// Standard-library types used below (`File`, `Arc`) need explicit imports.
use std::fs::File;
use std::sync::Arc;

use polars::prelude::{LazyFrame, ScanArgsIpc};
use polars_arrow::array::PrimitiveArray;
use polars_arrow::chunk::Chunk;
use polars_arrow::datatypes::{ArrowDataType, ArrowSchema, Field};
use polars_arrow::io::ipc;
use polars_arrow::io::ipc::write::WriteOptions;


fn main() {

  // =====================================================
  // Create new ipc file
  // =====================================================
  let filepath = "./test_polars_arrow.ipc";
  let fields = vec![
    Field::new(String::from("col"), ArrowDataType::Float64, false),
  ];
  let schema = Arc::new(ArrowSchema::from(fields));

  let file = File::create(&filepath).unwrap();
  let options = WriteOptions { 
    compression: None
  };
  let mut writer = ipc::write::FileWriter::new(&file, schema, None, options);

  let col = PrimitiveArray::from_vec(vec![1.0]);

  writer.start().unwrap();
  writer.write(&Chunk::new(vec![Box::new(col)]), None).unwrap();
  writer.finish().unwrap();

  // =====================================================
  // Prove it is a valid file polars can work with
  // =====================================================
  let df = LazyFrame::scan_ipc(&filepath, ScanArgsIpc::default()).unwrap();
  println!("{:#?}", df.collect().unwrap()); // should print 1 df with 1 row and 1 col
  
  // =====================================================
  // Now try to append to it
  // =====================================================
  let metadata = ipc::read::read_file_metadata(&mut File::open(filepath).unwrap()).unwrap().clone();
  let file_append_mode = File::options().append(true).open("./test_polars_arrow.ipc").unwrap();
  let options = WriteOptions { 
    compression: None
  };
  let mut writer = ipc::write::FileWriter::try_from_file(file_append_mode, metadata, options).unwrap();

  let col = PrimitiveArray::from_vec(vec![1.0]);

  writer.write(&Chunk::new(vec![Box::new(col)]), None).unwrap();
  writer.finish().unwrap();
  
  // =====================================================
  // Try to read the file again
  // =====================================================
  let df = LazyFrame::scan_ipc(&filepath, ScanArgsIpc::default()).unwrap();
  println!("{:#?}", df.collect().unwrap()); // expected: 1 df with 2 rows and 1 col
  // panics with:
  // called `Result::unwrap()` on an `Err` value: ComputeError(ErrString("out-of-spec InvalidFlatbufferMessage(Error { source_location: ErrorLocation { type_: \"[MessageRef]\", method: \"read_as_root\", byte_offset: 0 }, error_kind: InvalidOffset })"))

}

Log output

No response

Issue description

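Appending a chunk to an existing IPC file with FileWriter::try_from_file() produces a file that can no longer be read: scan_ipc fails with an out-of-spec InvalidFlatbufferMessage error (error_kind: InvalidOffset).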

Expected behavior

New values should be appended to the existing file.

Installed versions

arrow_ipc = ["io_ipc"]

@Jonarod added the bug, needs triage, and rust labels Jan 16, 2024
@ritchie46
Member

I am not sure we ever intended to allow appending to IPC?

@Jonarod
Author

Jonarod commented Jan 16, 2024

Oh, sorry about that. I got confused by the documentation of polars_arrow::io::ipc::write::FileWriter::try_from_file():

Creates a new FileWriter from an existing file, seeking to the last message and appending new messages afterwards. Users call finish to write the footer (with both) the existing and appended messages on it.

Also in the source, the implementation folder is called append, see here: crates/polars-arrow/src/io/ipc/append

What is this implementation for then exactly?
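
For what it's worth, the quoted description boils down to "seek to the end of the last message, then write". A minimal std-only sketch of that pattern shows why the open mode of the file matters: a `File` opened with `append(true)` sends every write to the end of the file, regardless of an earlier seek. The helper `seek_then_write` and the file names are made up for illustration, and nothing here touches polars_arrow. Whether this is what goes wrong in the reproducible example is an assumption, not something confirmed in this thread.

```rust
use std::fs::{self, OpenOptions};
use std::io::{Seek, SeekFrom, Write};
use std::path::Path;

/// Hypothetical helper: writes `base` to `path`, reopens the file either in
/// append mode or in read+write mode, seeks to `pos`, writes `patch`, and
/// returns the final file contents. The temporary file is removed afterwards.
fn seek_then_write(
    path: &Path,
    append_mode: bool,
    base: &[u8],
    pos: u64,
    patch: &[u8],
) -> std::io::Result<Vec<u8>> {
    fs::write(path, base)?;

    let mut opts = OpenOptions::new();
    if append_mode {
        // Append mode: writes are always positioned at the end of the file.
        opts.append(true);
    } else {
        // Read+write mode: writes land at the current cursor position.
        opts.read(true).write(true);
    }

    let mut file = opts.open(path)?;
    file.seek(SeekFrom::Start(pos))?;
    file.write_all(patch)?;
    drop(file);

    let bytes = fs::read(path)?;
    fs::remove_file(path)?;
    Ok(bytes)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir();
    // Toy layout: byte 14 is where the old "footer" starts.
    let base = b"HEADER|BLOCK1|FOOTER";
    let patch = b"BLOCK2|FOOTER2";

    let rw = seek_then_write(&dir.join("seek_demo_rw.bin"), false, base, 14, patch)?;
    let ap = seek_then_write(&dir.join("seek_demo_append.bin"), true, base, 14, patch)?;

    // Read+write mode overwrites the old footer in place.
    println!("{}", String::from_utf8_lossy(&rw)); // HEADER|BLOCK1|BLOCK2|FOOTER2
    // Append mode ignores the seek and leaves the old footer in the middle.
    println!("{}", String::from_utf8_lossy(&ap)); // HEADER|BLOCK1|FOOTERBLOCK2|FOOTER2
    Ok(())
}
```

If that is the cause here, opening the file with `read(true).write(true)` instead of `append(true)` before calling try_from_file() would be the thing to try. This has not been tested against polars_arrow in this thread.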

@ritchie46
Member

Hmm.. this is forked from arrow2. Does it work there?

@Jonarod
Author

Jonarod commented Jan 16, 2024

No, it behaves the same with arrow2.
As a side note, .try_from_file() does append data to the file and updates the metadata: the file goes from 1 "block" to 2 "blocks", and the total size and the byte offsets are updated as well (see below). So my guess is that this function is indeed meant to append data to an IPC file, but for some reason the data itself is not written properly.
I will close this here and post the issue in arrow2 instead.
Thanks!

Before:

FileMetadata {
    schema: Schema {
        fields: [
            Field {
                name: "col",
                data_type: Float64,
                is_nullable: false,
                metadata: {},
            },
        ],
        metadata: {},
    },
    ipc_schema: IpcSchema {
        fields: [
            IpcField {
                fields: [],
                dictionary_id: None,
            },
        ],
        is_little_endian: true,
    },
    blocks: [
        Block {
            offset: 128,
            meta_data_length: 144,
            body_length: 64,
        },
    ],
    dictionaries: Some(
        [],
    ),
    size: 514,
}

After:

FileMetadata {
    schema: Schema {
        fields: [
            Field {
                name: "col",
                data_type: Float64,
                is_nullable: false,
                metadata: {},
            },
        ],
        metadata: {},
    },
    ipc_schema: IpcSchema {
        fields: [
            IpcField {
                fields: [],
                dictionary_id: None,
            },
        ],
        is_little_endian: true,
    },
    blocks: [
        Block {
            offset: 128,
            meta_data_length: 144,
            body_length: 64,
        },
        Block {
            offset: 336,
            meta_data_length: 144,
            body_length: 64,
        },
    ],
    dictionaries: Some(
        [],
    ),
    size: 924,
}
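
A quick arithmetic check on the two dumps above, as a sketch. It assumes that `size` is the total file size in bytes and that the footer grows by 24 bytes per extra block entry (the size of the `Block` struct in the Arrow flatbuffer schema, stated here as an assumption). The helper names are hypothetical.

```rust
/// One entry of `blocks` as printed in the dumps above.
#[derive(Clone, Copy)]
struct Block {
    offset: u64,
    meta_data_length: u64,
    body_length: u64,
}

impl Block {
    /// Bytes occupied by the message (metadata plus body).
    fn len(&self) -> u64 {
        self.meta_data_length + self.body_length
    }

    /// First byte after the message.
    fn end(&self) -> u64 {
        self.offset + self.len()
    }
}

/// File size expected if the second block really starts at its recorded
/// offset and is followed by a tail (footer and trailing bytes) of `tail` bytes.
fn size_if_overwritten(second: Block, tail: u64) -> u64 {
    second.end() + tail
}

/// File size expected if the second block and the new tail were instead
/// added after the old end of the file.
fn size_if_added_at_eof(old_size: u64, second: Block, tail: u64) -> u64 {
    old_size + second.len() + tail
}

fn main() {
    let first = Block { offset: 128, meta_data_length: 144, body_length: 64 };
    let second = Block { offset: 336, meta_data_length: 144, body_length: 64 };
    let size_before: u64 = 514;

    // The recorded offset of the second block is exactly the end of the first.
    assert_eq!(first.end(), second.offset);

    // Everything after the first block in the original file: 514 - 336 = 178.
    let old_tail = size_before - first.end();
    // Assumption: the new tail is the old one plus one 24-byte block entry.
    let new_tail = old_tail + 24;

    println!("overwritten at 336: {}", size_if_overwritten(second, new_tail)); // 746
    println!("added at old EOF:   {}", size_if_added_at_eof(size_before, second, new_tail)); // 924
}
```

Under these assumptions, the reported size of 924 matches the "added after the old end of file" case rather than the "overwritten at offset 336" case (746). That would mean the recorded offset 336 still points into the old footer instead of the new message. This is an inference from the numbers, not a confirmed diagnosis.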

@Jonarod
Author

Jonarod commented Jan 16, 2024

Issue reported in arrow2 directly here: #1604

@Jonarod closed this as completed Jan 16, 2024
@stinodego removed the needs triage label Jan 18, 2024