parquet2::write::stream::FileStreamer::end does not flush Parquet footer #162

Closed
v0y4g3r opened this issue Jul 26, 2022 · 1 comment · Fixed by #163
Labels: bug, no-changelog

Comments

v0y4g3r (Contributor) commented Jul 26, 2022

When parquet2::write::stream::FileStreamer::end is invoked, it calls end_file to write the file metadata into the Parquet footer, followed by the Parquet magic PAR1.

But after writing the magic, end_file does not call the writer's flush method to ensure the magic is actually written to the underlying file. In some scenarios this can result in a corrupted Parquet file when the file is read immediately after being written, because FileStreamer::end seems to indicate that all work that writes data to the file stream is done.

async fn end_file<W: AsyncWrite + Unpin + Send>(
    mut writer: &mut W,
    metadata: FileMetaData,
) -> Result<u64> {
    // Write file metadata
    let mut protocol = TCompactOutputStreamProtocol::new(&mut writer);
    let metadata_len = metadata.write_to_out_stream_protocol(&mut protocol).await? as i32;
    protocol.flush().await?; // metadata is flushed

    // Write footer
    let metadata_bytes = metadata_len.to_le_bytes();
    let mut footer_buffer = [0u8; FOOTER_SIZE as usize];
    (0..4).for_each(|i| {
        footer_buffer[i] = metadata_bytes[i];
    });

    (&mut footer_buffer[4..]).write_all(&PARQUET_MAGIC)?;
    writer.write_all(&footer_buffer).await?; // but not the parquet magic
    Ok(metadata_len as u64 + FOOTER_SIZE) 
}

If it's not something "by design", I'd be happy to open a PR to fix it.
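
To illustrate the behaviour described above, here is a minimal synchronous sketch using only the standard library. It is not the parquet2 code: write_footer and bytes_reaching_sink are hypothetical helpers, std::io::BufWriter stands in for whatever buffered async writer the caller passes in, and the 8-byte footer layout (4-byte little-endian metadata length followed by PAR1) is taken from the snippet above. In the async end_file, the analogous change would presumably be a flush call on the writer after the final write_all.

```rust
use std::io::{BufWriter, Write};

// Footer layout from the snippet above: 4-byte LE metadata length + magic.
const FOOTER_SIZE: usize = 8;
const PARQUET_MAGIC: [u8; 4] = [b'P', b'A', b'R', b'1'];

/// Hypothetical synchronous stand-in for the footer part of `end_file`.
/// `flush` toggles the call that the issue reports as missing.
fn write_footer<W: Write>(
    writer: &mut W,
    metadata_len: i32,
    flush: bool,
) -> std::io::Result<u64> {
    let mut footer = [0u8; FOOTER_SIZE];
    footer[..4].copy_from_slice(&metadata_len.to_le_bytes());
    footer[4..].copy_from_slice(&PARQUET_MAGIC);
    writer.write_all(&footer)?;
    if flush {
        // Push buffered bytes down to the underlying sink.
        writer.flush()?;
    }
    Ok(FOOTER_SIZE as u64)
}

/// Returns the bytes that have reached the underlying sink (a Vec here)
/// right after the footer is written, before the writer is dropped.
fn bytes_reaching_sink(flush: bool) -> Vec<u8> {
    // Capacity larger than the footer, so the write stays in the buffer.
    let mut buffered = BufWriter::with_capacity(64, Vec::new());
    write_footer(&mut buffered, 42, flush).unwrap();
    buffered.get_ref().clone()
}

fn main() {
    let without = bytes_reaching_sink(false);
    let with = bytes_reaching_sink(true);
    println!("without flush: {} bytes in sink", without.len());
    println!(
        "with flush: {} bytes in sink, ends with magic: {}",
        with.len(),
        with.ends_with(&PARQUET_MAGIC)
    );
}
```

Without the flush, the footer is still sitting in the writer's internal buffer when the function returns, so a reader opening the sink at that moment would not find the trailing magic.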

jorgecarleitao added the bug label Jul 26, 2022
jorgecarleitao (Owner) commented

Thanks a lot for reporting it. Yep, this is a bug. :/ Thanks a lot for offering to open a PR. Let's get that in and I will cut a patch release.
