fix: ensure the checkpoint decoder is regularly flushed
For checkpoint buffers that cannot fit into the batch size, the
checkpoint will be written with an insufficient number of bytes.

Unfortunately, our tests didn't catch this, and it only manifested on
tables with very large transaction logs.

Signed-off-by: R. Tyler Croy <[email protected]>
rtyler committed Oct 25, 2024
1 parent 5b99fcd commit 264a0ec
Showing 1 changed file with 4 additions and 3 deletions.
7 changes: 4 additions & 3 deletions crates/core/src/protocol/checkpoints.rs
@@ -356,11 +356,12 @@ fn parquet_bytes_from_state(
     for j in jsons {
         let buf = serde_json::to_string(&j?).unwrap();
         let _ = decoder.decode(buf.as_bytes())?;
+
+        while let Some(batch) = decoder.flush()? {
+            writer.write(&batch)?;
+        }
         total_actions += 1;
     }
-    while let Some(batch) = decoder.flush()? {
-        writer.write(&batch)?;
-    }

     let _ = writer.close()?;
     debug!("Finished writing checkpoint parquet buffer.");
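The failure mode described in the commit message can be illustrated with a toy model. This is a minimal sketch using only the standard library, not the code from checkpoints.rs: `ToyDecoder`, `write_flush_at_end`, and `write_flush_in_loop` are hypothetical names, and the sketch assumes that a batching decoder stops consuming input once `batch_size` rows are buffered and resumes only after `flush` is called.

```rust
/// Hypothetical stand-in for a batching row decoder (not a real library API).
struct ToyDecoder {
    batch_size: usize,
    buffered: Vec<String>,
}

impl ToyDecoder {
    fn new(batch_size: usize) -> Self {
        Self { batch_size, buffered: Vec::new() }
    }

    /// Buffers one row and returns the number of bytes consumed.
    /// Assumption: once `batch_size` rows are buffered, nothing more is
    /// consumed (0 is returned) until `flush` is called.
    fn decode(&mut self, row: &str) -> usize {
        if self.buffered.len() >= self.batch_size {
            return 0;
        }
        self.buffered.push(row.to_string());
        row.len()
    }

    /// Returns the buffered rows as one batch, or `None` if nothing is buffered.
    fn flush(&mut self) -> Option<Vec<String>> {
        if self.buffered.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.buffered))
        }
    }
}

/// Flushes only after the loop and ignores the byte count from `decode`.
/// Returns the number of rows that reached the "writer".
fn write_flush_at_end(rows: &[&str], batch_size: usize) -> usize {
    let mut decoder = ToyDecoder::new(batch_size);
    let mut written = 0;
    for row in rows {
        let _ = decoder.decode(row);
    }
    while let Some(batch) = decoder.flush() {
        written += batch.len();
    }
    written
}

/// Flushes after every `decode`, so the buffer never reaches `batch_size`.
fn write_flush_in_loop(rows: &[&str], batch_size: usize) -> usize {
    let mut decoder = ToyDecoder::new(batch_size);
    let mut written = 0;
    for row in rows {
        let _ = decoder.decode(row);
        while let Some(batch) = decoder.flush() {
            written += batch.len();
        }
    }
    written
}
```

With five rows and a batch size of two, `write_flush_at_end` reports two rows written while `write_flush_in_loop` reports all five. Under the stated assumption, this matches the symptom above: the problem stays invisible until the input outgrows a single batch.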
