This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Fixed using lower limit than size of first parquet row group #1046

Merged 3 commits on Jun 5, 2022
5 changes: 3 additions & 2 deletions src/io/parquet/read/file.rs
@@ -80,20 +80,21 @@ impl<R: Read + Seek> FileReader<R> {
metadata: schema_metadata,
};

-        let row_groups = RowGroupReader::new(
+        let mut row_groups = RowGroupReader::new(
             reader,
             schema,
             groups_filter,
             metadata.row_groups.clone(),
             chunk_size,
             limit,
         );
+        let current_row_group = row_groups.next().transpose()?;
Owner commented:

I think we should consider something different here - this makes `try_new` O(N), since it advances the iterator.


         Ok(Self {
             row_groups,
             metadata,
             remaining_rows: limit.unwrap_or(usize::MAX),
-            current_row_group: None,
+            current_row_group,
         })
     }
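The construction in this hunk eagerly pulls the first row group inside `try_new` via `row_groups.next().transpose()?`, so a `limit` smaller than the first row group is applied before the reader is handed back. The key idiom is `Option::transpose`, which turns the iterator's `Option<Result<T, E>>` into `Result<Option<T>, E>` so that `?` can propagate the error. A minimal sketch with a hypothetical fallible iterator (the `try_new` function and item types here are illustrative, not the crate's API):

```rust
// Hypothetical constructor that, like the patch, eagerly advances a
// fallible iterator and keeps the first item as the "current" element.
fn try_new(
    mut groups: impl Iterator<Item = Result<u32, String>>,
) -> Result<Option<u32>, String> {
    // Option<Result<T, E>> -> Result<Option<T>, E>; `?` propagates errors.
    let current = groups.next().transpose()?;
    Ok(current)
}

fn main() {
    // Non-empty iterator: the first item becomes the current element.
    let groups = vec![Ok(7u32), Ok(8)].into_iter();
    assert_eq!(try_new(groups).unwrap(), Some(7));

    // Empty iterator: no current element, but still Ok.
    assert_eq!(try_new(std::iter::empty()).unwrap(), None);

    println!("ok");
}
```

This is also the behavior the Owner's comment flags: advancing the iterator in the constructor means `try_new` already pays the cost of decoding the first row group.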

5 changes: 3 additions & 2 deletions src/io/parquet/read/row_group.rs
@@ -78,10 +78,11 @@ impl Iterator for RowGroupDeserializer {
             })
             .collect::<Result<Vec<_>>>()
             .map(Chunk::new);
-        self.remaining_rows -= chunk
+        self.remaining_rows = self.remaining_rows.saturating_sub(chunk
             .as_ref()
             .map(|x| x.len())
-            .unwrap_or(self.remaining_rows);
+            .unwrap_or(self.remaining_rows)
+        );

         Some(chunk)
     }
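The switch from `-=` to `saturating_sub` is what makes a limit smaller than the first row group safe: if the decoded chunk holds more rows than `remaining_rows`, plain subtraction underflows `usize` (a panic in debug builds, a wraparound to a huge value in release builds), while `saturating_sub` clamps at zero. A standalone sketch with illustrative values (the variable names are hypothetical, only the `saturating_sub` call mirrors the patch):

```rust
fn main() {
    // A limit smaller than the number of rows in the first row group.
    let mut remaining_rows: usize = 3;
    let chunk_len: usize = 10; // rows decoded from the first chunk

    // Plain subtraction would underflow here:
    // remaining_rows -= chunk_len; // panics in debug, wraps in release

    // saturating_sub clamps at zero instead, as in the patch above.
    remaining_rows = remaining_rows.saturating_sub(chunk_len);
    assert_eq!(remaining_rows, 0);

    // When the limit exceeds the chunk, it decrements normally.
    assert_eq!(10usize.saturating_sub(3), 7);

    println!("remaining_rows = {}", remaining_rows);
}
```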