PARQUET-418: Refactored parquet_reader utility for printing file contents.

This pull request contains the following changes:
* Modified parquet_reader utility: refactored, fixed memory leaks, merged compute_stats utility to get rid of code duplication.
* Added a flag --only-stats to parquet_reader to print only file statistics.
* Modified InMemoryInputStream to own its buffer.

The code repetition that remains in parquet_reader highlights the need for specialized ColumnReader classes. I will create a new JIRA for this improvement.

Author: Aliaksei Sandryhaila <[email protected]>

Closes apache#18 from asandryh/PARQUET-418 and squashes the following commits:

a378a1e [Aliaksei Sandryhaila] Changed the buffer in ScopedInMemoryInputStream to std::vector.
7f6f533 [Aliaksei Sandryhaila] [PARQUET-418]: Added/modified a utility for printing a file contents.
Aliaksei Sandryhaila authored and wesm committed Sep 2, 2018
1 parent 9143357 commit dd778ff
Showing 1 changed file with 14 additions and 0 deletions.
14 changes: 14 additions & 0 deletions cpp/src/parquet/parquet.h
@@ -88,6 +88,20 @@ class InMemoryInputStream : public InputStream {
  int64_t offset_;
};

// A wrapper for InMemoryInputStream to manage the memory.
class ScopedInMemoryInputStream : public InputStream {
 public:
  ScopedInMemoryInputStream(int64_t len);
  uint8_t* data();
  int64_t size();
  virtual const uint8_t* Peek(int num_to_peek, int* num_bytes);
  virtual const uint8_t* Read(int num_to_read, int* num_bytes);

 private:
  std::vector<uint8_t> buffer_;
  std::unique_ptr<InMemoryInputStream> stream_;
};

// API to read values from a single column. This is the main client facing API.
class ColumnReader {
 public:
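The diff above shows only the declaration of ScopedInMemoryInputStream. As a rough illustration of how such a wrapper could own its buffer and delegate reads, here is a minimal sketch. It is not the implementation from this commit: the InputStream and InMemoryInputStream classes below are simplified stand-ins placed in a hypothetical `sketch` namespace, and the InMemoryInputStream constructor signature (pointer plus length) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

namespace sketch {  // hypothetical namespace, not part of the library

// Stand-in for the library's InputStream interface (assumed shape).
class InputStream {
 public:
  virtual ~InputStream() = default;
  virtual const uint8_t* Peek(int num_to_peek, int* num_bytes) = 0;
  virtual const uint8_t* Read(int num_to_read, int* num_bytes) = 0;
};

// Stand-in for InMemoryInputStream: reads from a buffer it does not own.
// The (pointer, length) constructor is an assumption.
class InMemoryInputStream : public InputStream {
 public:
  InMemoryInputStream(const uint8_t* buffer, int64_t len)
      : buffer_(buffer), len_(len), offset_(0) {}

  // Returns a pointer to the next bytes without advancing the offset.
  const uint8_t* Peek(int num_to_peek, int* num_bytes) override {
    *num_bytes = static_cast<int>(
        std::min(static_cast<int64_t>(num_to_peek), len_ - offset_));
    return buffer_ + offset_;
  }

  // Same as Peek, but advances the offset by the bytes returned.
  const uint8_t* Read(int num_to_read, int* num_bytes) override {
    const uint8_t* result = Peek(num_to_read, num_bytes);
    offset_ += *num_bytes;
    return result;
  }

 private:
  const uint8_t* buffer_;
  int64_t len_;
  int64_t offset_;
};

// Owns the bytes in a std::vector and forwards reads to an inner stream.
class ScopedInMemoryInputStream : public InputStream {
 public:
  explicit ScopedInMemoryInputStream(int64_t len) {
    assert(len >= 0);
    buffer_.resize(static_cast<std::size_t>(len));
    // The vector is never resized afterwards, so this pointer stays valid.
    stream_.reset(new InMemoryInputStream(buffer_.data(), len));
  }

  uint8_t* data() { return buffer_.data(); }
  int64_t size() { return static_cast<int64_t>(buffer_.size()); }

  const uint8_t* Peek(int num_to_peek, int* num_bytes) override {
    return stream_->Peek(num_to_peek, num_bytes);
  }
  const uint8_t* Read(int num_to_read, int* num_bytes) override {
    return stream_->Read(num_to_read, num_bytes);
  }

 private:
  std::vector<uint8_t> buffer_;
  std::unique_ptr<InMemoryInputStream> stream_;
};

}  // namespace sketch
```

In this sketch a caller would construct the stream with the desired length, fill the bytes through data(), and then hand the object to code that expects an InputStream. Because buffer_ and stream_ are members, both are released when the wrapper goes out of scope, which is the kind of leak-avoidance the commit message describes.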
