ARROW-12007: [C++] Loading parquet file returns "Invalid UTF8 payload" error

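Judging from the comment "avoid spending time validating UTF8 data" on the line that sets cast_options.allow_invalid_utf8 to false, it seems to me the intended value was true rather than false. With allow_invalid_utf8 = false, the cast validates the UTF-8 data, which is exactly the work the comment says to avoid.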
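The argument above hinges on what allow_invalid_utf8 controls: when it is false, a binary-to-string cast scans each value and fails on malformed UTF-8; when it is true, the bytes are passed through unchecked. The following is a minimal, Arrow-free sketch of that behaviour. IsValidUtf8, CastBinaryToString and MaybeError are hypothetical names chosen for illustration, not Arrow APIs, and the error text merely mirrors the message in the issue title.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for an Arrow Status: std::nullopt means OK,
// otherwise the optional holds the error message.
using MaybeError = std::optional<std::string>;

// Returns true if `data` is well-formed UTF-8 per RFC 3629: rejects invalid
// lead bytes, truncated or broken sequences, overlong encodings, UTF-16
// surrogates (U+D800..U+DFFF) and code points above U+10FFFF.
bool IsValidUtf8(std::string_view data) {
  // Smallest code point that may be encoded with 2, 3 or 4 bytes.
  static const std::uint32_t kMinCodePoint[5] = {0, 0, 0x80, 0x800, 0x10000};
  std::size_t i = 0;
  while (i < data.size()) {
    const auto lead = static_cast<unsigned char>(data[i]);
    if (lead < 0x80) {  // 1-byte ASCII
      ++i;
      continue;
    }
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if ((lead & 0xE0) == 0xC0) {
      len = 2;
      cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3;
      cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4;
      cp = lead & 0x07;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (len > data.size() - i) return false;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      const auto cont = static_cast<unsigned char>(data[i + k]);
      if ((cont & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (cont & 0x3F);
    }
    if (cp < kMinCodePoint[len] || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF)) {
      return false;  // overlong, out of range, or surrogate
    }
    i += len;
  }
  return true;
}

// Hypothetical model of a binary -> string cast. With
// allow_invalid_utf8 == false every payload is scanned and rejected if it is
// not valid UTF-8; with true the bytes are copied through unchecked.
MaybeError CastBinaryToString(std::string_view payload, bool allow_invalid_utf8,
                              std::string* out) {
  assert(out != nullptr);
  if (!allow_invalid_utf8 && !IsValidUtf8(payload)) {
    return "Invalid UTF8 payload";  // text mirrors the issue title
  }
  out->assign(payload.data(), payload.size());
  return std::nullopt;
}
```

Under these assumptions, casting the Latin-1 bytes of "café" (std::string("caf\xE9", 4)) fails with allow_invalid_utf8 = false and succeeds unchanged with true. The trade-off of skipping validation is that downstream code may then receive string data that is not actually UTF-8.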

This change also resolved the error I was getting through the arrow R package, which appears to be ARROW-12007.

Closes apache#10759 from hideaki/cancel_unnecessary_utf8_check

Authored-by: Hideaki Hayashi
Signed-off-by: Antoine Pitrou
hideaki authored and pull[bot] committed Jan 14, 2022
1 parent 2fda73e commit a37836e
Showing 1 changed file with 1 addition and 1 deletion.
cpp/src/parquet/arrow/reader_internal.cc (2 changes: 1 addition, 1 deletion)
@@ -429,7 +429,7 @@ Status TransferBinary(RecordReader* reader, MemoryPool* pool,
 }
 ::arrow::compute::ExecContext ctx(pool);
 ::arrow::compute::CastOptions cast_options;
-cast_options.allow_invalid_utf8 = false; // avoid spending time validating UTF8 data
+cast_options.allow_invalid_utf8 = true; // avoid spending time validating UTF8 data
 
 auto binary_reader = dynamic_cast<BinaryRecordReader*>(reader);
 DCHECK(binary_reader);
