51.0.0 (2024-03-15)
Breaking changes:
- Remove internal buffering from AsyncArrowWriter (#5484) #5485 [parquet] (tustvold)
- Make ArrayBuilder also Sync #5353 [arrow] (dvic)
- Raw JSON writer (~10x faster) (#5314) #5318 [arrow] (tustvold)
Implemented enhancements:
- Prototype Arrow over HTTP in Rust #5496 [arrow]
- Add DataType::ListView and DataType::LargeListView #5492 [parquet] [arrow]
- Improve documentation around handling of dictionary arrays in arrow flight #5487 [arrow] [arrow-flight]
- Better memory limiting in parquet ArrowWriter #5484 [parquet]
- Support Creating Non-Nullable Lists and Maps within a Struct #5482 [arrow]
- [DISCUSSION] Better borrow propagation (e.g. RecordBatch::schema() to return &SchemaRef vs SchemaRef) #5463 [parquet] [arrow] [arrow-flight]
- Build Scalar with ArrayRef #5459
- AsyncArrowWriter doesn't limit underlying ArrowWriter to respect buffer-size #5450 [parquet]
- Refine Display implementation for FlightError #5438 [arrow] [arrow-flight]
- Better ergonomics for FixedSizeList and LargeList #5372 [arrow]
- Update Flight proto #5367 [arrow] [arrow-flight]
- Support check similar datatype but with different magnitudes #5358 [arrow]
- Buffer memory usage for custom allocations is reported as 0 #5346 [arrow]
- Can the ArrayBuilder trait be made Sync? #5344 [arrow]
- support cast 'UTF8' to FixedSizeList #5339 [arrow]
- Support Creating Non-Nullable Lists with ListBuilder #5330 [arrow]
- ParquetRecordBatchStreamBuilder::new() panics instead of erroring out when opening a corrupted file #5315 [parquet]
- Raw JSON Writer #5314 [arrow]
- Add support for more fused boolean operations #5297 [arrow]
- parquet: Allow disabling embed ARROW_SCHEMA_META_KEY added by the ArrowWriter #5296 [parquet]
- Support casting strings like '2001-01-01 01:01:01' to Date32 #5280 [arrow]
- Temporal Extract/Date Part Kernel #5266 [arrow]
- Support for extracting hours/minutes/seconds/etc. from Time32/Time64 type in temporal kernels #5261 [arrow]
- parquet: add method to get both the inner writer and the file metadata when closing SerializedFileWriter #5253 [parquet]
- Release arrow-rs version 50.0.0 #5234
Fixed bugs:
- Empty String Parses as Zero in Unreleased Arrow #5504 [arrow]
- Unused import in nightly rust #5476 [parquet] [arrow] [arrow-flight]
- Error The data type type List .. has no natural order when using arrow::compute::lexsort_to_indices with list and more than one column #5454 [arrow]
- Wrong size assertion in arrow_buffer::builder::NullBufferBuilder::new_from_buffer #5445 [arrow]
- Inconsistency between comments and code implementation #5430 [arrow]
- OOB access in Buffer::from_iter #5412 [arrow]
- Cast kernel doesn't return null for string to integral cases when overflowing under safe option enabled #5397 [arrow]
- Make ffi consume variable layout arrays with empty offsets #5391 [arrow]
- RecordBatch conversion from pyarrow loses Schema's metadata #5354 [arrow]
- Debug output of Time32/Time64 arrays with invalid values has confusing nulls #5336 [arrow]
- Removing a column from a RecordBatch drops schema metadata #5327 [arrow]
- Panic when read an empty parquet file #5304 [parquet]
- How to enable statistics for string columns? #5270 [parquet]
- concat::tests::test_string_dictionary_merge failure fails on Mac / has different results in different platforms #5255 [arrow]
Documentation updates:
- Minor: Add doc comments to GenericByteViewArray #5512 [arrow] (alamb)
- Improve docs for logical and physical nulls even more #5434 [arrow] (alamb)
- Add example of converting RecordBatches to JSON objects #5364 [arrow] (alamb)
Performance improvements:
Closed issues:
- Add StringViewArray implementation and layout and basic construction + tests #5469 [parquet] [arrow]
- Add DataType::Utf8View and DataType::BinaryView #5468 [parquet] [arrow]
Merged pull requests:
- Deprecate array_to_json_array #5515 [arrow] (tustvold)
- Fix integer parsing of empty strings (#5504) #5505 [arrow] (tustvold)
- feat: clarifying comments in struct_builder.rs #5494 #5499 [arrow] (istvan-fodor)
- Update proc-macro2 requirement from =1.0.78 to =1.0.79 #5498 [arrow] [arrow-flight] (dependabot[bot])
- Add DataType::ListView and DataType::LargeListView #5493 [parquet] [arrow] (Kikkon)
- Better document parquet pushdown #5491 [parquet] (tustvold)
- Fix NullBufferBuilder::new_from_buffer wrong size assertion #5489 [arrow] (Kikkon)
- Support dictionary encoding in structures for FlightDataEncoder, add documentation for arrow_flight::encode::Dictionary #5488 [arrow] [arrow-flight] (thinkharderdev)
- Add MapBuilder::with_values_field to support non-nullable values (#5482) #5483 [arrow] (lasantosr)
- feat: initial support string_view and binary_view, supports layout and basic construction + tests #5481 [arrow] (ariesdevil)
- Add more comprehensive documentation on testing and benchmarking to CONTRIBUTING.md #5478 (monkwire)
- Remove unused import detected by nightly rust #5477 [parquet] [arrow] [arrow-flight] (XiangpengHao)
- Add RecordBatch::schema_ref #5474 [parquet] [arrow] [arrow-flight] (monkwire)
- Provide access to inner Write for parquet writers #5471 [parquet] (tustvold)
- Add DataType::Utf8View and DataType::BinaryView #5470 [parquet] [arrow] (XiangpengHao)
- Update base64 requirement from 0.21 to 0.22 #5467 [parquet] [arrow] [arrow-flight] (dependabot[bot])
- Minor: Fix formatting typo in Field::new_list_field #5464 [arrow] (alamb)
- Fix test_string_dictionary_merge (#5255) #5461 [arrow] (tustvold)
- Use Vec::from_iter in Buffer::from_iter #5460 [arrow] (Kikkon)
- Document parquet writer memory limiting (#5450) #5457 [parquet] (tustvold)
- Document UnionArray Panics #5456 [arrow] (Kikkon)
- fix: lexsort_to_indices unsupported mixed types with list #5455 [arrow] (alamb)
- Refine Display and Source implementation for error types #5439 [arrow] [arrow-flight] (BugenZhao)
- Improve debug output of Time32/Time64 arrays #5428 [arrow] (monkwire)
- Miri fix: Rename invalid_mut to without_provenance_mut #5418 [arrow] (Jefffrey)
- Ensure addition/multiplications when allocating buffers don't overflow #5417 [arrow] (Jefffrey)
- Update Flight proto: PollFlightInfo & expiration time #5413 [arrow] [arrow-flight] (Jefffrey)
- Add tests for serializing lists of dictionary encoded values to json #5399 [arrow] (jhorstmann)
- Return null for overflow when casting string to integer under safe option enabled #5398 [arrow] (viirya)
- Propagate error instead of panic for take_bytes #5395 [arrow] (viirya)
- Improve like kernel by ~2% #5390 [arrow] (psvri)
- Enable running arrow-array and arrow-arith with miri and avoid strict provenance warning #5387 [arrow] (jhorstmann)
- Update to chrono 0.4.34 #5385 [arrow] (tustvold)
- Return error instead of panic when reading invalid Parquet metadata #5382 [parquet] (mmaitre314)
- Update tonic requirement from 0.10.0 to 0.11.0 #5380 [arrow] [arrow-flight] (dependabot[bot])
- Update tonic-build requirement from =0.10.2 to =0.11.0 #5379 [arrow] [arrow-flight] (dependabot[bot])
- Fix latest clippy lints #5376 [arrow] (tustvold)
- feat: utility functions for creating FixedSizeList and LargeList dtypes #5373 [arrow] (universalmind303)
- Minor(docs): update master to main for DataFusion/Ballista #5363 (caicancai)
- Return an error instead of a panic when reading a corrupted Parquet file with mismatched column counts #5362 [parquet] (mmaitre314)
- feat: support casting FixedSizeList with new child type #5360 [arrow] (wjones127)
- Add more debugging info to StructBuilder validate_content #5357 [arrow] (viirya)
- pyarrow: Preserve RecordBatch's schema metadata #5355 [arrow] (atwam)
- Mark Encoding::BIT_PACKED as deprecated and document its compatibility issues #5348 [parquet] (jhorstmann)
- Track the size of custom allocations for use via Array::get_buffer_memory_size #5347 [arrow] (jhorstmann)
- fix: Return an error on type mismatch rather than panic (#4995) #5341 [parquet] (carols10cents)
- Minor: support cast values to fixedsizelist #5340 [arrow] (Weijun-H)
- Enhance Time32/Time64 support in date_part #5337 [arrow] (Jefffrey)
- feat: add take_record_batch. #5333 [arrow] (RinChanNOWWW)
- Add ListBuilder::with_field to support non nullable list fields (#5330) #5331 [arrow] (tustvold)
- Don't omit schema metadata when removing column #5328 [arrow] (kylebarron)
- Update proc-macro2 requirement from =1.0.76 to =1.0.78 #5324 [arrow] [arrow-flight] (dependabot[bot])
- Enhance Date64 type documentation #5323 [arrow] (Jefffrey)
- fix panic when decode a group with no child #5322 [parquet] (Liyixin95)
- Minor/Doc Expand FlightSqlServiceClient::handshake doc #5321 [arrow] [arrow-flight] (devinjdangelo)
- Refactor temporal extract date part kernels #5319 [arrow] (Jefffrey)
- Add JSON writer benchmarks (#5314) #5317 [arrow] (tustvold)
- Bump actions/cache from 3 to 4 #5308 (dependabot[bot])
- Avro block decompression #5306 [arrow] (tustvold)
- Result into error in case of endianness mismatches #5301 [arrow] (pangiole)
- parquet: Add ArrowWriterOptions to skip embedding the arrow metadata #5299 [parquet] (evenyag)
- Add support for more fused boolean operations #5298 [arrow] (RTEnzyme)
- Support Parquet Byte Stream Split Encoding #5293 [parquet] (mwlon)
- Extend string parsing support for Date32 #5282 [arrow] (gruuya)
- Bring some methods over from ArrowWriter to the async version #5251 [parquet] (AdamGS)
* This Changelog was automatically generated by github_changelog_generator