Releases: jorgecarleitao/arrow2
v0.10.0
Arrow2 0.10.0 is out! 🚀🚀🚀🚀🚀
Continuing to break new ground, this is one of the most feature-rich releases of this crate so far!
Thank you to everyone for the impressive work over the past 2.5 months that makes arrow2 so feature-rich, safe, fast, and easy to use! 🙇
Here are the main headlines:
Copy on Write
So far, whenever we applied a transformation to an array, we had to create a new array. When multiple operations were used (e.g. `c1 x 2 + 1`), it led to the following compute pattern:
1. allocate new region
2. compute
3. allocate new region
4. compute
This was identified by @sundy-li on #741 and addressed by @ritchie46 on #794.
Users can now re-use `Arc`ed arrays, just like `std::sync::Arc::get_mut`. As expected, if the array is being used in multiple places, it will return `None` and users do need to allocate a new region (exclusive mutability).
This is being used in Polars to further re-use allocated regions and therefore reduce both memory pressure and wasted compute cycles allocating new regions.
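The re-use rule can be illustrated with the standard library alone. The sketch below is not arrow2's API: it uses a plain `Arc<Vec<i32>>` as a stand-in for an array, and the function name `times_two_plus_one` is hypothetical. It relies only on `std::sync::Arc::get_mut`, which returns `Some` when the `Arc` has a single owner and `None` otherwise.

```rust
use std::sync::Arc;

/// Computes `c1 * 2 + 1` over `values` (a stand-in for an array, not arrow2's type).
/// Returns the result and whether the original region was re-used.
fn times_two_plus_one(mut values: Arc<Vec<i32>>) -> (Arc<Vec<i32>>, bool) {
    // `Arc::get_mut` yields `Some` only when this is the sole owner.
    if let Some(buf) = Arc::get_mut(&mut values) {
        // Exclusive access: mutate in place, no new region is allocated.
        buf.iter_mut().for_each(|v| *v = *v * 2 + 1);
        return (values, true);
    }
    // Shared elsewhere: fall back to allocating a new region for the result.
    let out: Vec<i32> = values.iter().map(|v| v * 2 + 1).collect();
    (Arc::new(out), false)
}
```

Calling this with a freshly created `Arc` takes the in-place path; calling it while another clone of the `Arc` is alive takes the allocating path and leaves the shared data untouched.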
Support for ODBC
This release now supports reading from, and writing to, any ODBC driver.
This builds on top of the superb odbc-api created by @pacman82, which allows this crate to use the columnar format provided by the ODBC specification.
Given a performant ODBC driver, this is expected to be the fastest way to load data to the Arrow format, as many operations are simple memcopies.
Check out the example and guide for details on how to use it!
`async` support for writing to Arrow's IPC
Until now, we had limited support for writing to Arrow IPC asynchronously. @dexterduck closed this gap on #878, offering complete `async` support for both Arrow files and Arrow streams, including implementations of `futures::Stream` and `futures::Sink` for them!
Migrated to `std::simd`
After some back and forth with the working group of the portable SIMD project, this release replaces `packed_simd2` by `std::simd`. This resulted in no performance difference but allows us to leverage the great work that is happening on `std::simd`.
Support to Serde metadata
A common pain point in using arrow2's logical types is that they are quite rich, making them sometimes difficult to visualize or represent in e.g. JSON. @houqp closed this with #858, which adds compatibility with Serde for schema-related structs in this crate (`PhysicalType`, `DataType`, `Field`, `Schema`).
Support for Arrow C stream interface
Arrow has an experimental specification for an FFI to iterators of arrow arrays. This release now fully supports this interface.
Made crate deny(missing_docs)
This makes us developers more conscious about documenting APIs, thereby giving users more context about them. We have also started documenting whether IO-related APIs are CPU- or IO-bounded, so that users know which ones block `async` contexts.
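As a minimal sketch of what the lint enforces: with `missing_docs` denied, every public item must carry a doc comment or compilation fails. The module and function below are hypothetical and not part of arrow2; at a crate root the attribute is written as `#![deny(missing_docs)]`, while the outer form used here scopes the lint to one module.

```rust
// Hypothetical module, used only to illustrate the lint.
#[deny(missing_docs)]
pub mod io_sketch {
    //! Illustrative IO helpers. Without this module-level doc comment,
    //! the lint would reject the module itself.

    /// Decodes a little-endian `u32` from four bytes.
    ///
    /// CPU-bounded: performs no IO, so it does not block an `async` context.
    pub fn decode_u32_le(bytes: [u8; 4]) -> u32 {
        u32::from_le_bytes(bytes)
    }
}
```

Removing either doc comment turns the missing documentation into a compile error for the public items of the crate.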
Changelog
Breaking changes:
- Renamed `Ffi_ArrowArray` and `Ffi_ArrowSchema` #859
- Improved performance and stability of writing to CSV #866 (ritchie46)
- Simplified API for writing to JSON #864 (jorgecarleitao)
- Simplified API to import from FFI #854 (jorgecarleitao)
- Simplified compute (lower/upper) #847 (jorgecarleitao)
- Simplified inferring arrow schema from a parquet schema #819 (jorgecarleitao)
- Bumped parquet and aligned API to fit into it #795 (jorgecarleitao)
New features:
- Added `GrowableUnion` #902 (jorgecarleitao)
- Added cast to `months_days_ns` #900 (jorgecarleitao)
- Added support for `hash` of `month_day_ns` arrays #899 (jorgecarleitao)
- IPC sink types and IPC file stream #878 (dexterduck)
- implemented `futures::Sink` for parquet async writer #877 (dexterduck)
- Added `try_new` and `new` to all arrays #873 (jorgecarleitao)
- Added support for datatypes serde #858 (houqp)
- Added support to the Arrow C stream interface (read and write) #857 (jorgecarleitao)
- Support to read/write from/to ODBC #849 (jorgecarleitao)
- Added operators that include validities in comparisons #846 (ritchie46)
- Added support to read and write `Decimal128` to Avro #837 (potter420)
- Added support to read Arrow streams asynchronously #832 (jorgecarleitao)
- Added support to write `LargeUtf8` and `LargeBinary` to Avro #828 (illumination-k)
- Added support for pushdown projection in reading Avro #827 (jorgecarleitao)
- Added support to read Avro's structs #826 (jorgecarleitao)
- Added support to write largeUtf8/Binary to Avro #825 (jorgecarleitao)
- Added json serialization of timestamp/date32/date64 #814 (ritchie46)
- Added `BooleanArray::from_trusted_len_values_iter_unchecked` #799 (ritchie46)
- Added `MutableUtf8Array::extend_values` #798 (ritchie46)
- Added COW semantics to `Buffer`, `Bitmap` and some arrays #794 (ritchie46)
- Added support to read parquet row groups in chunks #789 (jorgecarleitao)
- Added scalar bitwise ops #788 (jorgecarleitao)
- Migrated to portable simd #747 (jorgecarleitao)
Fixed bugs:
- Fixed edge case in reading multiple parquet pages #904 (jorgecarleitao)
- Bug fix in offset for sliced unions #891 (ncpenke)
- Fix edge case in reading nested parquet #884 (jorgecarleitao)
- Fixed unsoundness of `#derive(Clone)` for FFI structs #882 (jorgecarleitao)
- Fixed json writing of dates and datetimes #867 (jorgecarleitao)
- Fixed reading parquet with timezone #862 (jorgecarleitao)
- Fixed error in writing compressed IPC arrow #855 (jorgecarleitao)
- Fixed wrong null_count when slicing a sliced Bitmap #848 (satlank)
- Fixed error in writing compressed IPC files #840 (jorgecarleitao)
- Fixed float to i128 cast #817 (houqp)
- fix unescaped '"' in json writing #812 (ritchie46)
- Fixed reading parquet binary dict page #791 (danburkert)
Enhancements:
- Add `FixedSizeBinaryScalar` #782
- Use more idiomatic versions #898 (jorgecarleitao)
- Added support for min/max for decimal [#897](https://github...
v0.9.0
A new release is here! 🎉🎉🎉🎉 This release has four major improvements:
- It is now backed by std's `Vec`, thus making it
  - zero-copy with the rest of Rust's ecosystem
  - use less `unsafe`
  - more ergonomics
  - faster to compile
  - (no difference in performance)
- It now supports reading from, and writing to, Apache Avro, both `sync` and `async`
- flatbuffers dependency was replaced by `planus`, a re-implementation of the flatbuffers specification in Rust (you should check out that project, awesome work by @kristoff3r and @TethysSvensson)
  - lower risks of `unsound`
  - easier-to-maintain code base
- Improved security and general maintenance:
  - Made most of the crate `#[forbid(unsafe)]`
  - significantly reduced the use of `unsafe` via `bytemuck`'s dependency
  - made most of parsing of Arrow IPC `panic`-free, to reduce risks of DOS from untrusted data
A big thanks to all contributors (listed below) and our users for all the dedication, hard work, and patience. 🙇
Breaking changes:
- Added number of rows read in CSV inference #765 (jorgecarleitao)
- Refactored `nullif` #753 (jorgecarleitao)
- Migrated to latest parquet2 #752 (jorgecarleitao)
- Replace flatbuffers dependency by Planus #732 (jorgecarleitao)
- Simplified `Schema` and `Field` #728 (jorgecarleitao)
- Replaced `RecordBatch` by `Chunk` #717 (jorgecarleitao)
- Removed `Option` from fields' metadata #715 (jorgecarleitao)
- Moved dict_id to IPC-specific IO #713 (jorgecarleitao)
- Moved is_ordered from `Field` to `DataType::Dictionary` #711 (jorgecarleitao)
- Refactored JSON writing (5-10x) #709 (jorgecarleitao)
- Made Avro read API use `Block` and `CompressedBlock` #698 (jorgecarleitao)
- Simplified most traits #696 (jorgecarleitao)
- Replaced `Display` by `Debug` for `Array` #694 (jorgecarleitao)
- Replaced `MutableBuffer` by `std::Vec` #693 (jorgecarleitao)
- Simplified `Utf8Scalar` and `BinaryScalar` #660 (jorgecarleitao)
- Simplified Primitive and Boolean scalar #648 (jorgecarleitao)
New features:
- Add `and_scalar` and `or_scalar` for boolean_kleene #662
- Add `lower` and `upper` support for string #635
- Added support to cast decimal #761 (jorgecarleitao)
- Added support to deserialize JSON (!= NDJSON) #758 (jorgecarleitao)
- Added support to infer nested json structs #750 (jorgecarleitao)
- Added support to compare intervals #746 (jorgecarleitao)
- Added `any` and `all` kernel #739 (ritchie46)
- Added support to write Avro async #736 (jorgecarleitao)
- Added support to write interval to Avro #734 (jorgecarleitao)
- Added `and_scalar` and `or_scalar` for boolean kleene #723 (silathdiir)
- Added `and_scalar` and `or_scalar` for boolean #707 (silathdiir)
- Refactored JSON read to split IO-bounded from CPU-bounded tasks #706 (jorgecarleitao)
- Added more conversions from parquet #701 (jorgecarleitao)
- Added support for compressed Avro write #699 (jorgecarleitao)
- Added support to write to Avro #690 (jorgecarleitao)
- Added dynamic version of negation #685 (jorgecarleitao)
- Added support to read dictionary-encoded required parquet pages #683 (mdrach)
- Added `upper` #664 (Xuanwo)
- Added `lower` #641 (Xuanwo)
- Added support for `async` read of Avro #620 (jorgecarleitao)
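For readers unfamiliar with the "boolean kleene" entries above, the sketch below illustrates Kleene (three-valued) logic with a scalar operand. It is not arrow2's implementation: `Option<bool>` stands in for a nullable boolean slot, and all function names are hypothetical. The key property is that a null does not always propagate: `false AND null` is `false`, and `true OR null` is `true`.

```rust
/// Kleene AND: `None` models an unknown (null) value.
fn and_kleene(lhs: Option<bool>, rhs: Option<bool>) -> Option<bool> {
    match (lhs, rhs) {
        // A single known `false` decides the result, even next to a null.
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Kleene OR: a single known `true` decides the result.
fn or_kleene(lhs: Option<bool>, rhs: Option<bool>) -> Option<bool> {
    match (lhs, rhs) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// Applies Kleene AND between every slot of a column and one scalar.
fn and_scalar_kleene(column: &[Option<bool>], scalar: Option<bool>) -> Vec<Option<bool>> {
    column.iter().map(|&v| and_kleene(v, scalar)).collect()
}

/// Applies Kleene OR between every slot of a column and one scalar.
fn or_scalar_kleene(column: &[Option<bool>], scalar: Option<bool>) -> Vec<Option<bool>> {
    column.iter().map(|&v| or_kleene(v, scalar)).collect()
}
```

The non-Kleene variants differ only in that any null operand yields a null result.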
Fixed bugs:
- Pyarrow and Arrow2 don't agree on Timestamp resolution #700
- Writing compressed dictionary in parquet corrupts the files #667
- Replaced assert by error in IPC read #748 (jorgecarleitao)
- Made all panics in IPC read errors #722 (jorgecarleitao)
- Fixed error in compare booleans #721 (jorgecarleitao)
- Fixed error in dispatching scalar arithmetics #682 (jorgecarleitao)
- Fixed error in reading negative decimals from parquet #679 (mdrach)
- Made IPC reader less restrictive #678 (jorgecarleitao)
- Fixed error in trait constraint in compute #665 (jorgecarleitao)
- Fixed performance regression of CSV reading #657 (jorgecarleitao)
- Fixed filter of predicate with validity #653 (ritchie46)
- Made `Scalar: Send+Sync` #644 (jorgecarleitao)
Enhancements:
- Feature: JSON IO? #712
- Simplified code #760 (jorgecarleitao)
- Added iterator of values of `FixedBinaryArray` #757 (jorgecarleitao)
- Remove un-needed `unsafe` #756 (jorgecarleitao)
- Replaced un-needed `unsafe` #755 (jorgecarleitao)
- Made IO `#[forbid(unsafe)]` #749 (jorgecarleitao)
- Improved reading nullable Avro arrays #727 (Igosuki)
- Allow to create primitive array by vec without extra memcopy #710 (sundy-li)
- Removed requirement of `use Array` to access primitives' `data_type` #697 (jorgecarleitao)
- Cleaned up trait usage and added forbid_unsafe to parts #695 (jorgecarleitao)
- Migrated from `avro-rs` to `avro-schema` #692 (jorgecarleitao)
- Added `MutablePrimitiveArray::extend_constant` [#6...
v0.8.0
A new release is here 🚀🚀🚀
This release has so many important new features and bug fixes that it is best summarized as: thank you everyone for all the issues and PRs that resulted in this release (in order of appearance) 🙇🙇🙇🙇:
Breaking changes:
- Made CSV write options use chrono formatting by default #624
- Add `compression` to `IpcWriteOptions` #570
- Made `cast` accept `CastOptions` parameter #569
- Simplified `ArrowError` #640 (jorgecarleitao)
- Use `DynComparator` for `lexsort` and `partition` #637 (yjshen)
- Split "compute" feature #634 (jorgecarleitao)
- Removed unneeded trait. #628 (jorgecarleitao)
- Sealed 2 traits to forbid downstream implementations #621 (jorgecarleitao)
- Simplified arithmetics compute #607 (jorgecarleitao)
- Refactored comparison `Operator` #604 (jorgecarleitao)
- Simplified dictionary indexes #584 (jorgecarleitao)
- Simplified IPC APIs #576 (jorgecarleitao)
- Simplified IPC stream writer / remove finish on drop from stream writer #575 (jorgecarleitao)
- Simplified trait in compute. #572 (jorgecarleitao)
- Compute: add partial option into CastOptions #561 (sundy-li)
- Introduced `UnionMode` enum #557 (simonvandel)
- Changed DataType::FixedSize*(i32) to DataType::FixedSize*(usize) #556 (simonvandel)
New features:
- Added support to write timestamps with timezones for CSV #623 (jorgecarleitao)
- Added support to read Avro files' metadata asynchronously #614 (jorgecarleitao)
- Added iterator for `StructArray` #613 (illumination-k)
- Added support to read snappy-compressed Avro #612 (jorgecarleitao)
- Added support to read decimal from csv #602 (jorgecarleitao)
- Added support to cast `NullArray` to all other types #589 (flaneur2020)
- Added support dictionaries in nested types over IPC #587 (jorgecarleitao)
- Added support to write Arrow IPC streams asynchronously #577 (jorgecarleitao)
- Added support to write compressed Arrow IPC (feather v2) #566 (jorgecarleitao)
- Added support for ffi for `FixedSizeList` and `FixedSizeBinary` #565 (jorgecarleitao)
- Added support for `async` csv reading. #562 (jorgecarleitao)
- Added support for `bitwise` operations #553 (1aguna)
- Added support to read `StructArray` from parquet #547 (jorgecarleitao)
Fixed bugs:
- Fixed error in reading nullable from Avro. #631 (jorgecarleitao)
- Fixed error in union FFI #625 (jorgecarleitao)
- Fixed error in computing projection in `io::ipc::read::reader::FileReader` #596 (illumination-k)
- Fixed error in compressing IPC LZ4 #593 (jorgecarleitao)
- Fixed growable of dictionaries negative keys #582 (ritchie46)
- Made substring kernel on utf8 take chars into account. #568 (ritchie46)
- Fixed error in passing sliced arrays via FFI #564 (jorgecarleitao)
Enhancements:
- Faster `take` with null values (2-3x) #633 (jorgecarleitao)
- Improved error message for missing feature in compressed parquet #632 (jorgecarleitao)
- Added `to` conversion to `FixedSizeBinary` #622 (ritchie46)
- Bumped `confy-table` #618 (jorgecarleitao)
- Made `MutableArray` `Send + Sync` #617 (jorgecarleitao)
- Removed most of allocations in IPC reading #611 (jorgecarleitao)
- Speed up boolean comparison kernels (~3x) #610 (Dandandan)
- Improved performance of decimal arithmetics #605 (jorgecarleitao)
- Simplified traits and added documentation #603 (jorgecarleitao)
- Improved performance of `is_not_null`. #600 (jorgecarleitao)
- Added `len` to every array #599 (jorgecarleitao)
- Added support for `NullArray` at FFI. #598 (jorgecarleitao)
- Optimized `MutableBinaryArray` #597 (jorgecarleitao)
- Speedup/simplify bitwise operations (avoid extra allocation) #586 (Dandandan)
- Improved performance of `bitmap::from_trusted` (3x) #578 (jorgecarleitao)
- Made bitmap not cache null count #563 (jorgecarleitao)
- Avoided redundant checks in creating an `Utf8Array` from `MutableUtf8Array` #560 (jorgecarleitao)
- Avoid unnecessary allocations #559 (simonvandel)
- Surfaced errors in reading from avro #558 (jorgecarleitao)
Documentation updates:
- Simplified example #619 (jorgecarleitao)
- Made example of parallel parquet write be over multiple batches #544 (jorgecarleitao)
Testing updates:
- Cleaned up benches #636 (jorgecarleitao)
- Ignor...
v0.7.0
Another release is here 🚀🚀🚀
As usual, a bunch of optimizations as well as some work on two main fronts:
- make the crate smaller and easier to compile
- support for nested parquet reads
Thank you to all contributors (names below) for the amazing contributions!
Breaking changes:
- Simplified reading parquet #532 (jorgecarleitao)
- Change IPC `FileReader` to own the underlying reader #518 (blakesmith)
- Migrate to `arrow_format` crate #517 (jorgecarleitao)
New features:
- Added read of 2-level nested lists from parquet #548 (jorgecarleitao)
- add dictionary serialization for csv-writer #515 (ritchie46)
- Added `checked_negate` and `wrapping_negate` for `PrimitiveArray` #506 (yjhmelody)
Fixed bugs:
- Fixed error in reading fixed len binary from parquet #549 (jorgecarleitao)
- Fixed ffi of sliced arrays #540 (jorgecarleitao)
- Fixed s3 example #536 (jorgecarleitao)
- Fixed error in writing compressed parquet dict pages #523 (jorgecarleitao)
- Validity taken into account when writing `StructArray` to json #511 (VasanthakumarV)
Enhancements:
- Bumped Prost and Tonic #550 (PsiACE)
- Speedup scalar boolean operations #546 (Dandandan)
- Added fast path for validating ASCII text (~1.12-1.89x improvement on reading ASCII parquet data) #542 (Dandandan)
- Exposed missing APIs to write parquet in parallel #539 (jorgecarleitao)
- improve utf8 init validity #530 (ritchie46)
- export missing `BinaryValueIter` #526 (yjhmelody)
Documentation updates:
- Added more IPC documentation #534 (HagaiHargil)
- Fixed clippy and fmt #521 (ritchie46)
Testing updates:
- Added more tests for `utf8` #543 (jorgecarleitao)
- Ignored RUSTSEC-2020-0071 and RUSTSEC-2020-0159 #537 (jorgecarleitao)
- Improved parquet read benches #533 (jorgecarleitao)
- Added fmt and clippy checks to CI. #522 (xudong963)
v0.6.2
Small release with two minor but relevant bug fixes and a new feature.
New features:
Fixed bugs:
- Do not check offsets or utf8 validity in ffi (#505) #510 (NilsBarlaug)
- Made `try_push_valid` public again #509 (ritchie46)
Enhancements:
v0.6.0
(in crates as 0.6.1: I made a mistake in publishing). Anyways, another big release is here!
There are just too many improvements for a 22-day release - let's try to capture important mentions:
- `Buffer` and `MutableBuffer` are now compatible with Rust's `std::Vec` with no strings attached: everything continues to work, including FFI with the rest of the ecosystem! You can recover the previous behavior (of using cache-aligned allocations) via feature `cache_aligned`
- Added broad support to timestamp with timezones. Kudos to @VasanthakumarV for all the help.
- Added read Decimal from parquet. Kudos to @potter420 for the contribution.
- More improvements to performance. Kudos to @Dandandan and @ritchie46.
- Support to read from the Avro via feature `io_avro`
Breaking changes:
- Bring `MutableFixedSizeListArray` to the spec used by the rest of the Mutable API #475
- Removed `ALIGNMENT` invariant from `[Mutable]Buffer` #449
- Un-nested `compute::arithemtics::basic` #461 (jorgecarleitao)
- Added more serialization options for csv writer. #453 (ritchie46)
- Changed validity from `&Option<Bitmap>` to `Option<&Bitmap>`. #431 (jorgecarleitao)
- Bumped parquet2 #422 (jorgecarleitao)
- Changed IPC `FileWriter` to own the `writer`. #420 (yjshen)
- Made `DynComparator` `Send+Sync` #414 (yjshen)
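The validity change above (`&Option<Bitmap>` to `Option<&Bitmap>`) follows a common Rust pattern, sketched below with standard-library calls only. The `Bitmap` and `ArraySketch` types here are hypothetical stand-ins, not arrow2's types; the point is that `Option::as_ref` converts a borrowed `Option<T>` into an `Option<&T>`, so callers no longer depend on how the array stores its validity.

```rust
/// Hypothetical stand-in for a validity bitmap (not arrow2's `Bitmap`).
/// `true` marks a valid slot, `false` a null slot.
struct Bitmap(Vec<bool>);

/// Hypothetical array that may or may not carry a validity bitmap.
struct ArraySketch {
    validity: Option<Bitmap>,
}

impl ArraySketch {
    /// New-style accessor: borrows the bitmap, not the `Option` that holds it.
    fn validity(&self) -> Option<&Bitmap> {
        self.validity.as_ref()
    }
}

/// Counts null slots; an absent bitmap means every slot is valid.
fn null_count(validity: Option<&Bitmap>) -> usize {
    validity.map_or(0, |bitmap| bitmap.0.iter().filter(|&&is_valid| !is_valid).count())
}
```

Code that still holds a `&Option<Bitmap>` can adapt at the call site with `.as_ref()`.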
New features:
- Read Decimal from Parquet File #444
- Add IO read for Avro #401
- Added support to read Avro logical types, `List`, `Enum`, `Duration` and `Fixed`. #493 (jorgecarleitao)
- Added read `Decimal` from parquet #489 (potter420)
- Implement `BitXor` trait for `Bitmap` #485 (houqp)
- Added `extend`/`extend_unchecked` for `MutableBooleanArray` #478 (VasanthakumarV)
- expose `shrink_to_fit` to mutable arrays #467 (ritchie46)
- Added support for `DataType::Map` and `MapArray` #464 (jorgecarleitao)
- Extract parts of datetime #433 (VasanthakumarV)
- Added support to add an interval to a timestamp #417 (jorgecarleitao)
- Added support to read Avro. #406 (jorgecarleitao)
- Replaced own allocator by `std::Vec`. #385 (jorgecarleitao)
Fixed bugs:
- crash in parquet read #459
- Made writing stream to parquet require a non-static lifetime #471 (GrandChaman)
- Made importing from FFI `unsafe` #458 (jorgecarleitao)
- Fixed panic in division using nulls. #438 (jorgecarleitao)
- Fixed error writing dictionary extension to IPC #397 (jorgecarleitao)
- Fixed error in extending `MutableBitmap` #393 (jorgecarleitao)
Enhancements:
- Some `compare` function are not exported #349
- Investigate how to add support for timezones in timestamp #23
- Made `hash` work for extension type #487 (jorgecarleitao)
- Added `extend`/`extend_unchecked` for `MutableBinaryArray` #486 (VasanthakumarV)
- Improved inference and deserialization of CSV #483 (jorgecarleitao)
- Added `GrowableFixedSizeList` and improved `MutableFixedSizeListArray` #470 (jorgecarleitao)
- Added `MutableBitmap::shrink_to_fit` #468 (jorgecarleitao)
- Added `MutableArray::as_box` #450 (sd2k)
- Improved performance of sum aggregation via aligned loads (-10%) #445 (ritchie46)
- Removed `assert` from `MutableBuffer::set_len` #443 (ritchie46)
- Optimized `null_count` #442 (ritchie46)
- Improved performance of list iterator (- 10-20%) #441 (ritchie46)
- Improved performance of `PrimitiveGrowable` for nulls (-10%) #434 (jorgecarleitao)
- Allowed accessing validity without importing `Array` #432 (jorgecarleitao)
- Optimize hashing using `ahash` and `multiversion` (-30%) #428 (Dandandan)
- Improved performance of iterator of `Utf8Array` and `BinaryArray` (3-4x) #427 (jorgecarleitao)
- Improved performance of utf8 validation of large strings via `simdutf8` (-40%) #426 (Dandandan)
- Added reading of parquet required dictionary-encoded binary. #419 (jorgecarleitao)
- Add `extend`/`extend_unchecked` for `MutableUtf8Array` #413 (VasanthakumarV)
- Added support to extract hours and years from timestamps with timezone #412 (jorgecarleitao)
- Added `io_csv_read` and `io_csv_write` feature #408 (ritchie46)
- Improve `comparison` docs and re-export the array-comparing function #404 (HagaiHargil)
- Added support to read dict-encoded required primitive types from parquet #402 (Dandandan)
- Added `Array::with_validity` #399 (ritchie46)
Documentation updates:
- Improved documentation #491 (jorgecarleitao)
- Added more API docs. #479 (jorgecarleitao)
- Added more documentation #476 (jorgecarleitao)
- Improved documentation #462 (jorgecarleitao)
- Added example showing parallel writes to parquet (x num_cores) #436 (jorgecarleitao)
- Improved documentation #430 (jorgecarleitao)
- [0.5] The docs `io` module has no submodules #390
- Made docs be compiled with feature `full` #391 (jorgecarleitao)
Testing updates:
- DRY via macro. #477 (jorgecarleitao)
- DRY of type check and len check code in `comp...
v0.5.3
A new release is here, containing bug fixes and backward-compatible enhancements.
Thank you to all involved in the testing and development that resulted in this version!
New features:
- Added support to read and write extension types to and from parquet #396 (jorgecarleitao)
Fixed bugs:
- Fixed error writing dictionary extension to IPC #397 (jorgecarleitao)
- Fixed error in extending `MutableBitmap` #393 (jorgecarleitao)
Enhancements:
- Added support to read dict-encoded required primitive types from parquet #402 (Dandandan)
- Added `Array::with_validity` #399 (ritchie46)
Testing updates:
- Fix testing of SIMD #394 (jorgecarleitao)
v0.5.2
Hot fix release to make the API docs contain all optional features.
Documentation updates:
- [0.5] The docs `io` module has no submodules #390
- Made docs be compiled with feature `full` #391 (jorgecarleitao)
v0.5.0
A new release is here! 🎉🎉🎉
This one is marked by further alignment with the arrow specification. Of special mention:
- ✅ Added full support for `async` parquet write (by @GrandChaman)
- ✅ Added fast `extend_*values` to `MutablePrimitiveArray` (by @ritchie46)
- ✅ Added support for compute to `BinaryArray` (by @zhyass)
- ✅ Added support to extension types (IPC, FFI, etc.) (by @jorgecarleitao)
- ✅ Added support for the brand new `MONTH_DAY_NANO` interval type (by @jorgecarleitao)
- 🚀 Improved performance of the calculation of null counts by 5x (by @jorgecarleitao)
- 🔧 Made `cargo` features not default (by @jorgecarleitao)
As usual, there is a small number of backward incompatible changes. See associated issues below, which include the migration paths to each of them.
Breaking changes:
- Added `Extension` to `DataType` #361
- `MonthDayNano` added to enum `IntervalUnit` #360
- Make `io::parquet::write::write_*` return size of file in bytes #354
- Renamed `bitmap::utils::null_count` to `bitmap::utils::count_zeros` #342
- Made `GroupFilter` optional in parquet's `RecordReader` and added method to set it. #386 (jorgecarleitao)
- Removed `PartialOrd` and `Ord` of all enums in `datatypes` #379 (jorgecarleitao)
- Made `cargo` features not default #369 (jorgecarleitao)
- Prepare APIs for extension types #357 (jorgecarleitao)
New features:
- Added support for `async` parquet write #372 (GrandChaman)
- Add support to extension types in FFI #363 (jorgecarleitao)
- Added support for field's metadata via FFI #362 (jorgecarleitao)
- Added support for `Extension` (logical) type #359 (jorgecarleitao)
- Added support for compute to `BinaryArray` #346 (zhyass)
- Added support for reading binary from CSV #337 (jorgecarleitao)
- Added support for `MONTH_DAY_NANO` interval type #268 (jorgecarleitao)
Fixed bugs:
- Parquet read skips a few rows at the end of the page #373
- `parquet_read` fails when a column has too many rows with string values #366
- `parquet_read` panics with `index_out_of_bounds` #351
- Fixed error in `MutableBitmap::push_unchecked` #384 (jorgecarleitao)
- Fixed display of timestamp with tz. #375 (jorgecarleitao)
Enhancements:
- Added `extend_*values` to `MutablePrimitiveArray` #383 (ritchie46)
- Improved performance of writing to CSV (20-25%) #382 (jorgecarleitao)
- Bumped `lexical-core` #378 (jorgecarleitao)
- Fixed casting of utf8 <> Timestamp with and without timezone #376 (jorgecarleitao)
- Added `Send+Sync` to `MutableBuffer` #368 (jorgecarleitao)
- Improved performance of unary _not_ for aligned bitmaps (3x) #365 (jorgecarleitao)
- Reduced dependencies within `num` #353 (jorgecarleitao)
- Bumped to parquet2 v0.4 #352 (jorgecarleitao)
- Bumped tonic and prost in flight #344 (PsiACE)
- Improved null count calculation (5x) #343 (jorgecarleitao)
- Improved perf of deserializing integers from json (30%) #340 (jorgecarleitao)
- Simplified code of json schema inference #339 (jorgecarleitao)
Documentation updates:
- Moved guide examples to examples/ #387 (jorgecarleitao)
- Added more docs. #358 (jorgecarleitao)
- Improved API docs. #355 (jorgecarleitao)
Testing updates:
- Moved tests to `tests/` #389 (jorgecarleitao)
- Moved compute tests to tests/ #388 (jorgecarleitao)
- Added more tests. #380 (jorgecarleitao)
- Pinned nightly in SIMD tests #364 (jorgecarleitao)
- Improved benches for take #348 (jorgecarleitao)
- Made IPC integration tests run tests that are not run by arrow-rs #278 (jorgecarleitao)
v0.4.0
A new release is here! 🎉🎉🎉
This one is marked by a lot of enhancements to existing functionality. Of special mention:
- 🚀 improved performance of integer division by 4x-10x via strength reduction (@sundy-li and @ritchie46)
- 🚀 improved performance of concatenating nullable arrays by 4x
- 🚀 improved performance of comparisons by 2x-14x
- 🔧 moved most tests to a separate directory
- 🔧 Increased test coverage to over 80%
- 🔧 Made `multiversion`, `lexical-core` and `serde-derive` dependencies optional
- ✅ Added support for `UnionArray` (including FFI and IPC tests)
- ✅ Added support for FFI of `Field`

(full list below)
As usual, there is a small number of backward incompatible changes. The associated issues include the migration paths.
Finally, thank you to all contributors and reporters 🙇 In particular, thank you to polars and datafuse teams for the 🐛 reports. They help tremendously 💯
Breaking changes:
- Change dictionary iterator of values from `Array`s of one element to `Scalar`s #335
- Align FFI API with arrow's C++ API #328
- Make `*_compare_scalar` not return `Result` #316
- Make `io::print`, `get_value_display` and `get_display` not return `Result` #286
- Add `MetadataVersion` to IPC interfaces #282
- Change `DataType::Union` to enable round trips in IPC #281
- Removed clone requirement in `StructArray -> RecordBatch` #307 (jorgecarleitao)
- Fixed error in reading a non-finished IPC stream. #302 (jorgecarleitao)
- Generalized ZipIterator to accept a `BitmapIter` #296 (jorgecarleitao)
New features:
- Added API to FFI `Field` #321 (jorgecarleitao)
- Added `compare_scalar` #317 (jorgecarleitao)
- Add `UnionArray` #283 (jorgecarleitao)
Fixed bugs:
- SliceIterator of last bytes is not correct #292
- Fixed error in displaying dictionaries with nulls in values #334 (jorgecarleitao)
- Fixed error in dict equality #333 (jorgecarleitao)
- Fixed small inconsistencies between `compute::cast` and `compute::can_cast` #295 (jorgecarleitao)
- Removed order implementation for `days_ms` / `Interval(DayTime)` #285 (jorgecarleitao)
Enhancements:
- Added support for remaining non-nested datatypes #336 (jorgecarleitao)
- Made `multiversion` and `lexical-core` optional #324 (jorgecarleitao)
- Improved performance of utf8 comparison (1.7x-4x) #322 (jorgecarleitao)
- Improved performance of boolean comparison (5x-14x) #318 (jorgecarleitao)
- Added trait `TryPush` #314 (jorgecarleitao)
- Added cast `date32 -> i64` and `date64 -> i32` #308 (ritchie46)
- Improved performance of comparison with SIMD feature flag (2x-3.5x) #305 (jorgecarleitao)
- Added support to read json to `BinaryArray` #304 (jorgecarleitao)
- Improved `MutableFixedSizeBinaryArray` #303 (jorgecarleitao)
- Improved `MutablePrimitiveArray` and `MutableUtf8Array` #299 (jorgecarleitao)
- Improved `MutableBooleanArray` #297 (jorgecarleitao)
- Improved performance of concatenating non-aligned validities (15x) #291 (jorgecarleitao)
- Added support for timestamps with tz and interval to `io::print::write` #287 (jorgecarleitao)
- Improved debug repr of buffers and bitmaps. #284 (jorgecarleitao)
- Cleaned up internals of json integration #280 (jorgecarleitao)
- Removed `serde_derive` dependency #279 (jorgecarleitao)
- Simplified IPC code. #277 (jorgecarleitao)
- Reduced dependencies from confi-table and enabled `wasm` on `io_print` feature. #276 (jorgecarleitao)
- Improve performance of `rem_scalar/div_scalar` for integer types (4x-10x) #275 (ritchie46)
Documentation updates:
- Cleaned examples and docs from old API. #330 (jorgecarleitao)
- Improved documentation #306 (jorgecarleitao)
Testing updates:
- Improved naming of testing workflows #315 (jorgecarleitao)
- Added tests to scalar API #300 (jorgecarleitao)
- Made CSV and JSON tests not use files. #290 (jorgecarleitao)
- Moved tests to integration tests #289 (jorgecarleitao)
Closed issues: