This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

12 Mar 21:02

v0.10.0

1431b96

v0.10.0

Arrow2 0.10.0 is out! 🚀🚀🚀🚀🚀

Continuing to break new ground, this is one of the most feature-rich releases of this crate so far!

Thank you to everyone for the impressive work over the past 2.5 months that makes arrow2 so feature-rich, safe, fast, and easy to use! 🙇

Here are the main headlines:

Copy on Write

So far, whenever we applied a transformation to an array, we had to create a new array. When multiple operations were chained (e.g. c1 * 2 + 1), this led to the following compute pattern:

1. allocate new region
2. compute
3. allocate new region
4. compute

This was identified by @sundy-li on #741 and addressed by @ritchie46 on #794.

Users can now re-use Arc'ed arrays, with the same semantics as std::sync::Arc::get_mut. As expected, if the array is shared in multiple places, the call returns None and users still need to allocate a new region (mutation requires exclusive ownership).

This is being used in Polars to further re-use allocated regions and therefore reduce both memory pressure and wasted compute cycles allocating new regions.
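The re-use rule can be illustrated with the standard library alone. The sketch below is not arrow2's API: `add_scalar` is a hypothetical helper, and an `Arc<Vec<i32>>` stands in for an Arc'ed array. It mutates in place when the `Arc` is uniquely owned and falls back to a fresh allocation otherwise.

```rust
use std::sync::Arc;

/// Adds `value` to every element (hypothetical helper, not part of arrow2).
/// Re-uses the existing allocation when `values` has a single owner.
fn add_scalar(mut values: Arc<Vec<i32>>, value: i32) -> Arc<Vec<i32>> {
    if let Some(unique) = Arc::get_mut(&mut values) {
        // Sole owner: mutate in place, no new region is allocated.
        for x in unique.iter_mut() {
            *x += value;
        }
        return values;
    }
    // Shared: `get_mut` returned `None`, so a new region must be allocated.
    Arc::new(values.iter().map(|x| x + value).collect())
}
```

Calling `add_scalar` twice on a uniquely owned value (for example to compute c1 + 1 + 1) touches a single allocation, while any outstanding clone of the `Arc` forces the allocating path and leaves the clone unchanged.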

Support for ODBC

This release now supports reading from, and writing to, any ODBC driver.

This builds on top of the superb odbc-api created by @pacman82, which allows this crate to use the columnar format provided by the ODBC specification.

Given a performant ODBC driver, this is expected to be the fastest way to load data into the Arrow format, as many operations are simple memory copies.

Check out the example and guide for details on how to use it!

`async` support for writing to Arrow's IPC

Until now, we had limited support for writing to Arrow IPC asynchronously. @dexterduck closed this gap in #878, offering complete async support for both Arrow files and Arrow streams, including implementations of futures::Stream and futures::Sink for them!

Migrated to `std::simd`

After some back and forth with the working group of the portable SIMD project, this release replaces packed_simd2 with std::simd. This resulted in no performance difference but allows us to leverage the great work that is happening on std::simd.

Serde support for metadata

A common pain point in using arrow2's logical types is that they are quite rich, making them sometimes difficult to visualize or represent in e.g. JSON. @houqp closed this with #858, which adds compatibility with Serde for schema-related structs in this crate (PhysicalType, DataType, Field, Schema).

Support for Arrow C stream interface

Arrow has an experimental specification for an FFI to iterators of arrow arrays. This release now fully supports this interface.
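For orientation, the stream interface boils down to a single C struct of callbacks. The sketch below mirrors that struct's layout in Rust as described in the Arrow C stream interface specification. It is not arrow2's own definition: `ArrowSchema` and `ArrowArray` are reduced to opaque placeholders, and the `empty` and `is_released` helpers are hypothetical additions for illustration.

```rust
use std::os::raw::{c_char, c_int, c_void};

/// Opaque placeholder for the C data interface's `ArrowSchema`.
#[repr(C)]
pub struct ArrowSchema {
    _private: [u8; 0],
}

/// Opaque placeholder for the C data interface's `ArrowArray`.
#[repr(C)]
pub struct ArrowArray {
    _private: [u8; 0],
}

/// Layout of `struct ArrowArrayStream`: four callbacks plus producer state.
#[repr(C)]
pub struct ArrowArrayStream {
    /// Writes the schema shared by all arrays of the stream; 0 on success.
    pub get_schema: Option<unsafe extern "C" fn(*mut ArrowArrayStream, *mut ArrowSchema) -> c_int>,
    /// Writes the next array; a released array signals the end of the stream.
    pub get_next: Option<unsafe extern "C" fn(*mut ArrowArrayStream, *mut ArrowArray) -> c_int>,
    /// Returns a description of the last error, or null.
    pub get_last_error: Option<unsafe extern "C" fn(*mut ArrowArrayStream) -> *const c_char>,
    /// Frees producer resources; a null callback marks a released stream.
    pub release: Option<unsafe extern "C" fn(*mut ArrowArrayStream)>,
    /// Opaque producer-specific state.
    pub private_data: *mut c_void,
}

impl ArrowArrayStream {
    /// Hypothetical helper: a stream in the released state.
    pub fn empty() -> Self {
        Self {
            get_schema: None,
            get_next: None,
            get_last_error: None,
            release: None,
            private_data: std::ptr::null_mut(),
        }
    }

    /// Hypothetical helper: whether the stream has been released.
    pub fn is_released(&self) -> bool {
        self.release.is_none()
    }
}
```

Wrapping the callbacks in `Option` keeps the C layout (a null function pointer maps to `None`) while letting Rust code check for the released state without `unsafe`.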

Made crate `deny(missing_docs)`

This makes us developers more conscious about documenting APIs, thereby giving users more context about them. We have also started documenting whether IO-related APIs are CPU-bounded or IO-bounded, so that users know which ones block async contexts.
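As a small illustration of what the lint enforces, the sketch below applies `deny(missing_docs)` to a single module rather than to a whole crate. The module `io_example` and the function `parse_len` are hypothetical and not part of arrow2. The doc comment also follows the convention described above of stating whether a function is CPU-bounded or IO-bounded.

```rust
/// Example module: every public item inside must be documented,
/// otherwise the lint below turns the omission into a compile error.
#[deny(missing_docs)]
pub mod io_example {
    /// Parses a decimal length from `s`, ignoring surrounding whitespace.
    ///
    /// This function is CPU-bounded: it performs no IO and never blocks,
    /// so it is safe to call from an async context.
    pub fn parse_len(s: &str) -> Option<usize> {
        s.trim().parse().ok()
    }
}
```

In a library crate, removing the doc comment on `parse_len` would make the build fail with a `missing documentation for a function` error instead of silently shipping an undocumented API.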

Changelog

Full Changelog

Breaking changes:

Renamed Ffi_ArrowArray and Ffi_ArrowSchema #859
Improved performance and stability of writing to CSV #866 (ritchie46)
Simplified API for writing to JSON #864 (jorgecarleitao)
Simplified API to import from FFI #854 (jorgecarleitao)
Simplified compute (lower/upper) #847 (jorgecarleitao)
Simplified inferring arrow schema from a parquet schema #819 (jorgecarleitao)
Bumped parquet and aligned API to fit into it #795 (jorgecarleitao)

New features:

Added GrowableUnion #902 (jorgecarleitao)
Added cast to months_days_ns #900 (jorgecarleitao)
Added support for hash of month_day_ns arrays #899 (jorgecarleitao)
IPC sink types and IPC file stream #878 (dexterduck)
implemented futures::Sink for parquet async writer #877 (dexterduck)
Added try_new and new to all arrays #873 (jorgecarleitao)
Added support for datatypes serde #858 (houqp)
Added support to the Arrow C stream interface (read and write) #857 (jorgecarleitao)
Support to read/write from/to ODBC #849 (jorgecarleitao)
Added operators that include validities in comparisons #846 (ritchie46)
Added support to read and write Decimal128 to Avro #837 (potter420)
Added support to read Arrow streams asynchronously #832 (jorgecarleitao)
Added support to write LargeUtf8 and LargeBinary to Avro #828 (illumination-k)
Added support for pushdown projection in reading Avro #827 (jorgecarleitao)
Added support to read Avro's structs #826 (jorgecarleitao)
Added support to write largeUtf8/Binary to Avro #825 (jorgecarleitao)
Added json serialization of timestamp/date32/date64 #814 (ritchie46)
Added BooleanArray::from_trusted_len_values_iter_unchecked #799 (ritchie46)
Added MutableUtf8Array::extend_values #798 (ritchie46)
Added COW semantics to Buffer, Bitmap and some arrays #794 (ritchie46)
Added support to read parquet row groups in chunks #789 (jorgecarleitao)
Added scalar bitwise ops #788 (jorgecarleitao)
Migrated to portable simd #747 (jorgecarleitao)

Fixed bugs:

Fixed edge case in reading multiple parquet pages #904 (jorgecarleitao)
Bug fix in offset for sliced unions #891 (ncpenke)
Fix edge case in reading nested parquet #884 (jorgecarleitao)
Fixed unsoundness of #[derive(Clone)] for FFI structs #882 (jorgecarleitao)
Fixed json writing of dates and datetimes #867 (jorgecarleitao)
Fixed reading parquet with timezone #862 (jorgecarleitao)
Fixed error in writing compressed IPC arrow #855 (jorgecarleitao)
Fixed wrong null_count when slicing a sliced Bitmap #848 (satlank)
Fixed error in writing compressed IPC files #840 (jorgecarleitao)
Fixed float to i128 cast #817 (houqp)
fix unescaped '"' in json writing #812 (ritchie46)
Fixed reading parquet binary dict page #791 (danburkert)

Enhancements:

Add FixedSizeBinaryScalar #782
Use more idiomatic versions #898 (jorgecarleitao)
Added support for min/max for decimal [#897](https://github...

Contributors

houqp, ritchie46, and 3 other contributors

14 Jan 22:22

jorgecarleitao

v0.9.0

72f8363

v0.9.0

A new release is here! 🎉🎉🎉🎉 This release has four major improvements:

- It is now backed by std's Vec, which makes it:
  - zero-copy with the rest of Rust's ecosystem
  - less reliant on unsafe
  - more ergonomic
  - faster to compile
  - (with no difference in performance)
- It now supports reading from, and writing to, Apache Avro, both sync and async
- The flatbuffers dependency was replaced by planus, a re-implementation of the flatbuffers specification in Rust (you should check out that project, awesome work by @kristoff3r and @TethysSvensson):
  - lower risk of unsoundness
  - easier-to-maintain code base
- Improved security and general maintenance:
  - made most of the crate #![forbid(unsafe_code)]
  - significantly reduced the use of unsafe via a dependency on bytemuck
  - made most of the parsing of Arrow IPC panic-free, to reduce the risk of DoS from untrusted data
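The last point, panic-free parsing of untrusted input, comes down to replacing indexing and unchecked arithmetic with checked accessors that return errors. The sketch below shows the pattern on a hypothetical length-prefixed frame (a 4-byte little-endian length followed by the payload). The format, `FrameError`, and `read_frame` are assumptions for illustration and not arrow2's IPC reader.

```rust
/// Errors returned instead of panicking on malformed input (hypothetical).
#[derive(Debug, PartialEq)]
enum FrameError {
    /// Fewer than 4 bytes: the length prefix itself is missing.
    TooShort,
    /// The declared length points past the end of the input.
    LengthOutOfBounds,
}

/// Reads one frame: a 4-byte little-endian length followed by the payload.
/// Every access is bounds-checked, so malformed input yields `Err`, never a panic.
fn read_frame(bytes: &[u8]) -> Result<&[u8], FrameError> {
    // `get` returns `None` instead of panicking like `bytes[..4]` would.
    let prefix = bytes.get(..4).ok_or(FrameError::TooShort)?;
    let len = u32::from_le_bytes([prefix[0], prefix[1], prefix[2], prefix[3]]) as usize;
    // `checked_add` guards against overflow on 32-bit targets.
    let end = 4usize
        .checked_add(len)
        .ok_or(FrameError::LengthOutOfBounds)?;
    bytes.get(4..end).ok_or(FrameError::LengthOutOfBounds)
}
```

An attacker-controlled length such as `0xFFFF_FFFF` then surfaces as `Err(FrameError::LengthOutOfBounds)` that the caller can handle, rather than aborting the process.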

A big thanks to all contributors (listed below) and our users for all the dedication, hard work, and patience. 🙇

Breaking changes:

Added number of rows read in CSV inference #765 (jorgecarleitao)
Refactored nullif #753 (jorgecarleitao)
Migrated to latest parquet2 #752 (jorgecarleitao)
Replace flatbuffers dependency by Planus #732 (jorgecarleitao)
Simplified Schema and Field #728 (jorgecarleitao)
Replaced RecordBatch by Chunk #717 (jorgecarleitao)
Removed Option from fields' metadata #715 (jorgecarleitao)
Moved dict_id to IPC-specific IO #713 (jorgecarleitao)
Moved is_ordered from Field to DataType::Dictionary #711 (jorgecarleitao)
Refactored JSON writing (5-10x) #709 (jorgecarleitao)
Made Avro read API use Block and CompressedBlock #698 (jorgecarleitao)
Simplified most traits #696 (jorgecarleitao)
Replaced Display by Debug for Array #694 (jorgecarleitao)
Replaced MutableBuffer by std::Vec #693 (jorgecarleitao)
Simplified Utf8Scalar and BinaryScalar #660 (jorgecarleitao)
Simplified Primitive and Boolean scalar #648 (jorgecarleitao)

New features:

Add and_scalar and or_scalar for boolean_kleene #662
Add lower and upper support for string #635
Added support to cast decimal #761 (jorgecarleitao)
Added support to deserialize JSON (!= NDJSON) #758 (jorgecarleitao)
Added support to infer nested json structs #750 (jorgecarleitao)
Added support to compare intervals #746 (jorgecarleitao)
Added any and all kernel #739 (ritchie46)
Added support to write Avro async #736 (jorgecarleitao)
Added support to write interval to Avro #734 (jorgecarleitao)
Added and_scalar and or_scalar for boolean kleene #723 (silathdiir)
Added and_scalar and or_scalar for boolean #707 (silathdiir)
Refactored JSON read to split IO-bounded from CPU-bounded tasks #706 (jorgecarleitao)
Added more conversions from parquet #701 (jorgecarleitao)
Added support for compressed Avro write #699 (jorgecarleitao)
Added support to write to Avro #690 (jorgecarleitao)
Added dynamic version of negation #685 (jorgecarleitao)
Added support to read dictionary-encoded required parquet pages #683 (mdrach)
Added upper #664 (Xuanwo)
Added lower #641 (Xuanwo)
Added support for async read of Avro #620 (jorgecarleitao)

Fixed bugs:

Pyarrow and Arrow2 don't agree on Timestamp resolution #700
Writing compressed dictionary in parquet corrupts the files #667
Replaced assert by error in IPC read #748 (jorgecarleitao)
Made all panics in IPC read errors #722 (jorgecarleitao)
Fixed error in compare booleans #721 (jorgecarleitao)
Fixed error in dispatching scalar arithmetics #682 (jorgecarleitao)
Fixed error in reading negative decimals from parquet #679 (mdrach)
Made IPC reader less restrictive #678 (jorgecarleitao)
Fixed error in trait constraint in compute #665 (jorgecarleitao)
Fixed performance regression of CSV reading #657 (jorgecarleitao)
Fixed filter of predicate with validity #653 (ritchie46)
Made Scalar: Send+Sync #644 (jorgecarleitao)

Enhancements:

Feature: JSON IO? #712
Simplified code #760 (jorgecarleitao)
Added iterator of values of FixedBinaryArray #757 (jorgecarleitao)
Remove un-needed unsafe #756 (jorgecarleitao)
Replaced un-needed unsafe #755 (jorgecarleitao)
Made IO #[forbid(unsafe)] #749 (jorgecarleitao)
Improved reading nullable Avro arrays #727 (Igosuki)
Allow to create primitive array by vec without extra memcopy #710 (sundy-li)
Removed requirement of use Array to access primitives' data_type #697 (jorgecarleitao)
Cleaned up trait usage and added forbid_unsafe to parts #695 (jorgecarleitao)
Migrated from avro-rs to avro-schema #692 (jorgecarleitao)
Added MutablePrimitiveArray::extend_constant [#6...

Contributors

kristoff3r and TethysSvensson

27 Nov 06:20

jorgecarleitao

v0.8.0

b853d95

v0.8.0

A new release is here 🚀🚀🚀

This release has so many important new features and bug fixes that they are best summarized as: thank you, everyone, for all the issues and PRs that resulted in this release (in order of appearance) 🙇🙇🙇🙇:

Full Changelog

Breaking changes:

Made CSV write options use chrono formatting by default #624
Add compression to IpcWriteOptions #570
Made cast accept CastOptions parameter #569
Simplified ArrowError #640 (jorgecarleitao)
Use DynComparator for lexsort and partition #637 (yjshen)
Split "compute" feature #634 (jorgecarleitao)
Removed unneeded trait. #628 (jorgecarleitao)
Sealed 2 traits to forbid downstream implementations #621 (jorgecarleitao)
Simplified arithmetics compute #607 (jorgecarleitao)
Refactored comparison Operator #604 (jorgecarleitao)
Simplified dictionary indexes #584 (jorgecarleitao)
Simplified IPC APIs #576 (jorgecarleitao)
Simplified IPC stream writer / remove finish on drop from stream writer #575 (jorgecarleitao)
Simplified trait in compute. #572 (jorgecarleitao)
Compute: add partial option into CastOptions #561 (sundy-li)
Introduced UnionMode enum #557 (simonvandel)
Changed DataType::FixedSize*(i32) to DataType::FixedSize*(usize) #556 (simonvandel)

New features:

Added support to write timestamps with timezones for CSV #623 (jorgecarleitao)
Added support to read Avro files' metadata asynchronously #614 (jorgecarleitao)
Added iterator for StructArray #613 (illumination-k)
Added support to read snappy-compressed Avro #612 (jorgecarleitao)
Added support to read decimal from csv #602 (jorgecarleitao)
Added support to cast NullArray to all other types #589 (flaneur2020)
Added support dictionaries in nested types over IPC #587 (jorgecarleitao)
Added support to write Arrow IPC streams asynchronously #577 (jorgecarleitao)
Added support to write compressed Arrow IPC (feather v2) #566 (jorgecarleitao)
Added support for ffi for FixedSizeList and FixedSizeBinary #565 (jorgecarleitao)
Added support for async csv reading. #562 (jorgecarleitao)
Added support for bitwise operations #553 (1aguna)
Added support to read StructArray from parquet #547 (jorgecarleitao)

Fixed bugs:

Fixed error in reading nullable from Avro. #631 (jorgecarleitao)
Fixed error in union FFI #625 (jorgecarleitao)
Fixed error in computing projection in io::ipc::read::reader::FileReader #596 (illumination-k)
Fixed error in compressing IPC LZ4 #593 (jorgecarleitao)
Fixed growable of dictionaries negative keys #582 (ritchie46)
Made substring kernel on utf8 take chars into account. #568 (ritchie46)
Fixed error in passing sliced arrays via FFI #564 (jorgecarleitao)

Enhancements:

Faster take with null values (2-3x) #633 (jorgecarleitao)
Improved error message for missing feature in compressed parquet #632 (jorgecarleitao)
Added to conversion to FixedSizeBinary #622 (ritchie46)
Bumped comfy-table #618 (jorgecarleitao)
Made MutableArray Send + Sync #617 (jorgecarleitao)
Removed most of allocations in IPC reading #611 (jorgecarleitao)
Speed up boolean comparison kernels (~3x) #610 (Dandandan)
Improved performance of decimal arithmetics #605 (jorgecarleitao)
Simplified traits and added documentation #603 (jorgecarleitao)
Improved performance of is_not_null. #600 (jorgecarleitao)
Added len to every array #599 (jorgecarleitao)
Added support for NullArray at FFI. #598 (jorgecarleitao)
Optimized MutableBinaryArray #597 (jorgecarleitao)
Speedup/simplify bitwise operations (avoid extra allocation) #586 (Dandandan)
Improved performance of bitmap::from_trusted (3x) #578 (jorgecarleitao)
Made bitmap not cache null count #563 (jorgecarleitao)
Avoided redundant checks in creating an Utf8Array from MutableUtf8Array #560 (jorgecarleitao)
Avoid unnecessary allocations #559 (simonvandel)
Surfaced errors in reading from avro #558 (jorgecarleitao)

Documentation updates:

Simplified example #619 (jorgecarleitao)
Made example of parallel parquet write be over multiple batches #544 (jorgecarleitao)

Testing updates:

Cleaned up benches #636 (jorgecarleitao)
Ignor...

29 Oct 19:24

jorgecarleitao

v0.7.0

5fc843d

v0.7.0

Another release is here 🚀🚀🚀

As usual, this release brings a bunch of optimizations, as well as some work on two main fronts:

make the crate smaller and easier to compile
support for nested parquet reads

Thank you to all contributors (names below) for the amazing contributions!

Breaking changes:

Simplified reading parquet #532 (jorgecarleitao)
Change IPC FileReader to own the underlying reader #518 (blakesmith)
Migrate to arrow_format crate #517 (jorgecarleitao)

New features:

Added read of 2-level nested lists from parquet #548 (jorgecarleitao)
add dictionary serialization for csv-writer #515 (ritchie46)
Added checked_negate and wrapping_negate for PrimitiveArray #506 (yjhmelody)

Fixed bugs:

Fixed error in reading fixed len binary from parquet #549 (jorgecarleitao)
Fixed ffi of sliced arrays #540 (jorgecarleitao)
Fixed s3 example #536 (jorgecarleitao)
Fixed error in writing compressed parquet dict pages #523 (jorgecarleitao)
Validity taken into account when writing StructArray to json #511 (VasanthakumarV)

Enhancements:

Bumped Prost and Tonic #550 (PsiACE)
Speedup scalar boolean operations #546 (Dandandan)
Added fast path for validating ASCII text (~1.12-1.89x improvement on reading ASCII parquet data) #542 (Dandandan)
Exposed missing APIs to write parquet in parallel #539 (jorgecarleitao)
improve utf8 init validity #530 (ritchie46)
export missing BinaryValueIter #526 (yjhmelody)

Documentation updates:

Added more IPC documentation #534 (HagaiHargil)
Fixed clippy and fmt #521 (ritchie46)

Testing updates:

Added more tests for utf8 #543 (jorgecarleitao)
Ignored RUSTSEC-2020-0071 and RUSTSEC-2020-0159 #537 (jorgecarleitao)
Improved parquet read benches #533 (jorgecarleitao)
Added fmt and clippy checks to CI. #522 (xudong963)

09 Oct 03:51

jorgecarleitao

v0.6.2

b19dde2

v0.6.2

Small release with two minor but relevant bug fixes and a new feature.

Full Changelog

New features:

Added wrapping version arithmetics for PrimitiveArray #496 (yjhmelody)

Fixed bugs:

Do not check offsets or utf8 validity in ffi (#505) #510 (NilsBarlaug)
Made try_push_valid public again #509 (ritchie46)

Enhancements:

Use static-typed equal functions directly #507 (yjhmelody)

07 Oct 23:11

jorgecarleitao

v0.6.1

4bb3237

v0.6.0

(Published on crates.io as 0.6.1: I made a mistake in publishing.) Anyway, another big release is here!

There are just too many improvements for a 22-day release. Let's try to capture the important mentions:

Buffer and MutableBuffer are now compatible with Rust's std::Vec with no strings attached: everything continues to work, including FFI with the rest of the ecosystem! You can recover the previous behavior (of using cache-aligned allocations) via the feature cache_aligned
Added broad support to timestamp with timezones. Kudos to @VasanthakumarV for all the help.
Added read Decimal from parquet. Kudos to @potter420 for the contribution.
More improvements to performance. Kudos to @Dandandan and @ritchie46.
Support for reading from Avro via the feature io_avro

Full Changelog

Breaking changes:

Bring MutableFixedSizeListArray to the spec used by the rest of the Mutable API #475
Removed ALIGNMENT invariant from [Mutable]Buffer #449
Un-nested compute::arithmetics::basic #461 (jorgecarleitao)
Added more serialization options for csv writer. #453 (ritchie46)
Changed validity from &Option<Bitmap> to Option<&Bitmap>. #431 (jorgecarleitao)
Bumped parquet2 #422 (jorgecarleitao)
Changed IPC FileWriter to own the writer. #420 (yjshen)
Made DynComparator Send+Sync #414 (yjshen)

New features:

Read Decimal from Parquet File #444
Add IO read for Avro #401
Added support to read Avro logical types, List, Enum, Duration and Fixed. #493 (jorgecarleitao)
Added read Decimal from parquet #489 (potter420)
Implement BitXor trait for Bitmap #485 (houqp)
Added extend/extend_unchecked for MutableBooleanArray #478 (VasanthakumarV)
expose shrink_to_fit to mutable arrays #467 (ritchie46)
Added support for DataType::Map and MapArray #464 (jorgecarleitao)
Extract parts of datetime #433 (VasanthakumarV)
Added support to add an interval to a timestamp #417 (jorgecarleitao)
Added support to read Avro. #406 (jorgecarleitao)
Replaced own allocator by std::Vec. #385 (jorgecarleitao)

Fixed bugs:

crash in parquet read #459
Made writing stream to parquet require a non-static lifetime #471 (GrandChaman)
Made importing from FFI unsafe #458 (jorgecarleitao)
Fixed panic in division using nulls. #438 (jorgecarleitao)
Fixed error writing dictionary extension to IPC #397 (jorgecarleitao)
Fixed error in extending MutableBitmap #393 (jorgecarleitao)

Enhancements:

Some compare function are not exported #349
Investigate how to add support for timezones in timestamp #23
Made hash work for extension type #487 (jorgecarleitao)
Added extend/extend_unchecked for MutableBinaryArray #486 (VasanthakumarV)
Improved inference and deserialization of CSV #483 (jorgecarleitao)
Added GrowableFixedSizeList and improved MutableFixedSizeListArray #470 (jorgecarleitao)
Added MutableBitmap::shrink_to_fit #468 (jorgecarleitao)
Added MutableArray::as_box #450 (sd2k)
Improved performance of sum aggregation via aligned loads (-10%) #445 (ritchie46)
Removed assert from MutableBuffer::set_len #443 (ritchie46)
Optimized null_count #442 (ritchie46)
Improved performance of list iterator (- 10-20%) #441 (ritchie46)
Improved performance of PrimitiveGrowable for nulls (-10%) #434 (jorgecarleitao)
Allowed accessing validity without importing Array #432 (jorgecarleitao)
Optimize hashing using ahash and multiversion (-30%) #428 (Dandandan)
Improved performance of iterator of Utf8Array and BinaryArray (3-4x) #427 (jorgecarleitao)
Improved performance of utf8 validation of large strings via simdutf8 (-40%) #426 (Dandandan)
Added reading of parquet required dictionary-encoded binary. #419 (jorgecarleitao)
Add extend/extend_unchecked for MutableUtf8Array #413 (VasanthakumarV)
Added support to extract hours and years from timestamps with timezone #412 (jorgecarleitao)
Added io_csv_read and io_csv_write feature #408 (ritchie46)
Improve comparison docs and re-export the array-comparing function #404 (HagaiHargil)
Added support to read dict-encoded required primitive types from parquet #402 (Dandandan)
Added Array::with_validity #399 (ritchie46)

Documentation updates:

Improved documentation #491 (jorgecarleitao)
Added more API docs. #479 (jorgecarleitao)
Added more documentation #476 (jorgecarleitao)
Improved documentation #462 (jorgecarleitao)
Added example showing parallel writes to parquet (x num_cores) #436 (jorgecarleitao)
Improved documentation #430 (jorgecarleitao)
[0.5] The docs io module has no submodules #390
Made docs be compiled with feature full #391 (jorgecarleitao)

Testing updates:

DRY via macro. #477 (jorgecarleitao)
DRY of type check and len check code in `comp...

Contributors

Dandandan, ritchie46, and 2 other contributors

14 Sep 05:51

jorgecarleitao

v0.5.3

06892e9

v0.5.3

A new release is here, containing bug fixes and backward-compatible enhancements.

Thank you to all involved in the testing and development that resulted in this version!

Full Changelog

New features:

Added support to read and write extension types to and from parquet #396 (jorgecarleitao)

Fixed bugs:

Fixed error writing dictionary extension to IPC #397 (jorgecarleitao)
Fixed error in extending MutableBitmap #393 (jorgecarleitao)

Enhancements:

Added support to read dict-encoded required primitive types from parquet #402 (Dandandan)
Added Array::with_validity #399 (ritchie46)

Testing updates:

Fix testing of SIMD #394 (jorgecarleitao)

09 Sep 21:00

jorgecarleitao

v0.5.2

a238efe

v0.5.2

Hot fix release to make the API docs contain all optional features.

Full Changelog

Documentation updates:

[0.5] The docs io module has no submodules #390
Made docs be compiled with feature full #391 (jorgecarleitao)

08 Sep 17:31

jorgecarleitao

v0.5.0

abe0e88

v0.5.0

A new release is here! 🎉🎉🎉

This one is marked by further alignment with the Arrow specification. Of special mention:

✅ Added full support for async parquet write (by @GrandChaman)
✅ Added fast extend_*values to MutablePrimitiveArray (by @ritchie46)
✅ Added support for compute to BinaryArray (by @zhyass)
✅ Added support to extension types (IPC, FFI, etc.) (by @jorgecarleitao)
✅ Added support for the brand new MONTH_DAY_NANO interval type (by @jorgecarleitao)
🚀 Improved performance of the calculation of null counts by 5x (by @jorgecarleitao)
🔧 Made cargo features not default (by @jorgecarleitao)

As usual, there is a small number of backward incompatible changes. See associated issues below, which include the migration paths to each of them.
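To make the null-count item concrete: in Arrow, a validity bitmap stores one bit per slot in least-significant-bit-first order, with a set bit meaning "valid", so the null count is the number of unset bits in the array's bit range. The sketch below shows the general idea of counting whole bytes with a population count and handling only the unaligned edges bit by bit. It is a minimal illustration under those assumptions, not arrow2's implementation; the name `count_zeros` echoes the renamed function listed below, but this signature is assumed.

```rust
/// Returns whether bit `i` is set, using least-significant-bit-first order.
fn get_bit(bytes: &[u8], i: usize) -> bool {
    bytes[i / 8] & (1 << (i % 8)) != 0
}

/// Counts the unset bits in the bit range `[offset, offset + len)`.
/// Illustrative sketch only; the signature is an assumption.
fn count_zeros(bytes: &[u8], offset: usize, len: usize) -> usize {
    assert!(offset + len <= bytes.len() * 8, "bit range out of bounds");
    let end = offset + len;
    let mut ones = 0usize;
    let mut i = offset;
    // Leading bits, up to the next byte boundary.
    while i < end && i % 8 != 0 {
        ones += get_bit(bytes, i) as usize;
        i += 1;
    }
    // Whole bytes: one population count per byte instead of eight bit tests.
    while i + 8 <= end {
        ones += bytes[i / 8].count_ones() as usize;
        i += 8;
    }
    // Trailing bits after the last whole byte.
    while i < end {
        ones += get_bit(bytes, i) as usize;
        i += 1;
    }
    len - ones
}
```

A production version would typically process wider words (for example `u64`) in the middle loop; the byte-wise loop here keeps the sketch short.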

Full Changelog

Breaking changes:

Added Extension to DataType #361
MonthDayNano added to enum IntervalUnit #360
Make io::parquet::write::write_* return size of file in bytes #354
Renamed bitmap::utils::null_count to bitmap::utils::count_zeros #342
Made GroupFilter optional in parquet's RecordReader and added method to set it. #386 (jorgecarleitao)
Removed PartialOrd and Ord of all enums in datatypes #379 (jorgecarleitao)
Made cargo features not default #369 (jorgecarleitao)
Prepare APIs for extension types #357 (jorgecarleitao)

New features:

Added support for async parquet write #372 (GrandChaman)
Add support to extension types in FFI #363 (jorgecarleitao)
Added support for field's metadata via FFI #362 (jorgecarleitao)
Added support for Extension (logical) type #359 (jorgecarleitao)
Added support for compute to BinaryArray #346 (zhyass)
Added support for reading binary from CSV #337 (jorgecarleitao)
Added support for MONTH_DAY_NANO interval type #268 (jorgecarleitao)

Fixed bugs:

Parquet read skips a few rows at the end of the page #373
parquet_read fails when a column has too many rows with string values #366
parquet_read panics with index_out_of_bounds #351
Fixed error in MutableBitmap::push_unchecked #384 (jorgecarleitao)
Fixed display of timestamp with tz. #375 (jorgecarleitao)

Enhancements:

Added extend_*values to MutablePrimitiveArray #383 (ritchie46)
Improved performance of writing to CSV (20-25%) #382 (jorgecarleitao)
Bumped lexical-core #378 (jorgecarleitao)
Fixed casting of utf8 <> Timestamp with and without timezone #376 (jorgecarleitao)
Added Send+Sync to MutableBuffer #368 (jorgecarleitao)
Improved performance of unary _not_ for aligned bitmaps (3x) #365 (jorgecarleitao)
Reduced dependencies within num #353 (jorgecarleitao)
Bumped to parquet2 v0.4 #352 (jorgecarleitao)
Bumped tonic and prost in flight #344 (PsiACE)
Improved null count calculation (5x) #343 (jorgecarleitao)
Improved perf of deserializing integers from json (30%) #340 (jorgecarleitao)
Simplified code of json schema inference #339 (jorgecarleitao)

Documentation updates:

Moved guide examples to examples/ #387 (jorgecarleitao)
Added more docs. #358 (jorgecarleitao)
Improved API docs. #355 (jorgecarleitao)

Testing updates:

Moved tests to tests/ #389 (jorgecarleitao)
Moved compute tests to tests/ #388 (jorgecarleitao)
Added more tests. #380 (jorgecarleitao)
Pinned nightly in SIMD tests #364 (jorgecarleitao)
Improved benches for take #348 (jorgecarleitao)
Made IPC integration tests run tests that are not run by arrow-rs #278 (jorgecarleitao)

Contributors

jorgecarleitao, ritchie46, and 2 other contributors

24 Aug 21:47

jorgecarleitao

v0.4.0

f79ae3e

v0.4.0

A new release is here! 🎉🎉🎉

This one is marked by a lot of enhancements to existing functionality. Of special mention:

🚀 improved performance of integer division by 4x-10x via strength reduction (@sundy-li and @ritchie46)
🚀 improved performance of concatenating nullable arrays by 4x
🚀 improved performance of comparisons by 2x-14x
🔧 moved most tests to a separate directory
🔧 Increased test coverage to over 80%
🔧 Made multiversion, lexical-core and serde-derive dependencies optional
✅ Added support for UnionArray (including FFI and IPC tests)
✅ Added support for FFI of Field

(full list below)

As usual, there is a small number of backward incompatible changes. The associated issues include the migration paths.

Finally, thank you to all contributors and reporters 🙇 In particular, thank you to polars and datafuse teams for the 🐛 reports. They help tremendously 💯
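The integer-division speedup listed above comes from dividing a whole array by the same scalar: work that depends only on the divisor can be done once, after which each element needs a multiplication and a shift instead of a hardware division. The sketch below shows one well-known way to do this for `u32`, using a precomputed 64-bit reciprocal. It illustrates the general technique only; `StrengthReducedU32` and `div_scalar` are hypothetical names, and this is not the code used by arrow2.

```rust
/// A divisor prepared once for many `u32` divisions (hypothetical type).
struct StrengthReducedU32 {
    divisor: u32,
    /// ceil(2^64 / divisor); wraps to 0 for divisor 1, which `div` special-cases.
    magic: u64,
}

impl StrengthReducedU32 {
    fn new(divisor: u32) -> Self {
        assert!(divisor != 0, "division by zero");
        let magic = (u64::MAX / divisor as u64).wrapping_add(1);
        Self { divisor, magic }
    }

    /// Computes `n / divisor` with a multiplication and a shift.
    fn div(&self, n: u32) -> u32 {
        if self.divisor == 1 {
            return n;
        }
        // High 64 bits of the 128-bit product magic * n.
        ((self.magic as u128 * n as u128) >> 64) as u32
    }
}

/// Divides every value by the same scalar, preparing the divisor only once.
fn div_scalar(values: &[u32], divisor: u32) -> Vec<u32> {
    let reduced = StrengthReducedU32::new(divisor);
    values.iter().map(|&v| reduced.div(v)).collect()
}
```

The preparation cost (one 64-bit division) is paid once per call to `div_scalar`, so the benefit grows with the length of the array.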

Full Changelog

Breaking changes:

Change dictionary iterator of values from Arrays of one element to Scalars #335
Align FFI API with arrow's C++ API #328
Make *_compare_scalar not return Result #316
Make io::print, get_value_display and get_display not return Result #286
Add MetadataVersion to IPC interfaces #282
Change DataType::Union to enable round trips in IPC #281
Removed clone requirement in StructArray -> RecordBatch #307 (jorgecarleitao)
Fixed error in reading a non-finished IPC stream. #302 (jorgecarleitao)
Generalized ZipIterator to accept a BitmapIter #296 (jorgecarleitao)

New features:

Added API to FFI Field #321 (jorgecarleitao)
Added compare_scalar #317 (jorgecarleitao)
Add UnionArray #283 (jorgecarleitao)

Fixed bugs:

SliceIterator of last bytes is not correct #292
Fixed error in displaying dictionaries with nulls in values #334 (jorgecarleitao)
Fixed error in dict equality #333 (jorgecarleitao)
Fixed small inconsistencies between compute::cast and compute::can_cast #295 (jorgecarleitao)
Removed order implementation for days_ms / Interval(DayTime) #285 (jorgecarleitao)

Enhancements:

Added support for remaining non-nested datatypes #336 (jorgecarleitao)
Made multiversion and lexical-core optional #324 (jorgecarleitao)
Improved performance of utf8 comparison (1.7x-4x) #322 (jorgecarleitao)
Improved performance of boolean comparison (5x-14x) #318 (jorgecarleitao)
Added trait TryPush #314 (jorgecarleitao)
Added cast date32 -> i64 and date64 -> i32 #308 (ritchie46)
Improved performance of comparison with SIMD feature flag (2x-3.5x) #305 (jorgecarleitao)
Added support to read json to BinaryArray #304 (jorgecarleitao)
Improved MutableFixedSizeBinaryArray #303 (jorgecarleitao)
Improved MutablePrimitiveArray and MutableUtf8Array #299 (jorgecarleitao)
Improved MutableBooleanArray #297 (jorgecarleitao)
Improved performance of concatenating non-aligned validities (15x) #291 (jorgecarleitao)
Added support for timestamps with tz and interval to io::print::write #287 (jorgecarleitao)
Improved debug repr of buffers and bitmaps. #284 (jorgecarleitao)
Cleaned up internals of json integration #280 (jorgecarleitao)
Removed serde_derive dependency #279 (jorgecarleitao)
Simplified IPC code. #277 (jorgecarleitao)
Reduced dependencies from comfy-table and enabled wasm on io_print feature. #276 (jorgecarleitao)
Improve performance of rem_scalar/div_scalar for integer types (4x-10x) #275 (ritchie46)

Documentation updates:

Cleaned examples and docs from old API. #330 (jorgecarleitao)
Improved documentation #306 (jorgecarleitao)

Testing updates:

Improved naming of testing workflows #315 (jorgecarleitao)
Added tests to scalar API #300 (jorgecarleitao)
Made CSV and JSON tests not use files. #290 (jorgecarleitao)
Moved tests to integration tests #289 (jorgecarleitao)

Closed issues:

Make parquet_read_record support async #331
Panic due to SIMD comparison #312
Bitmap::mutable line 155 may Panic/segfault #309
IPC's StreamReader may abort due to excessive memory by overflowing a usized variable #301
Improve performance of rem_scalar/div_scalar for integer types (4x-10x) #259

Contributors

ritchie46 and sundy-li

Releases: jorgecarleitao/arrow2
