This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Releases: jorgecarleitao/arrow2

v0.10.0

12 Mar 21:02

Arrow2 0.10.0 is out! 🚀🚀🚀🚀🚀

Continuing to break new ground, this is one of the most feature-rich releases of this crate so far!

Thank you to everyone for the impressive work over the past 2.5 months that makes arrow2 so feature-rich, safe, fast, and easy to use! 🙇

Here are the main headlines:

Copy on Write

Until now, whenever we applied a transformation to an array, we had to create a new array. When multiple operations were chained (e.g. c1 x 2 + 1), this led to the following compute pattern:

1. allocate new region
2. compute
3. allocate new region
4. compute

This was identified by @sundy-li on #741 and addressed by @ritchie46 on #794.

Users can now re-use arrays held behind an Arc, with semantics like those of std::sync::Arc::get_mut. As expected, if the array is shared in multiple places, the call returns None and users still need to allocate a new region (mutation requires exclusive ownership).

This is being used in Polars to further re-use allocated regions and therefore reduce both memory pressure and wasted compute cycles allocating new regions.
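To make the idea concrete, here is a minimal sketch of the same pattern using only the standard library. It does not use arrow2's API: the function `add_scalar` and the use of `Arc<Vec<i32>>` as a stand-in for an array are assumptions made for illustration.

```rust
use std::sync::Arc;

/// Adds `rhs` to every value (hypothetical helper, not part of arrow2).
/// If the `Arc` is uniquely owned, the existing allocation is mutated in place;
/// otherwise a new region is allocated and the shared data is left untouched.
fn add_scalar(mut values: Arc<Vec<i32>>, rhs: i32) -> Arc<Vec<i32>> {
    if let Some(unique) = Arc::get_mut(&mut values) {
        // Exclusive access: re-use the allocated region.
        for x in unique.iter_mut() {
            *x += rhs;
        }
        return values;
    }
    // Shared: `get_mut` returned `None`, so we must allocate.
    Arc::new(values.iter().map(|x| x + rhs).collect())
}
```

Calling `add_scalar` on a freshly created `Arc` keeps the same underlying buffer, while calling it on an `Arc` that has a live clone produces a new buffer and leaves the clone unchanged.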

Support for ODBC

This release now supports reading from, and writing to, any ODBC driver.

This builds on top of the superb odbc-api created by @pacman82, which allows this crate to use the columnar format provided by the ODBC specification.

Given a performant ODBC driver, this is expected to be the fastest way to load data to the Arrow format, as many operations are simple memcopies.

Check out the example and guide for details on how to use it!

async support for writing to Arrow's IPC

Until now, we had limited support for writing to Arrow IPC asynchronously. @dexterduck closed this gap on #878, offering complete async support for both Arrow files and Arrow streams, including implementations of futures::Stream and futures::Sink for them!

Migrated to std::simd

After some back and forth with the working group of the portable simd project, this release replaces packed_simd2 with std::simd. This resulted in no performance difference but allows us to leverage the great work that is happening on std::simd.

Serde support for metadata

A common pain point in using arrow2's logical types is that they are quite rich, which sometimes makes them difficult to visualize or represent in e.g. JSON. @houqp closed this gap with #858, which adds Serde compatibility for the schema-related structs in this crate (PhysicalType, DataType, Field, Schema).

Support for Arrow C stream interface

Arrow has an experimental specification for an FFI to iterators of arrow arrays. This release now fully supports this interface.

Made crate deny(missing_docs)

This makes us developers more conscious about documenting APIs, thereby giving users more context about them. We have also started documenting whether IO-related APIs are CPU-bound or IO-bound, so that users know which ones block async contexts.

Changelog

Full Changelog

Breaking changes:

New features:

Fixed bugs:

Enhancements:

v0.9.0

14 Jan 22:22

A new release is here! 🎉🎉🎉🎉 This release has four major improvements:

  • It is now backed by std's Vec, thus making it
    • zero-copy with the rest of Rust's ecosystem
    • use less unsafe
    • more ergonomic
    • faster to compile
    • (no difference in performance)
  • It now supports reading from, and writing to, Apache Avro, both sync and async
  • The flatbuffers dependency was replaced by planus, a re-implementation of the flatbuffers specification in Rust (you should check out that project, awesome work by @kristoff3r and @TethysSvensson), resulting in
    • lower risk of unsoundness
    • an easier-to-maintain code base
  • Improved security and general maintenance:
    • Made most of the crate #![forbid(unsafe_code)]
    • significantly reduced the use of unsafe via a dependency on bytemuck
    • made most of the parsing of Arrow IPC panic-free, to reduce the risk of DoS from untrusted data
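As an illustration of what "panic-free" parsing means in practice, here is a small sketch that reads a length-prefixed buffer from untrusted bytes. It is not arrow2's IPC reader: the wire layout, `ParseError`, and `read_prefixed` are hypothetical and exist only to show the pattern of returning an error instead of indexing out of bounds.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ParseError {
    UnexpectedEof,
}

/// Reads a little-endian `u32` length prefix followed by that many bytes.
/// Every access is checked, so malformed input yields `Err` rather than a panic.
fn read_prefixed(data: &[u8]) -> Result<&[u8], ParseError> {
    // A slice pattern extracts the 4-byte prefix without any indexing.
    let (len, rest) = match data {
        [a, b, c, d, rest @ ..] => (u32::from_le_bytes([*a, *b, *c, *d]) as usize, rest),
        _ => return Err(ParseError::UnexpectedEof),
    };
    // `get` returns `None` when the declared length exceeds the available bytes.
    rest.get(..len).ok_or(ParseError::UnexpectedEof)
}
```

A declared length larger than the remaining input, which is a typical way to attack a parser, simply produces `Err(ParseError::UnexpectedEof)`.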

A big thanks to all contributors (listed below) and our users for all the dedication, hard work, and patience. 🙇

Breaking changes:

New features:

Fixed bugs:

Enhancements:

v0.8.0

27 Nov 06:20

A new release is here 🚀🚀🚀

This release has so many important new features and bug fixes that they will simply be summarized as: thank you, everyone, for all the issues and PRs that resulted in this release (in order of appearance) 🙇🙇🙇🙇:

Full Changelog

Breaking changes:

New features:

Fixed bugs:

Enhancements:

Documentation updates:

Testing updates:

v0.7.0

29 Oct 19:24

Another release is here 🚀🚀🚀

As usual, a bunch of optimizations, as well as some work on two main fronts:

  • make the crate smaller and easier to compile
  • support for nested parquet reads

Thank you to all contributors (names below) for the amazing contributions!

Breaking changes:

New features:

Fixed bugs:

Enhancements:

Documentation updates:

Testing updates:

v0.6.2

09 Oct 03:51

Small release with two minor but relevant bug fixes and a new feature.

Full Changelog

New features:

  • Added wrapping version arithmetics for PrimitiveArray #496 (yjhmelody)
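For readers unfamiliar with the distinction, the sketch below contrasts wrapping and checked addition on plain slices using the standard library's integer methods. The functions `wrapping_add` and `checked_add` here are illustrative helpers, not the signatures added to arrow2 in #496.

```rust
/// Element-wise addition that wraps around on overflow (illustrative helper).
fn wrapping_add(lhs: &[i8], rhs: &[i8]) -> Vec<i8> {
    lhs.iter().zip(rhs).map(|(a, b)| a.wrapping_add(*b)).collect()
}

/// Element-wise addition that yields `None` on overflow (illustrative helper).
fn checked_add(lhs: &[i8], rhs: &[i8]) -> Vec<Option<i8>> {
    lhs.iter().zip(rhs).map(|(a, b)| a.checked_add(*b)).collect()
}
```

With `i8` inputs, `100 + 100` wraps to `-56` in the first function and becomes `None` in the second; neither panics.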

Fixed bugs:

Enhancements:

v0.6.0

07 Oct 23:11

(Published on crates.io as 0.6.1: I made a mistake in publishing.) Anyway, another big release is here!

There are just too many improvements for a release that took 22 days, so let's try to capture the most important ones:

  • Buffer and MutableBuffer are now compatible with Rust's std::vec::Vec with no strings attached: everything continues to work, including FFI with the rest of the ecosystem! You can recover the previous behavior (of using cache-aligned allocations) via the feature cache_aligned.
  • Added broad support for timestamps with timezones. Kudos to @VasanthakumarV for all the help.
  • Added support for reading Decimal from parquet. Kudos to @potter420 for the contribution.
  • More improvements to performance. Kudos to @Dandandan and @ritchie46.
  • Added support for reading Avro via the feature io_avro.

Full Changelog

Breaking changes:

  • Bring MutableFixedSizeListArray to the spec used by the rest of the Mutable API #475
  • Removed ALIGNMENT invariant from [Mutable]Buffer #449
  • Un-nested compute::arithmetics::basic #461 (jorgecarleitao)
  • Added more serialization options for csv writer. #453 (ritchie46)
  • Changed validity from &Option<Bitmap> to Option<&Bitmap>. #431 (jorgecarleitao)
  • Bumped parquet2 #422 (jorgecarleitao)
  • Changed IPC FileWriter to own the writer. #420 (yjshen)
  • Made DynComparator Send+Sync #414 (yjshen)
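The validity change (#431) follows a common Rust idiom: returning Option<&T> rather than &Option<T> lets callers pass a reference to a value that is not stored inside an Option. The sketch below uses hypothetical stand-in types (`Bitmap`, `Array`, `null_count`) that are far simpler than arrow2's, purely to show the shape of the API.

```rust
/// Hypothetical stand-in for a validity bitmap (true = valid).
struct Bitmap {
    bits: Vec<bool>,
}

/// Hypothetical stand-in for an array with optional validity.
struct Array {
    validity: Option<Bitmap>,
}

impl Array {
    /// Returns `Option<&Bitmap>`; `as_ref` converts from `&Option<Bitmap>`.
    fn validity(&self) -> Option<&Bitmap> {
        self.validity.as_ref()
    }
}

/// Counts nulls; accepts any borrowed bitmap, whether or not it lives in an `Option`.
fn null_count(validity: Option<&Bitmap>) -> usize {
    validity.map_or(0, |b| b.bits.iter().filter(|&&valid| !valid).count())
}
```

A caller holding a bare `Bitmap` can write `null_count(Some(&bitmap))` directly, which would require constructing an owned `Option<Bitmap>` if the function took `&Option<Bitmap>`.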

New features:

Fixed bugs:

Enhancements:

Documentation updates:

Testing updates:

v0.5.3

14 Sep 05:51

A new release is here, containing bug fixes and backward-compatible enhancements.

Thank you to all involved in the testing and development that resulted in this version!

Full Changelog

New features:

  • Added support to read and write extension types to and from parquet #396 (jorgecarleitao)

Fixed bugs:

Enhancements:

  • Added support to read dict-encoded required primitive types from parquet #402 (Dandandan)
  • Added Array::with_validity #399 (ritchie46)

Testing updates:

v0.5.2

09 Sep 21:00

Hot fix release to make the API docs contain all optional features.

Full Changelog

Documentation updates:

  • [0.5] The docs io module has no submodules #390
  • Made docs be compiled with feature full #391 (jorgecarleitao)

v0.5.0

08 Sep 17:31

A new release is here! 🎉🎉🎉

This one is marked by further alignment with the Arrow specification. Of special mention:

  • ✅ Added full support for async parquet write (by @GrandChaman)
  • ✅ Added fast extend_*values to MutablePrimitiveArray (by @ritchie46)
  • ✅ Added support for compute to BinaryArray (by @zhyass)
  • ✅ Added support to extension types (IPC, FFI, etc.) (by @jorgecarleitao)
  • ✅ Added support for the brand new MONTH_DAY_NANO interval type (by @jorgecarleitao)
  • 🚀 Improved performance of the calculation of null counts by 5x (by @jorgecarleitao)
  • 🔧 Made cargo features not default (by @jorgecarleitao)

As usual, there is a small number of backward incompatible changes. See associated issues below, which include the migration paths to each of them.

Full Changelog

Breaking changes:

  • Added Extension to DataType #361
  • MonthDayNano added to enum IntervalUnit #360
  • Make io::parquet::write::write_* return size of file in bytes #354
  • Renamed bitmap::utils::null_count to bitmap::utils::count_zeros #342
  • Made GroupFilter optional in parquet's RecordReader and added method to set it. #386 (jorgecarleitao)
  • Removed PartialOrd and Ord of all enums in datatypes #379 (jorgecarleitao)
  • Made cargo features not default #369 (jorgecarleitao)
  • Prepare APIs for extension types #357 (jorgecarleitao)

New features:

Fixed bugs:

  • Parquet read skips a few rows at the end of the page #373
  • parquet_read fails when a column has too many rows with string values #366
  • parquet_read panics with index_out_of_bounds #351
  • Fixed error in MutableBitmap::push_unchecked #384 (jorgecarleitao)
  • Fixed display of timestamp with tz. #375 (jorgecarleitao)

Enhancements:

Documentation updates:

Testing updates:

v0.4.0

24 Aug 21:47

A new release is here! 🎉🎉🎉

This one is marked by a lot of enhancements to existing functionality. Of special mention:

  • 🚀 improved performance of integer division by 4x-10x via strength reduction (@sundy-li and @ritchie46)
  • 🚀 improved performance of concatenating nullable arrays by 4x
  • 🚀 improved performance of comparisons by 2x-14x
  • 🔧 moved most tests to a separate directory
  • 🔧 Increased test coverage to over 80%
  • 🔧 Made multiversion, lexical-core and serde-derive dependencies optional
  • ✅ Added support for UnionArray (including FFI and IPC tests)
  • ✅ Added support for FFI of Field
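The integer-division speedup in the first item comes from doing the expensive work once per divisor instead of once per element. The sketch below shows one well-known way to do this for u32 (a precomputed 64-bit multiplier followed by a 128-bit multiply and shift). It is an illustration of the general technique only, not the code used in this crate; `FastDivU32` and `div_scalar` are hypothetical names.

```rust
/// Divides many `u32` values by the same divisor using a precomputed multiplier.
/// (Hypothetical type for illustration.)
struct FastDivU32 {
    divisor: u32,
    /// ceil(2^64 / divisor); unused when `divisor == 1` because 2^64 does not fit in u64.
    magic: u64,
}

impl FastDivU32 {
    fn new(divisor: u32) -> Self {
        assert!(divisor != 0, "division by zero");
        let magic = if divisor == 1 {
            0
        } else {
            u64::MAX / divisor as u64 + 1
        };
        Self { divisor, magic }
    }

    /// Computes `n / divisor` with one widening multiplication and a shift.
    fn div(&self, n: u32) -> u32 {
        if self.divisor == 1 {
            return n;
        }
        ((self.magic as u128 * n as u128) >> 64) as u32
    }
}

/// Divides every value by `divisor`, paying the setup cost only once.
fn div_scalar(values: &[u32], divisor: u32) -> Vec<u32> {
    let fast = FastDivU32::new(divisor);
    values.iter().map(|&v| fast.div(v)).collect()
}
```

The setup in `new` still performs one hardware division, so the approach pays off only when the same divisor is applied to many values, as in a division of an array by a scalar.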

(full list below)

As usual, there is a small number of backward incompatible changes. The associated issues include the migration paths.

Finally, thank you to all contributors and reporters 🙇 In particular, thank you to polars and datafuse teams for the 🐛 reports. They help tremendously 💯

Full Changelog

Breaking changes:

  • Change dictionary iterator of values from Arrays of one element to Scalars #335
  • Align FFI API with arrow's C++ API #328
  • Make *_compare_scalar not return Result #316
  • Make io::print, get_value_display and get_display not return Result #286
  • Add MetadataVersion to IPC interfaces #282
  • Change DataType::Union to enable round trips in IPC #281
  • Removed clone requirement in StructArray -> RecordBatch #307 (jorgecarleitao)
  • Fixed error in reading a non-finished IPC stream. #302 (jorgecarleitao)
  • Generalized ZipIterator to accept a BitmapIter #296 (jorgecarleitao)

New features:

Fixed bugs:

Enhancements:

Documentation updates:

Testing updates:

Closed issues:

  • Make parquet_read_record support async #331
  • Panic due to SIMD comparison #312
  • Bitmap::mutable line 155 may Panic/segfault #309
  • IPC's StreamReader may abort due to excessive memory by overflowing a usize variable #301
  • Improve performance of rem_scalar/div_scalar for integer types (4x-10x) #259