diff --git a/.github_changelog_generator b/.github_changelog_generator
index fb8b38cff57..8aeba8fbb1c 100644
--- a/.github_changelog_generator
+++ b/.github_changelog_generator
@@ -1,5 +1,7 @@
-since-tag=v0.1.0
+since-tag=v0.2.0
+future-release=v0.3.0
 pr-wo-labels=false
 add-sections={"features":{"prefix":"**Enhancements:**","labels":["enhancement"]}, "documentation":{"prefix":"**Documentation updates:**","labels":["documentation"]}}
 enhancement-label=**New features:**
 enhancement-labels=feature
+base=CHANGELOG.md
diff --git a/CHANGELOG.md b/CHANGELOG.md
index bf8d3c2207b..0f3656d8f63 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,54 @@
 # Changelog
 
+## [v0.3.0](https://github.com/jorgecarleitao/arrow2/tree/v0.3.0) (2021-08-11)
+
+[Full Changelog](https://github.com/jorgecarleitao/arrow2/compare/v0.2.0...v0.3.0)
+
+**Breaking changes:**
+
+- Renamed `sum` to `sum_primitive` [\#273](https://github.com/jorgecarleitao/arrow2/issues/273)
+- Moved trait `Index` from `array::Index` to `types::Index` [\#272](https://github.com/jorgecarleitao/arrow2/issues/272)
+- Added optional `projection` to IPC FileReader [\#271](https://github.com/jorgecarleitao/arrow2/issues/271)
+- Added optional `page_filter` to parquet's `RecordReader` and `get_page_iterator` [\#270](https://github.com/jorgecarleitao/arrow2/issues/270)
+- Renamed parquets' `CompressionCodec` to `Compression` [\#269](https://github.com/jorgecarleitao/arrow2/issues/269)
+
+**New features:**
+
+- Added support for FFI of dictionary-encoded arrays [\#267](https://github.com/jorgecarleitao/arrow2/pull/267) ([jorgecarleitao](https://github.com/jorgecarleitao))
+- Added support for projection pushdown on IPC files [\#264](https://github.com/jorgecarleitao/arrow2/pull/264) ([jorgecarleitao](https://github.com/jorgecarleitao))
+- Added support to read parquet asynchronously [\#260](https://github.com/jorgecarleitao/arrow2/pull/260) ([jorgecarleitao](https://github.com/jorgecarleitao))
+- Added support to filter parquet pages. [\#256](https://github.com/jorgecarleitao/arrow2/pull/256) ([jorgecarleitao](https://github.com/jorgecarleitao))
+- Added wrapping\_cast to cast kernels [\#254](https://github.com/jorgecarleitao/arrow2/pull/254) ([sundy-li](https://github.com/sundy-li))
+- Added support to parquet IO on wasm32 [\#239](https://github.com/jorgecarleitao/arrow2/pull/239) ([jorgecarleitao](https://github.com/jorgecarleitao))
+- Added support to round-trip dictionary arrays on parquet [\#232](https://github.com/jorgecarleitao/arrow2/pull/232) ([jorgecarleitao](https://github.com/jorgecarleitao))
+- Added Scalar API [\#56](https://github.com/jorgecarleitao/arrow2/pull/56) ([jorgecarleitao](https://github.com/jorgecarleitao))
+
+**Fixed bugs:**
+
+- Fixed error in computing remainder of chunk iterator [\#262](https://github.com/jorgecarleitao/arrow2/pull/262) ([jorgecarleitao](https://github.com/jorgecarleitao))
+- Fixed error in slicing bitmap. [\#250](https://github.com/jorgecarleitao/arrow2/pull/250) ([jorgecarleitao](https://github.com/jorgecarleitao))
+
+**Enhancements:**
+
+- Improve the performance in cast kernel using AsPrimitive trait in generic dispatch [\#252](https://github.com/jorgecarleitao/arrow2/issues/252)
+- Poor performance in `sort::sort_to_indices` with limit option in arrow2 [\#245](https://github.com/jorgecarleitao/arrow2/issues/245)
+- Support loading Feather v2 \(IPC\) files with more than 1 million tables [\#231](https://github.com/jorgecarleitao/arrow2/issues/231)
+- Migrated to parquet2 v0.3 [\#265](https://github.com/jorgecarleitao/arrow2/pull/265) ([jorgecarleitao](https://github.com/jorgecarleitao))
+- Added more tests to cast and min/max [\#253](https://github.com/jorgecarleitao/arrow2/pull/253) ([jorgecarleitao](https://github.com/jorgecarleitao))
+- Prettytable is unmaintained. Change to comfy-table [\#251](https://github.com/jorgecarleitao/arrow2/pull/251) ([PsiACE](https://github.com/PsiACE))
+- Added IndexRange to remove checks in hot loops [\#247](https://github.com/jorgecarleitao/arrow2/pull/247) ([jorgecarleitao](https://github.com/jorgecarleitao))
+- Make merge\_sort\_slices MergeSortSlices public [\#243](https://github.com/jorgecarleitao/arrow2/pull/243) ([sundy-li](https://github.com/sundy-li))
+
+**Documentation updates:**
+
+- Added example and guide section on compute [\#242](https://github.com/jorgecarleitao/arrow2/pull/242) ([jorgecarleitao](https://github.com/jorgecarleitao))
+
+**Closed issues:**
+
+- Allow projection pushdown to IPC files [\#261](https://github.com/jorgecarleitao/arrow2/issues/261)
+- Add support to write dictionary-encoded pages [\#211](https://github.com/jorgecarleitao/arrow2/issues/211)
+- Make IpcWriteOptions easier to find. [\#120](https://github.com/jorgecarleitao/arrow2/issues/120)
+
 ## [v0.2.0](https://github.com/jorgecarleitao/arrow2/tree/v0.2.0) (2021-07-30)
 
 [Full Changelog](https://github.com/jorgecarleitao/arrow2/compare/v0.1.0...v0.2.0)
diff --git a/Cargo.toml b/Cargo.toml
index 5d8d8bc61f6..0775ae6a8d0 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "arrow2"
-version = "0.2.0"
+version = "0.3.0"
 license = "Apache-2.0"
 description = "Unofficial implementation of Apache Arrow spec in safe Rust"
 homepage = "https://github.com/jorgecarleitao/arrow2"
diff --git a/README.md b/README.md
index 4a04868fa41..b6192ae5ec0 100644
--- a/README.md
+++ b/README.md
@@ -52,17 +52,31 @@ venv/bin/python parquet_integration/write_parquet.py
 
 ## Features in this crate and not in the official
 
+### Safety and Security
+
+* safe by design (i.e. no transmutes, runtime type checking nor pointer casts)
 * Uses Rust's compiler whenever possible to prove that memory reads are sound
+* All non-IO components pass MIRI checks (MIRI and file systems are a bit funny atm)
+
+### Arrow Format
+
+* IPC supports big endian
+* `MutableArray` API to work in-memory in-place.
+* faster IPC reader (different design that avoids an extra copy of all data)
+* IPC supports 2.0 (compression)
+* FFI support for dictionary-encoded arrays
+
+### Parquet
+
 * Reading parquet is 10-20x faster (single core) and deserialization is parallelizable
 * Writing parquet is 3-10x faster (single core) and serialization is parallelizable
-* MIRI checks on non-IO components (MIRI and file systems are a bit funny atm)
 * parquet IO has no `unsafe`
-* IPC supports big endian
+* parquet IO supports `async` read
+
+### Others
+
 * More predictable JSON reader
-* `MutableArray` API to work with arrays in-place.
 * Generalized parsing of CSV based on logical data types
-* faster IPC reader (different design that avoids an extra copy of all data)
-* IPC supports 2.0 (compression)
 
 ## Features in the original not available in this crate
 
@@ -72,12 +86,11 @@ venv/bin/python parquet_integration/write_parquet.py
 ## Features in this crate not in pyarrow
 
 * Read and write of delta-encoded utf8 to and from parquet
-* parquet roundtrip of all arrow types.
+* parquet roundtrip of all supported arrow types.
 
-## Roadmap
+## Features in pyarrow not in this crate
 
-1. parquet read of nested types.
-2. bring documentation up to speed
+Too many to enumerate; e.g. nested dictionary arrays, union, map, nested parquet.
 
 ## How to develop
 