Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Clarified differences with arrow crate #209

Merged
merged 2 commits into from
Jul 20, 2021
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 1 addition & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,13 +51,11 @@ venv/bin/python parquet_integration/write_parquet.py
* Uses Rust's compiler whenever possible to prove that memory reads are sound
* Reading parquet is 10-20x faster (single core) and deserialization is parallelizable
* Writing parquet is 3-10x faster (single core) and serialization is parallelizable
* MIRI checks on non-IO components (MIRI and file systems are a bit funny atm)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

@alamb alamb Jul 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now enabled on master via apache/arrow-rs#421 thanks to the great work of @roee88 and @jhorstmann

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that if a crate has MIRI checks on the CI but ignores certain tests because they fail the check, then that does not really qualify as passing the check, right? It is a bit like commenting tests to make the CI green.

Maybe re-word to

all tests pass MIRI checks

instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only checks that are ignored are as follows. Most because they consume too much memory or take too long when they are run under MIRI. Full list below

Implying that the arrow-rs crate doesn't run MIRI seems inaccurate to me. Perhaps you can word this section of the readme a bit more positively and focus on on the complete MIRI coverage of all the tests so far or something?

/Users/alamb/Software/arrow-rs/arrow/src/array/raw_pointer.rs:60:    #[cfg_attr(miri, ignore)] // sometimes does not panic as expected
/Users/alamb/Software/arrow-rs/arrow/src/util/integration_util.rs:725:    #[cfg_attr(miri, ignore)] // running forever
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/cast.rs:3525:    #[cfg_attr(miri, ignore)] // running forever
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/cast_utils.rs:223:    #[cfg_attr(miri, ignore)] // unsupported operation: can't call foreign function: mktime
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/length.rs:157:    #[cfg_attr(miri, ignore)] // running forever
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/length.rs:174:    #[cfg_attr(miri, ignore)] // running forever
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/length.rs:284:    #[cfg_attr(miri, ignore)] // error: this test uses too much memory to run on CI
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/length.rs:301:    #[cfg_attr(miri, ignore)] // error: this test uses too much memory to run on CI

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, now that I double checked it turns out that the results of MIRI were still being ignored 🤦 -- we'll fix that: apache/arrow-rs#578

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, those are not ignored because they take a long time to run; they take a long time to run because they constitute UB. The same tests pass on arrow2.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't wait to get all this arrow2 goodness into arrow-rs and share it with the world 👍

* parquet IO has no `unsafe`
\* parquet IO has no `unsafe`
alamb marked this conversation as resolved.
Show resolved Hide resolved
* IPC supports big endian
* More predictable JSON reader
* `MutableArray` API to work with arrays in-place.
* Generalized parsing of CSV based on logical data types
* conditional compilation based on cargo `features` to reduce dependencies and size
* faster IPC reader (different design that avoids an extra copy of all data)
* IPC supports 2.0 (compression)

Expand Down