Clarified differences with arrow crate #209
@jorgecarleitao I also wonder if we should list arrow-flight support as something the official crate has that this one does not? Perhaps also parquet_derive?
I agree with features; I forgot to update here after the new releases. 👍
parquet-derive: good point.
Arrow-flight is also here (the migration was trivial, though).
@@ -51,13 +51,11 @@ venv/bin/python parquet_integration/write_parquet.py
* Uses Rust's compiler whenever possible to prove that memory reads are sound
* Reading parquet is 10-20x faster (single core) and deserialization is parallelizable
* Writing parquet is 3-10x faster (single core) and serialization is parallelizable
* MIRI checks on non-IO components (MIRI and file systems are a bit funny atm)
Aren't we ignoring miri checks in the official crate over some of the tests because they are not passing? E.g. https://github.com/apache/arrow-rs/blob/master/arrow/src/compute/kernels/cast.rs#L3525 and https://github.com/apache/arrow-rs/blob/master/arrow/src/array/raw_pointer.rs#L60 ?
It is now enabled on master via apache/arrow-rs#421 thanks to the great work of @roee88 and @jhorstmann
My point is that if a crate has MIRI checks on the CI but ignores certain tests because they fail the check, then that does not really qualify as passing the check, right? It is a bit like commenting out tests to make the CI green.
Maybe re-word to "all tests pass MIRI checks" instead?
The only tests that are ignored are listed below. Most are ignored because they consume too much memory or take too long when run under MIRI.
Implying that the arrow-rs crate doesn't run MIRI seems inaccurate to me. Perhaps you can word this section of the readme a bit more positively and focus on the complete MIRI coverage of all the tests so far or something?
/Users/alamb/Software/arrow-rs/arrow/src/array/raw_pointer.rs:60: #[cfg_attr(miri, ignore)] // sometimes does not panic as expected
/Users/alamb/Software/arrow-rs/arrow/src/util/integration_util.rs:725: #[cfg_attr(miri, ignore)] // running forever
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/cast.rs:3525: #[cfg_attr(miri, ignore)] // running forever
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/cast_utils.rs:223: #[cfg_attr(miri, ignore)] // unsupported operation: can't call foreign function: mktime
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/length.rs:157: #[cfg_attr(miri, ignore)] // running forever
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/length.rs:174: #[cfg_attr(miri, ignore)] // running forever
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/length.rs:284: #[cfg_attr(miri, ignore)] // error: this test uses too much memory to run on CI
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/length.rs:301: #[cfg_attr(miri, ignore)] // error: this test uses too much memory to run on CI
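For readers unfamiliar with the attribute in the list above, here is a minimal sketch of how `#[cfg_attr(miri, ignore)]` behaves. The function `sum_lengths` and both tests are hypothetical and are not taken from arrow-rs or arrow2; the only assumption about tooling is that `cargo miri test` sets the `miri` cfg, which is what makes the attribute apply.

```rust
/// Hypothetical helper (not from arrow-rs): total byte length of all strings.
pub fn sum_lengths(values: &[&str]) -> usize {
    values.iter().map(|v| v.len()).sum()
}

#[cfg(test)]
mod tests {
    use super::*;

    // Runs under both `cargo test` and `cargo miri test`.
    #[test]
    fn small_input() {
        assert_eq!(sum_lengths(&["a", "bc"]), 3);
    }

    // `cfg_attr(miri, ignore)` expands to `#[ignore]` only when the `miri`
    // cfg is set, so this test is skipped under Miri but still runs under
    // a plain `cargo test`.
    #[test]
    #[cfg_attr(miri, ignore)] // large input: slow under Miri
    fn large_input() {
        let values = vec!["abc"; 100_000];
        assert_eq!(sum_lengths(&values), 300_000);
    }
}
```

A test marked this way is reported as ignored in the Miri run, so a green Miri job says nothing about that test either way, which is the distinction being debated in this thread.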
Actually, now that I double checked it turns out that the results of MIRI were still being ignored 🤦 -- we'll fix that: apache/arrow-rs#578
fwiw, those are not ignored because they take a long time to run; they take a long time to run because they constitute UB. The same tests pass on arrow2.
can't wait to get all this arrow2 goodness into arrow-rs and share it with the world 👍
Codecov Report
@@ Coverage Diff @@
## main #209 +/- ##
=======================================
Coverage 76.81% 76.81%
=======================================
Files 226 226
Lines 19446 19446
=======================================
Hits 14938 14938
Misses 4508 4508
re arrow-flight -- I was confused -- I didn't see it on crates.io, but I see it in the repo 👍 https://docs.rs/arrow2/0.1.0/arrow2/?search=flight and https://crates.io/search?q=arrow-flight
I found some of the statements in this README misleading and I would like to propose some small corrections:
MIRI checks are running on the official crate, e.g. https://github.com/apache/arrow-rs/runs/3116901146
The official crate also has several feature flags, documented at https://github.com/apache/arrow-rs/tree/master/arrow#features , to control dependencies
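As a rough illustration of what "feature flags to control dependencies" means in practice, here is a small sketch of feature-gated code. The feature name `csv` is used purely as a placeholder and is an assumption, not a claim about either crate's actual feature list; in a real crate it would be declared under `[features]` in `Cargo.toml`, typically tied to an optional dependency.

```rust
/// Compiled only when the (hypothetical) `csv` feature is enabled.
#[cfg(feature = "csv")]
pub fn csv_support() -> &'static str {
    "csv support compiled in"
}

/// Fallback compiled when the (hypothetical) `csv` feature is disabled,
/// so the optional dependency behind it is never built.
#[cfg(not(feature = "csv"))]
pub fn csv_support() -> &'static str {
    "csv support compiled out"
}

/// Lists which of the placeholder features this build was compiled with.
pub fn enabled_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    // `cfg!` evaluates to a compile-time boolean for the given predicate.
    if cfg!(feature = "csv") {
        features.push("csv");
    }
    features
}
```

Downstream users would opt in or out from their own `Cargo.toml`, for example by disabling default features and enabling only the ones they need.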