Skip to content
This repository has been archived by the owner on Feb 18, 2024. It is now read-only.

Clarified differences with arrow crate #209

Merged
merged 2 commits into from
Jul 20, 2021

Conversation

alamb
Copy link
Collaborator

@alamb alamb commented Jul 20, 2021

I found some of the statements in this README misleading and I would like to propose some small corrections:

MIRI checks are running on the official crate (e.g): https://github.com/apache/arrow-rs/runs/3116901146

The official crate also has several feature flags, documented at https://github.com/apache/arrow-rs/tree/master/arrow#features , to control dependencies

Copy link
Collaborator Author

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorgecarleitao I also wonder if we should list arrow-flight support as something the official crate has that this one does not?

Perhaps also parquet_derive?

Copy link
Owner

@jorgecarleitao jorgecarleitao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with features; I forgot to update here after the new releases. 👍

parquet-derive: good point.

Arrow-flight is also here (the migration was trivial, though).

@@ -51,13 +51,11 @@ venv/bin/python parquet_integration/write_parquet.py
* Uses Rust's compiler whenever possible to prove that memory reads are sound
* Reading parquet is 10-20x faster (single core) and deserialization is parallelizable
* Writing parquet is 3-10x faster (single core) and serialization is parallelizable
* MIRI checks on non-IO components (MIRI and file systems are a bit funny atm)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

@alamb alamb Jul 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now enabled on master via apache/arrow-rs#421 thanks to the great work of @roee88 and @jhorstmann

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that if a crate has MIRI checks on the CI but ignores certain tests because they fail the check, then that does not really qualify as passing the check, right? It is a bit like commenting tests to make the CI green.

Maybe re-word to

all tests pass MIRI checks

instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only checks that are ignored are as follows. Most because they consume too much memory or take too long when they are run under MIRI. Full list below

Implying that the arrow-rs crate doesn't run MIRI seems inaccurate to me. Perhaps you can word this section of the readme a bit more positively and focus on on the complete MIRI coverage of all the tests so far or something?

/Users/alamb/Software/arrow-rs/arrow/src/array/raw_pointer.rs:60:    #[cfg_attr(miri, ignore)] // sometimes does not panic as expected
/Users/alamb/Software/arrow-rs/arrow/src/util/integration_util.rs:725:    #[cfg_attr(miri, ignore)] // running forever
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/cast.rs:3525:    #[cfg_attr(miri, ignore)] // running forever
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/cast_utils.rs:223:    #[cfg_attr(miri, ignore)] // unsupported operation: can't call foreign function: mktime
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/length.rs:157:    #[cfg_attr(miri, ignore)] // running forever
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/length.rs:174:    #[cfg_attr(miri, ignore)] // running forever
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/length.rs:284:    #[cfg_attr(miri, ignore)] // error: this test uses too much memory to run on CI
/Users/alamb/Software/arrow-rs/arrow/src/compute/kernels/length.rs:301:    #[cfg_attr(miri, ignore)] // error: this test uses too much memory to run on CI

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, now that I double checked it turns out that the results of MIRI were still being ignored 🤦 -- we'll fix that: apache/arrow-rs#578

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, those are not ignored because they take a long time to run; they take a long time to run because they constitute UB. The same tests pass on arrow2.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't wait to get all this arrow2 goodness into arrow-rs and share it with the world 👍

README.md Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jul 20, 2021

Codecov Report

Merging #209 (e7ab51f) into main (9004f13) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #209   +/-   ##
=======================================
  Coverage   76.81%   76.81%           
=======================================
  Files         226      226           
  Lines       19446    19446           
=======================================
  Hits        14938    14938           
  Misses       4508     4508           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9004f13...e7ab51f. Read the comment docs.

@alamb
Copy link
Collaborator Author

alamb commented Jul 20, 2021

re arrow-flight -- I was confused -- I didn't see it on crates.io, but I see it in the repo 👍 https://docs.rs/arrow2/0.1.0/arrow2/?search=flight and https://crates.io/search?q=arrow-flight

@jorgecarleitao jorgecarleitao merged commit 20072a5 into jorgecarleitao:main Jul 20, 2021
@alamb alamb deleted the patch-1 branch July 20, 2021 21:38
@jorgecarleitao jorgecarleitao changed the title Update README.md Clarified differences with arrow Jul 23, 2021
@jorgecarleitao jorgecarleitao changed the title Clarified differences with arrow Clarified differences with arrow crate Jul 23, 2021
@jorgecarleitao jorgecarleitao added the documentation Improvements or additions to documentation label Jul 23, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants