Add a note about arrow crate security / safety #628
Codecov Report
@@ Coverage Diff @@
## master #628 +/- ##
==========================================
- Coverage 82.48% 82.47% -0.01%
==========================================
Files 167 167
Lines 46452 46454 +2
==========================================
- Hits 38315 38314 -1
- Misses 8137 8140 +3
@@ -35,6 +39,26 @@ The arrow crate provides the following optional features:
implementations of some [compute](https://github.com/apache/arrow/tree/master/rust/arrow/src/compute)
kernels using explicit SIMD processor intrinsics.

## Safety

TLDR: You should avoid using the `alloc` and `buffer` and `bitmap` modules if at all possible. These modules contain `unsafe` code and are easy to misuse.
@jorgecarleitao @jhorstmann @nevi-me @houqp @ritchie46 @andygrove
Is this a fair assessment, in your opinion, of the risk of using the arrow crate?
Yeah, I think it is, and I like your suggestion of putting them behind a feature flag
arrow/README.md
Outdated
println!("{:?}", array.value(1));
```

NOTE: We plan to deprecate and make these modules private as part of a follow on release, as part of our journey of redesigning this crate.
I filed #629 as a proposal to mark these modules private. Feedback more than welcome
Given that the change could be landed in a short period, I would omit the note, as it would be temporary.
One comment about possibly removing a note
7c4298f to 5b6a7b6

_Background_: There are various parts of the `arrow` crate which use `unsafe` and `transmute` code internally. We are actively working as a community to minimize undefined behavior and remove `unsafe` usage to align more with Rust's core principles of safety (e.g. the arrow2 project).

As `arrow` exists today, it is fairly easy to misuse the APIs, leading to undefined behavior, and it is especially easy to misuse code in modules named above. For an example, as described in [the arrow2 crate](https://github.com/jorgecarleitao/arrow2#why), the following code compiles, does not panic, but results in undefined behavior:
"it is fairly easy to misuse the APIs" - isn't this mostly a statement about `ArrayData`? Do many people encounter UB using the other APIs in Arrow?
What do you think about putting `ArrayData` behind the `unsafe` flag also?
I agree that if code is using `ArrayData` directly it is likely to be in the "easy to misuse" category. What do others think?
@paddyhoran I am going to merge this PR and perhaps we can clarify / improve the wording about `ArrayData` in a follow on PR?
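For readers following this thread, the kind of misuse under discussion (metadata that disagrees with the underlying buffer) can be illustrated with a small std-only sketch. Everything here is hypothetical: `RawArray`, `try_new`, `value`, and `le_bytes` are made-up names for illustration and are not the arrow crate's `ArrayData` or any other arrow API. The sketch assumes a byte buffer that the caller claims holds `len` little-endian `i64` values, and shows how a validating constructor can reject the mismatch up front instead of reading out of bounds later.

```rust
/// Hypothetical stand-in for an untyped byte buffer plus caller-supplied metadata.
/// This is NOT arrow's `ArrayData`; it only mirrors the general idea.
struct RawArray {
    bytes: Vec<u8>,
    len: usize, // number of i64 values the caller claims are present
}

impl RawArray {
    /// Validating constructor: rejects a declared length the buffer cannot hold.
    fn try_new(bytes: Vec<u8>, len: usize) -> Result<Self, String> {
        let needed = len
            .checked_mul(std::mem::size_of::<i64>())
            .ok_or_else(|| "declared length overflows usize".to_string())?;
        if bytes.len() < needed {
            return Err(format!(
                "buffer holds {} bytes but {} i64 values need {}",
                bytes.len(),
                len,
                needed
            ));
        }
        Ok(Self { bytes, len })
    }

    /// Bounds-checked accessor: no pointer casts, no `unsafe`.
    fn value(&self, i: usize) -> Option<i64> {
        if i >= self.len {
            return None;
        }
        let start = i * 8;
        let mut chunk = [0u8; 8];
        chunk.copy_from_slice(&self.bytes[start..start + 8]);
        Some(i64::from_le_bytes(chunk))
    }
}

/// Helper (hypothetical): serialize i32 values as little-endian bytes.
fn le_bytes(values: &[i32]) -> Vec<u8> {
    let mut out = Vec::new();
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

fn main() {
    // Two i32 values (8 bytes) declared as ten i64 values: rejected at construction.
    let bytes = le_bytes(&[0, 2]);
    assert!(RawArray::try_new(bytes.clone(), 10).is_err());

    // The same 8 bytes can legitimately back exactly one i64 value.
    let array = RawArray::try_new(bytes, 1).unwrap();
    println!("{:?}", array.value(0));
    println!("{:?}", array.value(1)); // out of range, so None rather than UB
}
```

The design choice is to pay for validation once, at construction, so that every later read can stay in safe code. Whether a check like this belongs on `ArrayData` itself or behind a feature flag is the open question in this thread; the sketch does not take a position on that.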
Which issue does this PR close?
Closes #627
Rationale for this change
If not used carefully, unsafe code can be written using arrow-rs, so we should tell users about this so that they are well informed.
See previous discussion on the mailing list:
What changes are included in this PR?
Update README in arrow crate explaining what is going on