Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a note about arrow crate security / safety #628

Merged
merged 3 commits into from
Aug 7, 2021

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jul 28, 2021

Which issue does this PR close?

Closes #627

Rationale for this change

If not used carefully, unsafe code can be written using arrow-rs, so we should tell users about this so they can be well informed
See previous discussion on the mailing list:

What changes are included in this PR?

Update README in arrow crate explaining what is going on

@alamb alamb added documentation Improvements or additions to documentation arrow Changes to the arrow crate labels Jul 28, 2021
@codecov-commenter
Copy link

codecov-commenter commented Jul 28, 2021

Codecov Report

Merging #628 (c69fda5) into master (be98b17) will decrease coverage by 0.00%.
The diff coverage is 81.81%.

❗ Current head c69fda5 differs from pull request most recent head 7c4298f. Consider uploading reports for the commit 7c4298f to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #628      +/-   ##
==========================================
- Coverage   82.48%   82.47%   -0.01%     
==========================================
  Files         167      167              
  Lines       46452    46454       +2     
==========================================
- Hits        38315    38314       -1     
- Misses       8137     8140       +3     
Impacted Files Coverage Δ
arrow/src/array/array_binary.rs 92.23% <ø> (ø)
arrow/src/array/array_list.rs 95.51% <ø> (ø)
arrow/src/array/array_string.rs 97.85% <ø> (ø)
arrow/src/compute/kernels/filter.rs 90.82% <0.00%> (-1.17%) ⬇️
...ng/src/flight_client_scenarios/integration_test.rs 0.00% <0.00%> (ø)
...ng/src/flight_server_scenarios/integration_test.rs 0.00% <ø> (ø)
integration-testing/src/lib.rs 0.00% <0.00%> (ø)
parquet/benches/arrow_writer.rs 0.00% <0.00%> (ø)
parquet/src/record/triplet.rs 92.37% <ø> (ø)
arrow/src/array/equal/mod.rs 93.93% <100.00%> (ø)
... and 14 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update be98b17...7c4298f. Read the comment docs.

@@ -35,6 +39,26 @@ The arrow crate provides the following optional features:
implementations of some [compute](https://github.com/apache/arrow/tree/master/rust/arrow/src/compute)
kernels using explicit SIMD processor intrinsics.

## Safety

TLDR: You should avoid using the `alloc` and `buffer` and `bitmap` modules if at all possible. These modules contain `unsafe` code and are easy to misuse.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorgecarleitao @jhorstmann @nevi-me @houqp @ritchie46 @andygrove

Is this a fair assessment, in your opinion, about the risk of using the arrow crate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it is, and I like your suggestion of putting them behind a feature flag

arrow/README.md Outdated
println!("{:?}", array.value(1));
```

NOTE: We plan to deprecate and make these modules private as part of a follow on release, as part of our journey of redesigning this crate.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #629 as a proposal to mark these modules private. Feedback more than welcome

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the change could be landed in a short period, I would omit the note, as it would be temporary.

Copy link
Contributor

@nevi-me nevi-me left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment about possibly removing a note

@alamb alamb force-pushed the alamb/propose_safety_notice branch from 7c4298f to 5b6a7b6 Compare July 30, 2021 20:23

_Background_: There are various parts of the `arrow` crate which use `unsafe` and `transmute` code internally. We are actively working as a community to minimize undefined behavior and remove `unsafe` usage to align more with Rust's core principles of safety (e.g. the arrow2 project).

As `arrow` exists today, it is fairly easy to misuse the APIs, leading to undefined behavior, and it is especially easy to misuse code in modules named above. For an example, as described in [the arrow2 crate](https://github.com/jorgecarleitao/arrow2#why), the following code compiles, does not panic, but results in undefined behavior:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"it is fairly easy to misuse the APIs" - isn't this mostly a statement about ArrayData? Do many people encounter UB using the other API's in Arrow?

What do you think about putting ArrayData behind the unsafe flag also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that if code is using ArrayData directly it is likely to be of the "easy to misuse category". What do others think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paddyhoran I am going to merge this PR and perhaps we can clarify / improve the wording about ArrayData in a follow on PR?

@alamb alamb merged commit b682ef5 into apache:master Aug 7, 2021
@alamb alamb deleted the alamb/propose_safety_notice branch October 26, 2021 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a note about arrow crate security / safety
5 participants