-
Notifications
You must be signed in to change notification settings - Fork 434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Issue #296] Rust: Use a tighter inner type for Error #356
Conversation
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is feedback mostly for @nbdd0121 and @ojeda - because @foxhlchen is doing what #296 is asking for.
What are the benefits of using a NonZero
inner type here? Sure, it cannot be 0
- but that doesn't provide us with any benefit, since we already strictly enforce the [-MAX_ERRNO..-1]
invariant.
I see plenty of downsides:
- more complex code for little or no gain
- if we use
NonZeroI16::new()
(as happens here) there's no need for unsafe code, but we need anunwrap()
which could potentiallypanic!
if things go wrong. this is rather the opposite of the "fail gracefully" that we are working towards - if we use
unsafe NonZeroI16::new_unchecked()
(as also happens here) then we are using unsafe code for no good reason, plus a good chance of introducing UB when0
is passed by mistake. Again, this is the opposite of the "fail gracefully" that we are working towards
Suggestion: let's simply use an i16
for the inner type. The invariant is still enforced. And it allows us to get rid of awkward casts. This is actually a very elegant and simple change.
It has performance implication. |
rust/kernel/error.rs
Outdated
/// # Safety | ||
/// | ||
/// The parameter must be a valid kernel error number. | ||
macro_rules! kernel_const_to_error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be made into a const fn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on converting this to a const function. I think it makes sense for it to be a constructor of Error, something like:
const fn from_const(errno: u32) -> Self {
let value = -(errno as i32);
if value < -(bindings::MAX_ERRNO as i32) || value >= 0 {
panic!("Error number out of range");
}
// SAFETY: We have checked above that `v` is within the right range.
unsafe { Self::from_kernel_errno_unchecked(value) }
}
We should only use macros when no reasonable alternative exists within the core language.
rust/kernel/error.rs
Outdated
@@ -76,7 +87,8 @@ impl Error { | |||
|
|||
// INVARIANT: the check above ensures the type invariant | |||
// will hold. | |||
Error(errno) | |||
let nzi16_errno = NonZeroI16::new(errno as i16).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use new_unchecked
since errno is checked to be non-zero.
That's fascinating to see - genuinely, no snark! Yet, do we have any measurements or benchmarks to show that this really makes a difference in the real world? In the absence of the latter, should we choose a solution that favours simplicity, and least chance of inadvertent There's nothing preventing us from reviewing a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not on board with a NonZero
inner type, but will review just in case :)
@@ -102,11 +117,14 @@ impl fmt::Debug for Error { | |||
fn rust_helper_errname(err: c_types::c_int) -> *const c_types::c_char; | |||
} | |||
// SAFETY: FFI call. | |||
let name = unsafe { rust_helper_errname(-self.0) }; | |||
let name = unsafe { rust_helper_errname(-self.to_kernel_errno()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the to_kernel_errno()
call required here? Why not use -self.0.get()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the
to_kernel_errno()
call required here?
no
Why not use
-self.0.get()
?
To me, using to_kernel_errno()
is clearer, and if we change the inner representation of Error again in the future, we don't need to touch this code. But that is just my personal taste. :)
rust/kernel/error.rs
Outdated
// | ||
// Safety: `errno` must not be zero, which is guaranteed by the contract | ||
// of this function. | ||
Error(unsafe { NonZeroI16::new_unchecked(errno as i16) }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need a // CAST
annotation that explains why casting a c_int
to an i16
won't lose any bits and result in UB?
Hmm... you guys make it tough for me to choose. :-) On one hand, I agree with @TheSven73 that it might be slightly "too much" before we hit mainline. People will be likely to look at this file in particular, since some may have heard about "Rust nice error handling", " On the other hand, perhaps |
Consider very carefully the viewpoint and incentives of the people which are most important for you to convince. |
Allow me to play Devil's advocate - I love doing that :) |
A benchmark! :) In general, we should strive for simplicity unless there is a reason not to (a reason being the benchmark showing it makes a difference), specially before mainline. |
🥇 I'm assuming we only have a single non-trivial, performance-sensitive, real-world Rust-for-Linux application right now? That would be binder? Something tells me that @wedsonaf might have benchmarks somewhere. But that discussion would belong in a separate PR IMHO. |
Yeah, I would say so. (Wedson is on holidays, I think he is back next week). |
Actually we maybe can make the inner type a |
Then it's no better than using i16? both will occupy 4byte |
You don't need to have sign extensions when converting a |
Ok, But this is also true for NonZeroI16, right? We can save a lot of space but may incur performance penalties. |
For So performance-wise we have |
Got it, thanks for the explanation. I now think NonZerol32 is the best choice. :) Let me change it to use NonZeroI32. |
70a5b2d
to
f98e62a
Compare
This comment has been minimized.
This comment has been minimized.
f98e62a
to
4dd79c0
Compare
This comment has been minimized.
This comment has been minimized.
4dd79c0
to
dafb4b0
Compare
This comment has been minimized.
This comment has been minimized.
rust/kernel/error.rs
Outdated
// | ||
// Safety: `errno` must be non zero, which is guaranteed by the check | ||
// above. | ||
let nz_errno = unsafe { NonZeroI32::new_unchecked(errno) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to call from_kernel_errno_unchecked()
here to prevent having to write
Error(unsafe { NonZeroI32::new_unchecked(errno) })
twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean
unsafe{ Error::from_kernel_errno_unchecked() }
?
it's not terser
@@ -21,44 +21,49 @@ use core::str::{self, Utf8Error}; | |||
/// | |||
/// The value is a valid `errno` (i.e. `>= -MAX_ERRNO && < 0`). | |||
#[derive(Clone, Copy, PartialEq, Eq)] | |||
pub struct Error(c_types::c_int); | |||
pub struct Error(NonZeroI32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you guys believe that NonZeroI32
is optimal, perhaps put a comment that explains why?
Otherwise we might get PRs from newcomers who think that NonZeroI32
occupies too much space, so they'll replace with NonZeroI16
...
rust/kernel/error.rs
Outdated
|
||
impl Error { | ||
/// Invalid argument. | ||
pub const EINVAL: Self = Error(-(bindings::EINVAL as i32)); | ||
pub const EINVAL: Self = kernel_const_to_error(bindings::EINVAL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unfamiliar with Rust macros, but I wonder if it would be possible to write a macro that takes the Error
name as the input parameter, and expands to the whole pub const
definition? Maybe even a static_assert!
too?
Example:
declare_const_error!(EINVAL);
// would expand to
static_assert!(bindings::EINVAL < 0 && bindings::EINVAL >= -MAX_ERRNO);
pub const EINVAL: Self = Error(unsafe { NonZeroI32::new_unchecked(-(bindings::EINVAL as i32)) });
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
But I don't know why even static_assert!(bindings::ENOMEM < bindings::MAX_ERRNO)
doesn't work.
I got the following error:
error: `const` items in this context need a name
--> rust/kernel/static_assert.rs:36:15
|
30 | / macro_rules! static_assert {
31 | | ($condition:expr) => {
32 | | // Based on the latest one in `rustc`'s one before it was [removed].
33 | | //
... |
36 | | const _: () = [()][!($condition) as usize];
| | ^ `_` is not a valid name for this `const` item
37 | | };
38 | | }
| |_- in this expansion of `static_assert!`
|
::: rust/kernel/error.rs:42:5
|
42 | static_assert!(bindings::ENOMEM < bindings::MAX_ERRNO);
| ------------------------------------------------------- in this macro invocation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you write like what @TheSven73 suggests, const _
in static_assert!
will become an associated constant rather than just a normal item, so it must have a name. It'll need to expand into something like
pub const EINVAL: Self = {
static_assert!(bindings::EINVAL < 0 && bindings::EINVAL >= -MAX_ERRNO);
Error(unsafe { NonZeroI32::new_unchecked(-(bindings::EINVAL as i32)) })
};
But I think it'll be better to put the check in a const fn + const panic instead of using static_assert
in this case:
const fn i_am_bad_at_naming_things(errno: i32) -> Error {
assert!(errno < -(bindings::MAX_ERRNO as i32) || errno >= 0);
Error(unsafe { NonZeroI32::new_unchecked(errno) })
}
pub const EINVAL: Self = i_am_bad_at_naming_things(-(bindings::EINVAL as i32));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Thank you, Gary!!
Can I combine this? i.e. declare a i_am_bad_at_naming_things()
function and write a macro to wrap
pub const EINVAL: Self = i_am_bad_at_naming_things(-(bindings::EINVAL as i32));
into declare_const_error!(EINVAL);
Do you think it's a good idea??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, I opted for the first approach.
macro_rules! declare_const_error {
($($tt:tt)*) => {
pub const $($tt)*: Self = {
static_assert!(bindings::$($tt)* <= bindings::MAX_ERRNO && bindings::$($tt)* > 0);
Error(unsafe { NonZeroI32::new_unchecked(-(bindings::$($tt)* as i32)) })
};
};
}
Thank you again!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, putting doc inside the macro invocation is bizarre to me.
I don't think it will be clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, you can have everything in a single macro:
macro_rules! declare_const_error {
($($(#[$attr:meta])* $name:ident)*) => { $(
$(#[$attr])*
pub const $name: Self = {
static_assert!(bindings::$name <= bindings::MAX_ERRNO && bindings::$name > 0);
Error(unsafe { NonZeroI32::new_unchecked(-(bindings::$name as i32)) })
};
)* };
}
declare_const_error! {
/// Invalid argument.
EINVAL
/// Out of memory.
ENOMEM
// and more
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, you can have everything in a single macro:
macro_rules! declare_const_error { ($($(#[$attr:meta])* $name:ident)*) => { $( $(#[$attr])* pub const $name: Self = { static_assert!(bindings::$name <= bindings::MAX_ERRNO && bindings::$name > 0); Error(unsafe { NonZeroI32::new_unchecked(-(bindings::$name as i32)) }) }; )* }; } declare_const_error! { /// Invalid argument. EINVAL /// Out of memory. ENOMEM // and more }
This is cool. However, I strongly disagree we use something like it.
After messing around, I realize the initial way is clearest.
For
impl Error {
pub const EINVAL: Self = kernel_const_to_error(bindings::EINVAL);
}
A glimpse is enough to let you understand that we have an EINVAL
const in the Error
scope whose value is somehow related to the kernel's EINVAL (and you can easily infer the value of EINVAL equals to negative bindings::EINVAL).
But with
impl Error{
declare_const_error! {
/// Invalid argument.
EINVAL
/// Out of memory.
ENOMEM
// and more
}
}
You need to spend time figuring out the magic here especially when you are not familiar with rust macro.
It adds more mental burden to developers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
Ok.
But his conclusion is correct. |
Indeed |
Thanks for the explanation and the pointer! Is scalar pair Rust-specific? Is there a document describing the Rust or LLVM behaviour about this? I couldn't seem to find much material on the Internet and there're no mentions of scalar pairs in SysV ABI. |
AFAIK it is indeed rust specific. The Rust abi is unstable and apart from the source code I linked I believe there is no documentation. I had to look at it in the past to use the same abi in rustc_codegen_cranelift as in the llvm backend. |
In Rust, |
Just making sure that I read this correctly: that's 1.96 per-thousand (per mille), not 1.96 percent, correct? |
Can you do a microbenchmark just for the Result?? |
Yes, I misread the number of zeros originally 🤦 and write percent. I was a bit surprised by such a large change and double-checked the result. I think binder might not be the reliable benchmark for other code (we are more likely just testing how often the slow path of mutex_lock is being hit, which is affected by many factors in SMP systems, e.g. slower code can make mutex less contentious and thus make things faster). |
AFAIK the binder benchmark is a real-world performance benchmark which returns |
I think it is a real-world benchmark but there are lots of other real-world combinations that are also important but aren't captured by this single benchmark. IOW, I think we can use this benchmark as a data point, but we shouldn't base our decisions on it alone. Android has more extensive benchmarks involving Binder, for example: https://source.android.com/compatibility/vts/performance Once we have enough of it implemented to boot Android, we'll be able to get these to run. Those will be better numbers to base our decisions on. |
Excellent point. How far are you from being able to run at least some of those? |
I've written a tiny microbenchmark. assembly is here: On x86 differences are in
no significant performance difference was found from the benchmark. |
It seems that you didn't use black_box properly and everything has been optimized out. You'll need to give all the functions that returns result an black boxed input and black box the output before giving it to use_result. |
Somehow my phone bugged and closed the PR, sorry |
Can you elaborate on it?? I add |
If you add The inner state you implement is not sufficient. It's almost impossible to use State isn't the only problem though, since even with states, in your case inlining will also mess up the benchmark. You can use |
Thanks! Let me try to use |
Another question Why can't we simply shut it off? |
Normally you would want to benchmark with optimizations enabled, but you can customize profiles in |
Oh, I figured it out! Thanks!
in now it looks reasonable
It seems i32 is the best on my testing machine (aarch64) |
With opt-level=0 you can't do any realistic benchmarks. The compiler won't optimize away any zero-cost abstractions. For example
With these changes I get the following result:
Diff: @@ -1,8 +1,10 @@
-#![feature(test)]
+#![feature(test, bench_black_box)]
-use core::num::{NonZeroI16, NonZeroI32};
extern crate test;
+use std::hint::black_box;
+use std::num::{NonZeroI16, NonZeroI32};
+
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct ErrorNzi32(NonZeroI32);
pub struct ErrorNzi16(NonZeroI16);
@@ -67,13 +69,12 @@
}
}
- fn use_result<F, V, E>(n: i32, f: F) -> (i32, Result<V, E>)
- where
- F: Fn() -> Result<V, E>,
+ #[inline(never)]
+ fn use_result<V, E>(n: i32, f: fn() -> Result<V, E>) -> (i32, Result<V, E>)
{
let mut rt :i32 = 0;
for _ in 0..n {
- rt = match f() {
+ rt += match f() {
Ok(_) => 0,
Err(_) => -1,
};
@@ -84,62 +85,62 @@
#[bench]
fn bench_nzi32_100(b: &mut Bencher) {
- b.iter(|| use_result(100, return_nzi32));
+ b.iter(|| use_result(100, black_box(return_nzi32)));
}
#[bench]
fn bench_nzi32_10000(b: &mut Bencher) {
- b.iter(|| use_result(10000, return_nzi32));
+ b.iter(|| use_result(10000, black_box(return_nzi32)));
}
#[bench]
fn bench_nzi32_1000000(b: &mut Bencher) {
- b.iter(|| use_result(1000000, return_nzi32));
+ b.iter(|| use_result(1000000, black_box(return_nzi32)));
}
#[bench]
fn bench_nzi16_100(b: &mut Bencher) {
- b.iter(|| use_result(100, return_nzi16));
+ b.iter(|| use_result(100, black_box(return_nzi16)));
}
#[bench]
fn bench_nzi16_10000(b: &mut Bencher) {
- b.iter(|| use_result(10000, return_nzi16));
+ b.iter(|| use_result(10000, black_box(return_nzi16)));
}
#[bench]
fn bench_nzi16_1000000(b: &mut Bencher) {
- b.iter(|| use_result(1000000, return_nzi16));
+ b.iter(|| use_result(1000000, black_box(return_nzi16)));
}
#[bench]
fn bench_i32_100(b: &mut Bencher) {
- b.iter(|| use_result(100, return_i32));
+ b.iter(|| use_result(100, black_box(return_i32)));
}
#[bench]
fn bench_i32_10000(b: &mut Bencher) {
- b.iter(|| use_result(10000, return_i32));
+ b.iter(|| use_result(10000, black_box(return_i32)));
}
#[bench]
fn bench_i32_1000000(b: &mut Bencher) {
- b.iter(|| use_result(1000000, return_i32));
+ b.iter(|| use_result(1000000, black_box(return_i32)));
}
#[bench]
fn bench_i16_100(b: &mut Bencher) {
- b.iter(|| use_result(100, return_i16));
+ b.iter(|| use_result(100, black_box(return_i16)));
}
#[bench]
fn bench_i16_10000(b: &mut Bencher) {
- b.iter(|| use_result(10000, return_i16));
+ b.iter(|| use_result(10000, black_box(return_i16)));
}
#[bench]
fn bench_i16_1000000(b: &mut Bencher) {
- b.iter(|| use_result(1000000, return_i16));
+ b.iter(|| use_result(1000000, black_box(return_i16)));
}
} |
Actually I looked at the assembly and |
By Bjorn3: With opt-level=0 you can't do any realistic benchmarks. The compiler won't optimize away any zero-cost abstractions. For example for item in vec.iter() { ... } can be faster than while i < vec.len() { ...; i+=1; } with optimizations due to no bounds checking, but without optimizations it is likely much slower. black_box calls didn't help as you likely passed a function item and not function pointer through the black box. This means that it is still known which function is called based on the type of the function item. Taking fn() -> Result<V, E> instead of F: Fn() -> Result<V, E> fixes this issue. Replacing the rt = match ... with rt += match ... inside use_result is also necessary to prevent optimizing away the match.
Wow, thanks for such a detailed explanation! I've merged this change and run it on my laptop (aarch64 M1)
|
It seems that compiler optimizes away if I % 2 == 0 {
return Ok(());
} for the i16/i32 case to Have you started working on ramfs @foxhlchen? |
Yes, I'm exploring VFS interfaces, then will be carefully figuring how to abstract them. Your opinion?? |
I just think that ramfs might be a good real-life benchmark for this PR ;) |
Ah, I'm suspicious. Even though Result is used basically everywhere, but it takes a fraction of time - I still think a microbenchmark will be better. If it proves nzi32 is better, than we put it in a real life benchmark only to see if there is a significant regression. |
With latest upstream llvm18, the following test cases failed: $ ./test_progs -j #13/2 bpf_cookie/multi_kprobe_link_api:FAIL #13/3 bpf_cookie/multi_kprobe_attach_api:FAIL #13 bpf_cookie:FAIL #77 fentry_fexit:FAIL #78/1 fentry_test/fentry:FAIL #78 fentry_test:FAIL #82/1 fexit_test/fexit:FAIL #82 fexit_test:FAIL #112/1 kprobe_multi_test/skel_api:FAIL #112/2 kprobe_multi_test/link_api_addrs:FAIL [...] #112 kprobe_multi_test:FAIL #356/17 test_global_funcs/global_func17:FAIL #356 test_global_funcs:FAIL Further analysis shows llvm upstream patch [1] is responsible for the above failures. For example, for function bpf_fentry_test7() in net/bpf/test_run.c, without [1], the asm code is: 0000000000000400 <bpf_fentry_test7>: 400: f3 0f 1e fa endbr64 404: e8 00 00 00 00 callq 0x409 <bpf_fentry_test7+0x9> 409: 48 89 f8 movq %rdi, %rax 40c: c3 retq 40d: 0f 1f 00 nopl (%rax) ... and with [1], the asm code is: 0000000000005d20 <bpf_fentry_test7.specialized.1>: 5d20: e8 00 00 00 00 callq 0x5d25 <bpf_fentry_test7.specialized.1+0x5> 5d25: c3 retq ... and <bpf_fentry_test7.specialized.1> is called instead of <bpf_fentry_test7> and this caused test failures for #13/#77 etc. except #356. For test case #356/17, with [1] (progs/test_global_func17.c)), the main prog looks like: 0000000000000000 <global_func17>: 0: b4 00 00 00 2a 00 00 00 w0 = 0x2a 1: 95 00 00 00 00 00 00 00 exit ... which passed verification while the test itself expects a verification failure. Let us add 'barrier_var' style asm code in both places to prevent function specialization which caused selftests failure. [1] llvm/llvm-project#72903 Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
Use NonZeroI32 as the inner type. This allows Result<()> to fit in 32-bit, instead of 64-bit.
V1->V2
NonZeroI32
instead ofNonZeroI16
kernel_const_to_error
as a const functionfrom_kernel_errno
to avoidunwrap()
V2->V3
kernel_const_to_error()
back to a macro and addstatic_assertion!
to it.Error(unsafe { NonZeroI16::new_unchecked(errno as i16) })
infrom_kernel_error()
to make it terserV3->V4
V4->V5
kernel_const_to_error()
to a const function again. 😮💨Related commit:
#296
May influence:
#358