-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify core::hint::spin_loop
#115547
Simplify core::hint::spin_loop
#115547
Conversation
Failed to set assignee to
|
r? libs |
{ | ||
// SAFETY: the `cfg` attr ensures that we only execute this on arm targets | ||
// with support for the v6 feature. | ||
unsafe { crate::arch::arm::__yield() }; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a side-note, I've noticed that hexagon
target has a pause
instruction tailored for spin loops, but asm!
macro doesn't seem to support hexagon
yet, so I'm not sure if we can do anything here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was told that we actually do support hexagon
ASM. It's just unstable and not documented (besides the tracking issue: #93335). I could try adding a pause
here for it then.
{ | ||
// SAFETY: the `cfg` attr ensures that we only execute this on arm targets | ||
// with support for the v6 feature. | ||
unsafe { crate::arch::arm::__yield() }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This answer on SO seems to suggest that __isb(SY)
is better than __yield()
(for some definition of "better" I guess). It seems like both of those are available for both aarch64 and arm. Should we use __isb(SY)
for arm too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SO answer is mistaken. At least, somewhat mistaken. ISB, the "Instruction Synchronization Barrier," flushes the instruction pipeline on the calling processor. It isn't explicitly for use in something as general-purpose as a spin-loop:
Instruction Synchronization Barrier flushes the pipeline in the PE and is a context synchronization event.
When used with the SY option it can cause a "full system barrier operation," which could only be more expensive than plain ISB.
I worry that __isb(SY)
may only appear to improve performance, and only on select implementations.
The instruction that best fits the intent of a spin-loop hint is YIELD:
YIELD is a hint instruction. Software with a multithreading capability can use a YIELD instruction to indicate to the PE that it is performing a task, for example a spin-lock, that could be swapped out to improve overall system performance. The PE can use this hint to suspend and resume multiple software threads if it supports the capability.
Expanding the use of __isb(SY)
from aarch64 to both targets seems misguided. If anything, the change from __yield()
to __isb(SY)
on aarch64 should be reverted.
// SAFETY: the `cfg` attr ensures that we only execute this on arm targets | ||
// with support for the v6 feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure those (or any of the above) comments make sense. As far as I can tell, all the functions in core::arch::...
are cfg-ed anyway, so we would not be able to call them on a wrong arch. Should I delete them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functions in core::arch::...
can be called just fine if the relevant cfg(target_feature = "...")
is not true at compile time. In that case you need to do runtime detection to check if it is supported before calling however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how this could be true.
__yield
is defined in arm_shared
. arm_shared
is marked as #[cfg(any(target_arch = "arm", target_arch = "aarch64", doc))]
. __yield
itself is marked as #[cfg(any(target_feature = "v6", target_arch = "aarch64", doc))]
.
So this results in __yield
being available if cfg(all(any(target_arch = "arm", target_arch = "aarch64", doc), any(target_feature = "v6", target_arch = "aarch64", doc)))
. doc
and target_arch = "aarch64"
are not true in this context, so we are left with cfg(all(any(target_arch = "arm"), any(target_feature = "v6")))
, which can be simplified to cfg(all(target_arch = "arm", target_feature = "v6"))
.
Unless I'm missing something, this is exactly the same as the cfg
on this block?...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like you are right in this case. For x86 intrinsics this is definitively not the case though. Those are never cfg
ed on target features.
@bors r+ rollup=iffy I think this seems fine -- there's some comments here on possible changes to the code, but I don't think we need to do that in this PR. |
☀️ Test successful - checks-actions |
Finished benchmarking commit (8a7cab8): comparison URL. Overall result: ✅ improvements - no action needed@rustbot label: -perf-regression Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 634.064s -> 634.575s (0.08%) |
The grouping was inconsistent and not really helpful.
r? t-libs