Boost iterator intersperse(_with) performance #111379

nyurik · 2023-05-09T05:34:31Z

I did some benchmark digging into the intersperse and intersperse_with code as part of this discussion, and as a result I optimized them a bit, without relying on the peekable iterator.

See also full benchmark repo

Benchmarks show near 2x performance improvements with the simple sum benchmarks:

rustbot · 2023-05-09T05:34:37Z

r? @cuviper

(rustbot has picked a reviewer for you, use r? to override)

rustbot · 2023-05-09T05:34:39Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

library/core/src/iter/adapters/intersperse.rs

nyurik · 2023-05-09T17:10:46Z

Per @m-ou-se suggestion, I implemented an additional bool that tracks if the iteration has started or not, and that resulted in a significant performance degradation. Now the gains are much more modest: tests with 222 in the graph below are only a fraction of the total possible perf improvements.

I guess the main question is if it is ok for a wrapping iterator to call nested_iterator.next() on construction rather than on the first call to next() in order to gain a significant performance boost? Or should the almost never used edge cases warrant a significant perf degradation for everyone else?

In this graph, results of the benchmarks -- first line is the current implementation (000), vs optimized (111 - calls next on construction), vs delayed (222 - calls next on first call to next)

cuviper · 2023-05-09T17:20:57Z

I guess the main question is if it is ok for a wrapping iterator to call nested_iterator.next() on construction rather than on the first call to next() in order to gain a significant performance boost? Or should the almost never used edge cases warrant a significant perf degradation for everyone else?

I think a third-party crate could make the tradeoff to have that first call right away, and clearly explain that to the user, but Iterator laziness is an important principle in the standard library.

nyurik · 2023-05-09T22:35:33Z

Ok, a smaller perf improvement is still better than no improvement. Adjusted

cuviper · 2024-01-25T20:03:45Z

Sorry for my absence -- I'm digging this out of my review backlog now.

On my AMD Ryzen 7 5800X with the latest nightly, the "222" implementation actually looks great -- beating "111" on everything except opt-unwrap and with-opt-unwrap, and still much better than "000". Maybe LLVM got better at optimizing that branch since you ran it? Or it could just be CPU differences.

(I left the fancy red grouping outlines as an exercise for the reader... 🙂 )

nyurik · 2024-01-25T20:55:07Z

thx @cuviper, I just re-ran and also get similar results. So it seems the 222 path is an ok as it clearly gains in several cases, and remains consistent with the other ones. Are there any objections to implement this PR as the 222 variant?

I did some benchmark digging into the `intersperse` and `intersperse_with` code as part of the https://internals.rust-lang.org/t/add-iterate-with-separators-iterator-function/18781/13 discussion, and as a result I optimized them a bit, without relying on the peekable iterator.

cuviper · 2024-01-25T20:32:38Z

library/core/src/iter/adapters/intersperse.rs

+#[unstable(feature = "iter_intersperse", reason = "recently added", issue = "79524")]
+impl<I> FusedIterator for Intersperse<I>
+where
+    I: FusedIterator,


Just to note -- with iter: Fuse<I>, we shouldn't need this constraint, but it's conservative to keep it.

thx, so i left it as is (unless it gets in the way of anything?)

library/core/src/iter/adapters/intersperse.rs

cuviper · 2024-01-25T20:55:30Z

library/core/src/iter/adapters/intersperse.rs

+        hi.map(|hi| {
+            hi.saturating_sub(!started as usize)
+                .saturating_add(next_is_some as usize)
+                .saturating_add(hi)


Why is this saturating instead of and_then-checked_add as before? It should return None when the upper bound is greater than usize::MAX.

I wonder if there should be more tests around hints - esp when working with iters that have underlying iters... seems like this is something that can relatively easily can be messed up

Maybe so -- we'd certainly like for the standard library to get it right. But Iterator::size_hint has limited utility anyway, given that it can be implemented incorrectly by safe code for custom iterators.

Related discussion: https://internals.rust-lang.org/t/is-size-hint-1-ever-used/8187

Co-authored-by: Josh Stone <[email protected]>

cuviper · 2024-01-26T19:33:29Z

Thanks!

@bors r+

bors · 2024-01-26T19:33:32Z

📌 Commit 77f31ef has been approved by cuviper

It is now in the queue for this repository.

…iper Boost iterator intersperse(_with) performance I did some benchmark digging into the `intersperse` and `intersperse_with` code as part of [this discussion](https://internals.rust-lang.org/t/add-iterate-with-separators-iterator-function/18781/13), and as a result I optimized them a bit, without relying on the peekable iterator. See also [full benchmark repo](https://github.com/nyurik/intersperse_perf) Benchmarks show near 2x performance improvements with the simple `sum` [benchmarks](https://gist.github.com/nyurik/68b6c9b3d90f0d14746d4186bf8fa1e2): ![image](https://user-images.githubusercontent.com/1641515/237005195-16aebef4-9eed-4514-8b7c-da1d1f5bd9e0.png)

Rollup of 9 pull requests Successful merges: - rust-lang#111379 (Boost iterator intersperse(_with) performance) - rust-lang#118182 (Properly recover from trailing attr in body) - rust-lang#119641 (Remove feature not required by `Ipv6Addr::to_cononical` doctest) - rust-lang#119759 (Add FileCheck annotations to dataflow-const-prop tests) - rust-lang#120275 (Avoid ICE in trait without `dyn` lint) - rust-lang#120376 (Update codegen test for LLVM 18) - rust-lang#120386 (ScopeTree: remove destruction_scopes as unused) - rust-lang#120398 (Improve handling of numbers in `IntoDiagnosticArg`) - rust-lang#120399 (Remove myself from review rotation) r? `@ghost` `@rustbot` modify labels: rollup

…iper Boost iterator intersperse(_with) performance I did some benchmark digging into the `intersperse` and `intersperse_with` code as part of [this discussion](https://internals.rust-lang.org/t/add-iterate-with-separators-iterator-function/18781/13), and as a result I optimized them a bit, without relying on the peekable iterator. See also [full benchmark repo](https://github.com/nyurik/intersperse_perf) Benchmarks show near 2x performance improvements with the simple `sum` [benchmarks](https://gist.github.com/nyurik/68b6c9b3d90f0d14746d4186bf8fa1e2): ![image](https://user-images.githubusercontent.com/1641515/237005195-16aebef4-9eed-4514-8b7c-da1d1f5bd9e0.png)

Rollup of 9 pull requests Successful merges: - rust-lang#111379 (Boost iterator intersperse(_with) performance) - rust-lang#118182 (Properly recover from trailing attr in body) - rust-lang#119641 (Remove feature not required by `Ipv6Addr::to_cononical` doctest) - rust-lang#119957 (fix: correct suggestion arg for impl trait) - rust-lang#120275 (Avoid ICE in trait without `dyn` lint) - rust-lang#120376 (Update codegen test for LLVM 18) - rust-lang#120386 (ScopeTree: remove destruction_scopes as unused) - rust-lang#120398 (Improve handling of numbers in `IntoDiagnosticArg`) - rust-lang#120399 (Remove myself from review rotation) r? `@ghost` `@rustbot` modify labels: rollup

bors · 2024-01-27T08:16:52Z

⌛ Testing commit 77f31ef with merge 8b6a431...

bors · 2024-01-27T10:13:30Z

☀️ Test successful - checks-actions
Approved by: cuviper
Pushing 8b6a431 to master...

rust-timer · 2024-01-27T11:28:58Z

Finished benchmarking commit (8b6a431): comparison URL.

Overall result: ❌ regressions - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.8%	[0.8%, 0.8%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.2%	[1.2%, 1.2%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 660.806s -> 663.547s (0.41%)
Artifact size: 308.11 MiB -> 308.14 MiB (0.01%)

rustbot assigned cuviper May 9, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 9, 2023

Noratrieb reviewed May 9, 2023

View reviewed changes

library/core/src/iter/adapters/intersperse.rs Outdated Show resolved Hide resolved

This comment has been minimized.

Sign in to view

nyurik force-pushed the intersperse-speed-up branch from 7644bdb to f763004 Compare May 9, 2023 07:10

This comment has been minimized.

Sign in to view

nyurik force-pushed the intersperse-speed-up branch from f763004 to 48ab2a3 Compare May 9, 2023 07:59

m-ou-se reviewed May 9, 2023

View reviewed changes

library/core/src/iter/adapters/intersperse.rs Outdated Show resolved Hide resolved

This comment has been minimized.

Sign in to view

nyurik added 3 commits January 25, 2024 15:56

Postpone .next() call until iteration

b8d245e

fmt

f1dbc7b

nyurik force-pushed the intersperse-speed-up branch from e9e72a5 to f1dbc7b Compare January 25, 2024 20:57

cuviper reviewed Jan 25, 2024

View reviewed changes

nyurik and others added 2 commits January 25, 2024 20:33

Update library/core/src/iter/adapters/intersperse.rs

8cbff0b

Co-authored-by: Josh Stone <[email protected]>

use checked_add for upper bound

77f31ef

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 26, 2024

Nadrieril mentioned this pull request Jan 27, 2024

Rollup of 11 pull requests #120408

Closed

Nadrieril mentioned this pull request Jan 27, 2024

Rollup of 9 pull requests #120409

Closed

Nadrieril mentioned this pull request Jan 27, 2024

Rollup of 9 pull requests #120412

Closed

bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 27, 2024

bors merged commit 8b6a431 into rust-lang:master Jan 27, 2024
12 checks passed

rustbot added this to the 1.77.0 milestone Jan 27, 2024

nyurik deleted the intersperse-speed-up branch February 12, 2024 23:27

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Boost iterator intersperse(_with) performance #111379

Boost iterator intersperse(_with) performance #111379

nyurik commented May 9, 2023 •

edited

Loading

rustbot commented May 9, 2023

rustbot commented May 9, 2023

This comment has been minimized.

This comment has been minimized.

nyurik commented May 9, 2023 •

edited

Loading

cuviper commented May 9, 2023

nyurik commented May 9, 2023

This comment has been minimized.

cuviper commented Jan 25, 2024

nyurik commented Jan 25, 2024

cuviper Jan 25, 2024

nyurik Jan 26, 2024

cuviper Jan 25, 2024

nyurik Jan 26, 2024

nyurik Jan 26, 2024

cuviper Jan 26, 2024

cuviper commented Jan 26, 2024

bors commented Jan 26, 2024

bors commented Jan 27, 2024

bors commented Jan 27, 2024

rust-timer commented Jan 27, 2024

Boost iterator intersperse(_with) performance #111379

Boost iterator intersperse(_with) performance #111379

Conversation

nyurik commented May 9, 2023 • edited Loading

rustbot commented May 9, 2023

rustbot commented May 9, 2023

This comment has been minimized.

This comment has been minimized.

nyurik commented May 9, 2023 • edited Loading

cuviper commented May 9, 2023

nyurik commented May 9, 2023

This comment has been minimized.

cuviper commented Jan 25, 2024

nyurik commented Jan 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cuviper commented Jan 26, 2024

bors commented Jan 26, 2024

bors commented Jan 27, 2024

bors commented Jan 27, 2024

rust-timer commented Jan 27, 2024

Overall result: ❌ regressions - no action needed

nyurik commented May 9, 2023 •

edited

Loading

nyurik commented May 9, 2023 •

edited

Loading