Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Auto merge of #74024 - Folyd:master, r=m-ou-se
Improve slice.binary_search_by()'s best-case performance to O(1) This PR aimed to improve the [slice.binary_search_by()](https://doc.rust-lang.org/std/primitive.slice.html#method.binary_search_by)'s best-case performance to O(1). # Noticed I don't know why the docs of `binary_search_by` said `"If there are multiple matches, then any one of the matches could be returned."`, but the implementation isn't the same thing. Actually, it returns the **last one** if multiple matches found. Then we got two options: ## If returns the last one is the correct or desired result Then I can rectify the docs and revert my changes. ## If the docs are correct or desired result Then my changes can be merged after fully reviewed. However, if my PR gets merged, another issue raised: this could be a **breaking change** since if multiple matches found, the returning order no longer the last one instead of it could be any one. For example: ```rust let mut s = vec![0, 1, 1, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55]; let num = 1; let idx = s.binary_search(&num); s.insert(idx, 2); // Old implementations assert_eq!(s, [0, 1, 1, 1, 1, 2, 2, 3, 5, 8, 13, 21, 34, 42, 55]); // New implementations assert_eq!(s, [0, 1, 1, 1, 2, 1, 2, 3, 5, 8, 13, 21, 34, 42, 55]); ``` # Benchmarking **Old implementations** ```sh $ ./x.py bench --stage 1 library/libcore test slice::binary_search_l1 ... bench: 59 ns/iter (+/- 4) test slice::binary_search_l1_with_dups ... bench: 59 ns/iter (+/- 3) test slice::binary_search_l2 ... bench: 76 ns/iter (+/- 5) test slice::binary_search_l2_with_dups ... bench: 77 ns/iter (+/- 17) test slice::binary_search_l3 ... bench: 183 ns/iter (+/- 23) test slice::binary_search_l3_with_dups ... bench: 185 ns/iter (+/- 19) ``` **New implementations (1)** Implemented by this PR. ```rust if cmp == Equal { return Ok(mid); } else if cmp == Less { base = mid } ``` ```sh $ ./x.py bench --stage 1 library/libcore test slice::binary_search_l1 ... bench: 58 ns/iter (+/- 2) test slice::binary_search_l1_with_dups ... bench: 37 ns/iter (+/- 4) test slice::binary_search_l2 ... bench: 76 ns/iter (+/- 3) test slice::binary_search_l2_with_dups ... bench: 57 ns/iter (+/- 6) test slice::binary_search_l3 ... bench: 200 ns/iter (+/- 30) test slice::binary_search_l3_with_dups ... bench: 157 ns/iter (+/- 6) $ ./x.py bench --stage 1 library/libcore test slice::binary_search_l1 ... bench: 59 ns/iter (+/- 8) test slice::binary_search_l1_with_dups ... bench: 37 ns/iter (+/- 2) test slice::binary_search_l2 ... bench: 77 ns/iter (+/- 2) test slice::binary_search_l2_with_dups ... bench: 57 ns/iter (+/- 2) test slice::binary_search_l3 ... bench: 198 ns/iter (+/- 21) test slice::binary_search_l3_with_dups ... bench: 158 ns/iter (+/- 11) ``` **New implementations (2)** Suggested by `@nbdd0121` in [comment](#74024 (comment)). ```rust base = if cmp == Greater { base } else { mid }; if cmp == Equal { break } ``` ```sh $ ./x.py bench --stage 1 library/libcore test slice::binary_search_l1 ... bench: 59 ns/iter (+/- 7) test slice::binary_search_l1_with_dups ... bench: 37 ns/iter (+/- 5) test slice::binary_search_l2 ... bench: 75 ns/iter (+/- 3) test slice::binary_search_l2_with_dups ... bench: 56 ns/iter (+/- 3) test slice::binary_search_l3 ... bench: 195 ns/iter (+/- 15) test slice::binary_search_l3_with_dups ... bench: 151 ns/iter (+/- 7) $ ./x.py bench --stage 1 library/libcore test slice::binary_search_l1 ... bench: 57 ns/iter (+/- 2) test slice::binary_search_l1_with_dups ... bench: 38 ns/iter (+/- 2) test slice::binary_search_l2 ... bench: 77 ns/iter (+/- 11) test slice::binary_search_l2_with_dups ... bench: 57 ns/iter (+/- 4) test slice::binary_search_l3 ... bench: 194 ns/iter (+/- 15) test slice::binary_search_l3_with_dups ... bench: 151 ns/iter (+/- 18) ``` I run some benchmarking testings against on two implementations. The new implementation has a lot of improvement in duplicates cases, while in `binary_search_l3` case, it's a little bit slower than the old one.
- Loading branch information