RFC: `eachsplit` for iterative splitting #39245
LGTM, with minor comments
Force-pushed from 9132a21 to bacefa6.
Thanks for reviewing! I'm able to achieve 0 allocations with iteration using a loop, but there's still a noticeable overhead compared to the previous […]. I've added another commit to use […].

Latest master:
julia> @benchmark split("α β γ", " ")
BenchmarkTools.Trial:
memory estimate: 192 bytes
allocs estimate: 2
--------------
minimum time: 236.418 ns (0.00% GC)
median time: 240.995 ns (0.00% GC)
mean time: 257.252 ns (2.55% GC)
maximum time: 4.492 μs (94.14% GC)
--------------
samples: 10000
evals/sample: 421
julia> @btime sum(length, split($"The quick brown fox jumps over the lazy dog."))
878.023 ns (4 allocations: 800 bytes)
36

This PR:
julia> @benchmark split("α β γ", " ")
BenchmarkTools.Trial:
memory estimate: 192 bytes
allocs estimate: 2
--------------
minimum time: 255.710 ns (0.00% GC)
median time: 269.169 ns (0.00% GC)
mean time: 283.707 ns (2.81% GC)
maximum time: 6.288 μs (94.93% GC)
--------------
samples: 10000
evals/sample: 355
julia> @btime sum(length, split($"The quick brown fox jumps over the lazy dog."))
1.052 μs (13 allocations: 1.34 KiB)
36
julia> @btime sum(length, split("The quick brown fox jumps over the lazy dog."; keepempty=true))
967.529 ns (4 allocations: 800 bytes)
36
julia> @btime sum(length, eachsplit($"The quick brown fox jumps over the lazy dog."))
752.438 ns (0 allocations: 0 bytes)
36
Force-pushed from bacefa6 to 94e3e09.
Force-pushed from 4b2d6c9 to 3a4d4b1.
I couldn't figure out why […]

Current master:
julia> @benchmark split("α β γ", " ")
BenchmarkTools.Trial:
memory estimate: 192 bytes
allocs estimate: 2
--------------
minimum time: 268.547 ns (0.00% GC)
median time: 275.657 ns (0.00% GC)
mean time: 291.221 ns (2.48% GC)
maximum time: 6.723 μs (95.40% GC)
--------------
samples: 10000
evals/sample: 318
julia> @btime sum(length, split($"The quick brown fox jumps over the lazy dog."))
908.389 ns (4 allocations: 800 bytes)
36

This PR:
julia> @benchmark split("α β γ", " ")
BenchmarkTools.Trial:
memory estimate: 192 bytes
allocs estimate: 2
--------------
minimum time: 243.090 ns (0.00% GC)
median time: 247.575 ns (0.00% GC)
mean time: 263.489 ns (2.89% GC)
maximum time: 5.694 μs (94.25% GC)
--------------
samples: 10000
evals/sample: 398
julia> @btime sum(length, split($"The quick brown fox jumps over the lazy dog."))
887.864 ns (4 allocations: 800 bytes)
36
julia> @btime sum(length, eachsplit($"The quick brown fox jumps over the lazy dog."))
627.118 ns (0 allocations: 0 bytes)
36

The "needs news" label is inaccurate now. This looks good to go? Maybe triage can make a decision on it and report here?
This commit moves the existing splitting implementation into an iterator named `eachsplit` and changes the definition of `split(...)` to `collect(eachsplit(...))`, plus a few edge cases.
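To make the relationship in that commit message concrete, here is a minimal usage sketch. It assumes Julia 1.8 or later, where `eachsplit` is available in Base; the sentence is the one used in the benchmarks in this thread.

```julia
# Assumes Julia 1.8+, where `eachsplit` exists in Base.
s = "The quick brown fox jumps over the lazy dog."

# `eachsplit` yields the pieces lazily, one SubString at a time,
# so a reduction like this never builds an intermediate vector.
total = sum(length, eachsplit(s))

# `split` returns the same pieces, materialized as a vector.
pieces = split(s)

# Collecting the iterator gives the same result as calling `split`.
same = collect(eachsplit(s)) == pieces
```

Use `eachsplit` when the pieces are consumed once (as in the `sum(length, ...)` benchmark) and `split` when an indexable vector is needed.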
Force-pushed from c9fe96d to 3f1801d.
Rebased again. I'm not completely happy about the new […]

Current master:
julia> @benchmark split("α β γ", " ")
BenchmarkTools.Trial: 10000 samples with 405 evaluations.
Range (min … max): 240.978 ns … 5.592 μs ┊ GC (min … max): 0.00% … 93.55%
Time (median): 264.570 ns ┊ GC (median): 0.00%
Time (mean ± σ): 292.964 ns ± 199.246 ns ┊ GC (mean ± σ): 3.05% ± 4.45%
▂▄██▇▆▃▃▂▂▂▃▂▂▁ ▂
▇████████████████▇▇▇▆▆▆▆▆▅▅▆▅▅▄▅▆▅▆▆▆▆▆▇▇▆▇▆▇▇▆▇▆▇▇▇▇▇▆▆▆▆▆▆▇ █
241 ns Histogram: log(frequency) by time 554 ns <
Memory estimate: 272 bytes, allocs estimate: 2.
julia> @btime sum(length, split($"The quick brown fox jumps over the lazy dog."))
958.864 ns (3 allocations: 1.25 KiB)
36

This PR:
julia> @benchmark split("α β γ", " ")
BenchmarkTools.Trial: 10000 samples with 340 evaluations.
Range (min … max): 258.824 ns … 8.959 μs ┊ GC (min … max): 0.00% … 95.93%
Time (median): 289.379 ns ┊ GC (median): 0.00%
Time (mean ± σ): 316.502 ns ± 276.307 ns ┊ GC (mean ± σ): 3.61% ± 4.03%
▁ ▄▃▇▂█▃▃▃▃▂▂▃▂▂▂ ▁
█▆██████████████████▇▆▇▇▇▆▆▄▅▆▆▅▆▆▆▅▅▅▆▆▆▆▇▆▆▇▇▆▆▆▆▇▆▇▇▆▇▆▇▆▆ █
259 ns Histogram: log(frequency) by time 557 ns <
Memory estimate: 272 bytes, allocs estimate: 2.
julia> @btime sum(length, split($"The quick brown fox jumps over the lazy dog."))
981.739 ns (3 allocations: 1.25 KiB)
36
julia> @btime sum(length, eachsplit($"The quick brown fox jumps over the lazy dog."))
721.389 ns (0 allocations: 0 bytes)
36

Also needs a test?

This PR implements […]

Would it make sense to put this into […]? Also, it would be good to add a compat admonition to the docstring (“This function requires at least Julia 1.8”).
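For reference, a compat admonition in a Julia docstring looks like the sketch below. The function `lazy_fields` is a hypothetical wrapper used only to have something to attach the docstring to; it is not part of this PR, and the sketch assumes Julia 1.8+ so that `eachsplit` exists.

```julia
"""
    lazy_fields(str::AbstractString)

Return an iterator over the whitespace-separated fields of `str`.

!!! compat "Julia 1.8"
    This function requires at least Julia 1.8.
"""
lazy_fields(str::AbstractString) = eachsplit(str)  # hypothetical helper

fields = collect(lazy_fields("a b"))
```

The `!!! compat "Julia 1.8"` line followed by a four-space-indented body is the admonition syntax that the documentation tooling renders as a compatibility note.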
Somehow type constraints from the complex `while` condition don't propagate to the `while` body.
Might be nice to have reverse-iteration support, i.e. to implement […]. (That would also give you e.g. […].)
This moves the existing splitting implementation into an iterator named `eachsplit` and changes the definition of `split(...)` to `collect(eachsplit(...))`, plus a few edge cases.
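The pattern described here (a lazy iterator, with the eager function defined as `collect` of it) can be sketched in a few lines. This is not the implementation in Base: `CharSplitter` and `split_via_collect` are hypothetical names, and the sketch only handles a `String` split on a single `Char`, keeping empty pieces.

```julia
# Hypothetical illustration type; NOT the iterator added by this PR.
struct CharSplitter
    str::String
    delim::Char
end

# The number of pieces is not known without scanning the string.
Base.IteratorSize(::Type{CharSplitter}) = Base.SizeUnknown()
Base.eltype(::Type{CharSplitter}) = SubString{String}

function Base.iterate(it::CharSplitter, start::Int = firstindex(it.str))
    # A state beyond `ncodeunits + 1` marks the iterator as exhausted.
    start > ncodeunits(it.str) + 1 && return nothing
    stop = findnext(==(it.delim), it.str, start)
    if stop === nothing
        # Last piece: everything from `start` to the end of the string.
        return SubString(it.str, start, lastindex(it.str)), ncodeunits(it.str) + 2
    end
    # Piece before the delimiter; resume just after the delimiter.
    return SubString(it.str, start, prevind(it.str, stop)), nextind(it.str, stop)
end

# The eager variant is simply `collect` of the lazy one.
split_via_collect(str::String, delim::Char) = collect(CharSplitter(str, delim))
```

Because the iterator state is just an `Int` and each piece is a `SubString` view into the original string, a loop over `CharSplitter` does not need to allocate a result vector; only `split_via_collect` does.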
Inspired by a question on Zulip, I'll ask: why is this function called […]?

because there are a few […]
Fixes #20603, replacing/closes #20688, closes #7027.
We might also want an `eachrsplit` to match `rsplit`.

Initial benchmarks show this being more expensive than the existing `split`. Advice on how to improve this would be appreciated.

Existing implementation:
This PR: