WIP: ~15% Faster str to int parsing (when base 10) #83088

gilescope · 2021-03-13T17:15:41Z

I saw a c++ talk on numbers and wondered if the same trick would work here.

Rather than multiplying the same number each time, if we're multiplying different numbers then the ALU can reorder them (as they have fewer data dependencies and add is commutative) and do more of them in parallel.

If radix is called with a constant (which it is 99% of the time) I'm pretty sure that llvm evaluates the if radix == 10 at compile time - I noticed that when radix == 16 llvm can figure out that it can shift rather than multiply! Pretty impressive in my book that it can spot that!

Speedup wise I'm seeing it go from 1.4ns to 1.2ns. Not earth shattering, but not to be sniffed at either.

… here. Rather than multiplying the same number each time, if we're multiplying different numbers then the ALU can reorder them (as they have fewer data dependencies) and do more of them in parallel. If radix is called with a constant (which it is 99% of the time) I'm pretty sure that llvm evaluates the if at compile time as to which algorythm to pick. I noticed that when radix == 16 it can figure out that it can shift rather than multiply which is pretty impressive in my book that it can spot that! Speedup wise I'm seeing it go from 1.4ns to 1.2ns. Not earth shattering, but not to be sniffed at either.

rust-highfive · 2021-03-13T17:15:44Z

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

gilescope · 2021-03-15T08:04:33Z

If it's too specific being constrained to u32 sized and below for the std lib I understand. I can always do this as a separate crate.

DutchGhost · 2021-03-15T08:47:32Z

This comes from a talk from Andrei Alexandrescu, fastware (timestamp important)

In the past I played around with it, and implemented this parsing technique for each integer type with macros as seen here.

and even made a const fn version of it: https://gist.github.com/DutchGhost/d8604a3c796479777fe9f5e25d855cfd

I dont remember my bench results exactly, but the speed improvement was not all that much, but maybe that was just my implementation. It's a neat parsing technique :)

gilescope · 2021-03-15T10:10:56Z

Thank you! I couldn’t refind the talk anywhere. It’s a great video to watch- eye opening!

…

On Mon, 15 Mar 2021 at 08:47, DutchGhost ***@***.***> wrote: This comes from a talk from Andrei Alexandrescu: https://youtu.be/o4-CwDo2zpg?t=2346 (timestamp important) In the past I played around with it, and implemented this parsing technique for each integer type with macros: https://github.com/DutchGhost/byte-num/blob/a72222fd7927d3bf5d203389c360ccb1b874ca41/src/from_ascii.rs#L72, and even made a const fn version of it: https://gist.github.com/DutchGhost/d8604a3c796479777fe9f5e25d855cfd I dont remember my bench results exactly, but the speed improvement was not all that much, but maybe that was just my implementation. It's a neat parsing technique :) — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#83088 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAGEJCEB4HUIAQKJPW263JDTDXCTPANCNFSM4ZEAKXMQ> .

joshtriplett · 2021-03-18T20:46:56Z

This seems like a reasonable optimization, if it ends up providing an improvement, and passes all tests.

Could you please address the test failures, rename the multiplier array to something like DECIMAL_MULTIPLIERS, and expand it to work up to u64?

rust-log-analyzer · 2021-03-20T09:21:53Z

The job x86_64-gnu-llvm-9 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

Suite("src/test/assembly") not skipped for "bootstrap::test::Assembly" -- not in ["src/tools/tidy"]
Check compiletest suite=assembly mode=assembly (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)

running 31 tests
iiiiiiiiiiiiiiiiiiiiiiiiiiiii..
Suite("src/test/incremental") not skipped for "bootstrap::test::Incremental" -- not in ["src/tools/tidy"]

 finished in 0.103 seconds
Check compiletest suite=incremental mode=incremental (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
---
Suite("src/test/debuginfo") not skipped for "bootstrap::test::Debuginfo" -- not in ["src/tools/tidy"]
Check compiletest suite=debuginfo mode=debuginfo (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)

running 116 tests
iiiiiiiiii.i.i...i.i..ii....i.i....ii..........iiii.........i.....i...i.......ii.i.ii......iiii....i 100/116
test result: ok. 78 passed; 0 failed; 38 ignored; 0 measured; 0 filtered out; finished in 2.22s

 finished in 2.286 seconds
Suite("src/test/ui-fulldeps") not skipped for "bootstrap::test::UiFullDeps" -- not in ["src/tools/tidy"]
---
.................................................................................................... 1200/1248
................................................
failures:

---- num::from_str_issue7588 stdout ----
thread 'num::from_str_issue7588' panicked at 'assertion failed: `(left == right)`
  left: `Some(232)`,
 right: `None`', library/core/tests/num/mod.rs:87:5
---- num::test_invalid stdout ----
---- num::test_invalid stdout ----
thread 'num::test_invalid' panicked at 'assertion failed: `(left == right)`
  left: `Err(PosOverflow)`,
 right: `Err(InvalidDigit)`', library/core/tests/num/mod.rs:81:5

failures:
error: test failed, to rerun pass '-p core --test coretests'
    num::from_str_issue7588
    num::from_str_issue7588
    num::test_invalid

test result: FAILED. 1244 passed; 2 failed; 2 ignored; 0 measured; 0 filtered out; finished in 5.51s



command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "test" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "16" "--release" "--locked" "--color" "always" "--features" "panic-unwind backtrace compiler-builtins-c" "--manifest-path" "/checkout/library/test/Cargo.toml" "-p" "core" "--" "--quiet"


failed to run: /checkout/obj/build/bootstrap/debug/bootstrap --stage 2 test --exclude src/tools/tidy
Build completed unsuccessfully in 0:16:47

gilescope · 2021-03-22T08:50:37Z

I think the other PR has more effect and is broader in scope than this PR.

pickfire · 2021-03-22T16:30:44Z

library/core/src/num/mod.rs

+        if is_positive {
+            // The number is positive
+            for &c in digits {
+                let x = match (c as char).to_digit(radix) {
+                    Some(x) => x,
+                    None => return Err(PIE { kind: InvalidDigit }),
+                };
+                let x = match factors[idx].checked_mul(x) {
+                    Some(result) => result,
+                    None => return Err(PIE { kind: PosOverflow }),
+                };
+                result = match result.checked_add(x) {
+                    Some(result) => result,
+                    None => return Err(PIE { kind: PosOverflow }),
+                };
+                idx += 1;
+            }
+        } else {
+            // The number is negative
+            for &c in digits {
+                let x = match (c as char).to_digit(radix) {
+                    Some(x) => x,
+                    None => return Err(PIE { kind: InvalidDigit }),
+                };
+                let x = match factors[idx].checked_mul(x) {
+                    Some(result) => result,
+                    None => return Err(PIE { kind: NegOverflow }),
+                };
+                result = match result.checked_sub(x) {
+                    Some(result) => result,
+                    None => return Err(PIE { kind: NegOverflow }),
+                };
+                idx += 1;
+            }


Is it necessary to have so many duplicated code here? Also, why not put the extra condition branch in the error flow rather the default flow?

Maybe we can use iterator for idx here?

Suggested change

if is_positive {

// The number is positive

for &c in digits {

let x = match (c as char).to_digit(radix) {

Some(x) => x,

None => return Err(PIE { kind: InvalidDigit }),

};

let x = match factors[idx].checked_mul(x) {

Some(result) => result,

None => return Err(PIE { kind: PosOverflow }),

};

result = match result.checked_add(x) {

Some(result) => result,

None => return Err(PIE { kind: PosOverflow }),

};

idx += 1;

}

} else {

// The number is negative

for &c in digits {

let x = match (c as char).to_digit(radix) {

Some(x) => x,

None => return Err(PIE { kind: InvalidDigit }),

};

let x = match factors[idx].checked_mul(x) {

Some(result) => result,

None => return Err(PIE { kind: NegOverflow }),

};

result = match result.checked_sub(x) {

Some(result) => result,

None => return Err(PIE { kind: NegOverflow }),

};

idx += 1;

}

for &c in digits {

let x = match (c as char).to_digit(radix) {

Some(x) => x,

None => return Err(PIE { kind: InvalidDigit }),

};

let x = match factors[idx].checked_mul(x) {

Some(result) => result,

None => return Err(PIE { kind: if is_positive { PosOverflow } else { NegOverflow } }),

};

result = match result.checked_add(x) {

Some(result) => result,

None => return Err(PIE { kind: if is_positive { PosOverflow } else { NegOverflow } }),

};

idx += 1;

I think you missed there was a checked_sub in there? Having an if in the loop to choose between checked_add and checked_sub kills performance - I can attest to that! I did wonder if we could do better too. You can't add and then negate as the negative ranges are slightly bigger. So you could always subtract and then !+1 to make positive, which would work great for signed types (with a check to see if val == T::Min) but I couldn't get my head around what would happen if I was doing subtractions to unsigned types - if I have to do unsigned types a different way then I've not made anything any simpler. So I can't think of a way to simplify the code, but am willing to try anything if you have any other ideas.

Ah, I did missed that.

But what if we put the if branch only for checked_add and checked_sub but keep the to_digit and checked_mul?

Another idea, can you also try using if it's negative after the for loop to do checked_neg there? Remove the for loop and add a branch to convert the positive number to negative using checked_neg and return NegOverflow if there are issues so we don't need another loop for negative, a single branch.

While trying that, maybe can also try a hand crafted negation instead of checked_add? Like this

for &c in digits { ... // checked_add ... } if !is_positive { const UNSIGNED_BITS: i8 = (i8::MAX as u8 * 2) as i8 + 1; // don't just use i8, can use whatever type it is result = (i ^ UNSIGNED_BITS).checked_add(1); }

One more thing, I don't think we need the whole bunch of negative calculation in unsigned code do we?

If would be better if you have a repo to try out stuff on the benchmarking, like the one I did https://github.com/pickfire/parseint so that I can also try to help out.

For i8 if we're parsing "-128" we run into representation trouble if we start positive and then negate, that's why one would want to do subtractions and then neg. But we'd need a different code path for unsigned...

I don't think we need the whole bunch of negative calculation in unsigned code do we?

Hmm, we could do if is_positive || !is_signed_ty { rather than if is_positive - that way the compiler should remove the else block for unsigned types (I don't think currently the compiler can realise that is_positive is always true for unsigned types, but it's pretty crafty so it might have figured that out anyhows).

I'll fork your parseint and add some benchmarks. (The trick is very cool you've got there btw.)

I need to double check my benchmarks anyhow because I reaslised that my blackbox was not quite in the right place and the compiler was optimising for the data (rookie mistake!).

Could you link your godbolt experiments?

The code in the godbolt is an exploration of expanding to larger type sizes than u32 but it's slow due to an if in the loop. Nevertheless it demostrates the is_signed_ty evaluation nicely.

Sure, on the left it's a const that evaluates to false and on the right it's an expression. https://godbolt.org/z/od3ajTh3v
Interestingly on the left side if I replace
const is_signed_ty: bool = false; with
let is_signed_ty: bool = false;
it can't tell that it should drop the else expression which is surprising. I would have thought it should have proporgated the constant into the is_signed_ty given it was assigned by a constant and is not mutable.

That godbolt link shows unoptimized assembly. You forgot to set -Copt-level=3. With that the assembly is identical between the two.

@pickfire I took a stab at building in all the safeguards to your trick as an alternative route for radix 10. Have submitted a PR for us to talk around: pickfire/parseint#2

LingMan · 2021-03-28T14:30:54Z

library/core/src/num/mod.rs

+                let x = match (c as char).to_digit(radix) {
+                    Some(x) => x,
+                    None => return Err(PIE { kind: InvalidDigit }),
+                };


These matches can all be written a bit shorter like this:

Suggested change

let x = match (c as char).to_digit(radix) {

Some(x) => x,

None => return Err(PIE { kind: InvalidDigit }),

};

let x = (c as char).to_digit(radix).ok_or(PIE { kind: InvalidDigit })?;

I recall we used to do these but in performance sensitive parts using ok_or is slower than match counterparts, from what I recall.

Really? The optimizer should have all information it needs and in my quick tests the generated assembly is identical. Is there an open issue or do you have a test case?

No, that's what I recall but not sure if the same goes for this case, if the assembly is the same then I guess it's good to have it.

LingMan · 2021-03-28T14:37:57Z

library/core/src/num/mod.rs

+                    Some(result) => result,
+                    None => return Err(PIE { kind: PosOverflow }),
+                };
+                idx += 1;


No need to manage an index manually if you loop over the factors directly:

for (&c, &factor) in digits.zip(factors) { ... }

Assuming zip doesn't inhibit any optimizations.

Both zip and enumerate inhibited optimisations - I like that euphamism.

I've seen some weird interactions between zip and rev, which currently(?) cause suboptimal assembly, but never with enumerate. Is this another case of missing -Copt-level=3 on godbolt?

PS: No euphemism intended.

LingMan · 2021-03-28T14:48:28Z

library/core/src/num/mod.rs

+        if is_positive {
+            // The number is positive
+            for &c in digits {
+                let x = match (c as char).to_digit(radix) {
+                    Some(x) => x,
+                    None => return Err(PIE { kind: InvalidDigit }),
+                };
+                let x = match factors[idx].checked_mul(x) {
+                    Some(result) => result,
+                    None => return Err(PIE { kind: PosOverflow }),
+                };
+                result = match result.checked_add(x) {
+                    Some(result) => result,
+                    None => return Err(PIE { kind: PosOverflow }),
+                };
+                idx += 1;
+            }
+        } else {
+            // The number is negative
+            for &c in digits {
+                let x = match (c as char).to_digit(radix) {
+                    Some(x) => x,
+                    None => return Err(PIE { kind: InvalidDigit }),
+                };
+                let x = match factors[idx].checked_mul(x) {
+                    Some(result) => result,
+                    None => return Err(PIE { kind: NegOverflow }),
+                };
+                result = match result.checked_sub(x) {
+                    Some(result) => result,
+                    None => return Err(PIE { kind: NegOverflow }),
+                };
+                idx += 1;
+            }


You could move that for loop into a closure or macro or something that gets PosOverflow/NegOverflow and checked_add/checked_sub as parameters.

Mark-Simulacrum · 2021-04-04T03:02:03Z

IIRC @dtolnay has done some work around this in the ecosystem, do you think you'd be interested in reviewing this PR?

Otherwise I'll try to take a look soon.

gilescope · 2021-04-06T11:44:16Z

I am thinking of closing down this PR in favour of #83371

I am not sure that the technique in this PR has benefits scailed up to u64. I am looking to see if I can safely weaponise pickire's tricks at radix 10 parsing, but for now I think that should live in a separate atoi_radix10 crate where it can get proved out.

pickfire · 2021-04-06T14:53:45Z

I am not sure that the technique in this PR has benefits scailed up to u64. I am looking to see if I can safely weaponise pickire's tricks at radix 10 parsing, but for now I think that should live in a separate atoi_radix10 crate where it can get proved out.

Try not to publish a crate (so we don't occupy a name, like what I did to ve), also can just ping me for review, I would be happy to do that.

dtolnay · 2021-04-13T21:44:46Z

IIRC @dtolnay has done some work around this in the ecosystem, do you think you'd be interested in reviewing this PR?

I have done float parsing, and I have done int and float printing, but so far not int parsing. Would love if someone else were able to spearhead the review on this one.

pickfire · 2021-04-14T16:06:39Z

I have done porting a verify specific version of int parsing from c++. I believe @gilescope is currently experimenting to get a faster parser by having another repository to test his ideas out. @gilescope you may want to change this pull request to draft in the meantime. I will also take a look at it after that since we could have a faster way of testing out the new changes.

gilescope · 2021-04-14T19:01:20Z

Good idea

…

On Wed, 14 Apr 2021 at 17:06, Ivan Tham ***@***.***> wrote: I have done porting a verify specific version of int parsing from c++. I believe @gilescope <https://github.com/gilescope> is currently experimenting to get a faster parser by having another repository to test his ideas out. @gilescope <https://github.com/gilescope> you may want to change this pull request to draft in the meantime. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#83088 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAGEJCALZWXIDRNTO3FFPZTTIW4SDANCNFSM4ZEAKXMQ> .

gilescope · 2021-05-03T10:39:41Z

I'm going to close this for now to reduce confusion. I am working on base10, but I don't know when I'm going to be done done (there always seems to be some new idea to try). It's also a way off being generic so not that suitable for inclusion into std lib yet (if ever as there's quite a bit more code) but I do want to answer the question of what's the fastest way we can parse ints.

Meanwhile I think we should move forward with #83371 - there's no downside except a little more code and it speeds up all radix.

gilescope · 2022-04-04T06:38:05Z

I did finally get to a generic way of parsing base 10 strings fast and published it as atoi_radix10 crate.

rust-highfive assigned Mark-Simulacrum Mar 13, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 13, 2021

This comment has been minimized.

Sign in to view

jyn514 added I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 13, 2021

gilescope changed the title ~~~15% Faster str to int parsing (when base 10)~~ WIP: ~15% Faster str to int parsing (when base 10) Mar 15, 2021

restrict to 32bit

5659840

This comment has been minimized.

Sign in to view

Don't include neg sign.

eb371e2

gilescope mentioned this pull request Mar 22, 2021

Faster parsing for lower numbers for radix up to 16 #83371

Closed

pickfire reviewed Mar 22, 2021

View reviewed changes

LingMan reviewed Mar 28, 2021

View reviewed changes

gilescope marked this pull request as draft April 15, 2021 08:52

Mark-Simulacrum added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 26, 2021

gilescope closed this May 3, 2021

WIP: ~15% Faster str to int parsing (when base 10) #83088

WIP: ~15% Faster str to int parsing (when base 10) #83088

Conversation

gilescope commented Mar 13, 2021

rust-highfive commented Mar 13, 2021

This comment has been minimized.

This comment has been minimized.

gilescope commented Mar 15, 2021

DutchGhost commented Mar 15, 2021 • edited Loading

gilescope commented Mar 15, 2021 via email

joshtriplett commented Mar 18, 2021

rust-log-analyzer commented Mar 20, 2021

gilescope commented Mar 22, 2021

pickfire Mar 22, 2021 • edited Loading

Choose a reason for hiding this comment

gilescope Mar 23, 2021 • edited Loading

Choose a reason for hiding this comment

pickfire Mar 23, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gilescope Mar 29, 2021 • edited Loading

Choose a reason for hiding this comment

LingMan Mar 29, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LingMan Mar 29, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Mark-Simulacrum commented Apr 4, 2021

gilescope commented Apr 6, 2021

pickfire commented Apr 6, 2021

dtolnay commented Apr 13, 2021

pickfire commented Apr 14, 2021 • edited Loading

gilescope commented Apr 14, 2021 via email

gilescope commented May 3, 2021

gilescope commented Apr 4, 2022

DutchGhost commented Mar 15, 2021 •

edited

Loading

pickfire Mar 22, 2021 •

edited

Loading

gilescope Mar 23, 2021 •

edited

Loading

pickfire Mar 23, 2021 •

edited

Loading

gilescope Mar 29, 2021 •

edited

Loading

LingMan Mar 29, 2021 •

edited

Loading

LingMan Mar 29, 2021 •

edited

Loading

pickfire commented Apr 14, 2021 •

edited

Loading