Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Carry index of InvalidDigit on IntErrorKind #79728

Closed
wants to merge 2 commits into from

Conversation

eopb
Copy link
Contributor

@eopb eopb commented Dec 5, 2020

@Lucretiel #22639 (comment)

Before stabilizing, would it be very complicated to additionally add the byte index where parsing failed? I can imagine that information being useful to an introspector (for example, to print a verbose rust-like arrow pointing to the exact location of the parse failure).

@ethanboxx #22639 (comment)

Should be easy to do. I will probably open a PR with that soon so people discuss it.

@rust-highfive
Copy link
Collaborator

r? @withoutboats

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 5, 2020
@camelid camelid added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jan 15, 2021
@Dylan-DPC-zz
Copy link

r? @KodrAus

@KodrAus
Copy link
Contributor

KodrAus commented Jan 17, 2021

This seems reasonable to me. Maybe we could also update the Display implementation of ParseIntError to include the index and offending character encountered?

EDIT: Since we don’t have access to the original buffer just the index would be fine.

@KodrAus
Copy link
Contributor

KodrAus commented Jan 17, 2021

Thinking about this more, it might be better to try keep Kind a simple enum and optionally track an index on ParseIntError.

@eopb
Copy link
Contributor Author

eopb commented Jan 17, 2021

I agree that it would be nice to include the index in the Display implementation. I will add that soon.

I do not think placing the index as an Option on ParseIntError would be better. You lose the invariant that the index will only exist when the error is InvalidDigit and I feel that could add unnecessary API complexity.

I don't know what benefit keeping IntErrorKind as a simple enum would give.

@KodrAus
Copy link
Contributor

KodrAus commented Jan 17, 2021

@ethanboxx The way I see it, the benefits of carrying the index on ParseIntError rather than on IntErrorKind are:

  • We don't paint ourselves into a semver corner, which enums with fields do
  • We don't introduce a field that's only useful in niche cases (none of which seem to apply to the rustc codebase)
  • We don't need to introduce the same field on any new variants that it might also apply to
  • We have a way to introduce other pieces of information that may also be relevant, like the character itself that caused the error
  • We can continue to use patterns like if kind() == IntErrorKind::InvalidDigit { .. }

@JohnCSimon JohnCSimon added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 7, 2021
@camelid camelid added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 26, 2021
@camelid
Copy link
Member

camelid commented Feb 26, 2021

Ping from triage: @ethanboxx could you address KodrAus's review? Thanks!

@eopb eopb force-pushed the interrorkind_invaliddigit_index branch from 048b00f to 6ce70b1 Compare March 5, 2021 12:30
@JohnTitor JohnTitor added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 18, 2021
@Dylan-DPC-zz
Copy link

r? @dtolnay

@rust-highfive rust-highfive assigned dtolnay and unassigned KodrAus Mar 30, 2021
Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not immediately sold on this. Can you provide more detail on the use case where you want to use this? Is there code you could show that you would want to be able to write?

@Dylan-DPC-zz Dylan-DPC-zz added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 30, 2021
@eopb
Copy link
Contributor Author

eopb commented Apr 11, 2021

@dtolnay

Use cases

An example use case would be to print an arrow pointing to the exact location of parsing failure as mentioned in #22639 (comment) where this feature was requested.

Another example could be writing a parser that ignores invalid characters.

Currently, the way of doing these things would be to rewrite the parsing logic in the local crate to plug in the index logic.

Why put this on IntErrorKind rather than ParseIntError?

Because I think that this code

if let Err(InvalidDigit(i)) = "1a0".parse::<i32>().map(ParseIntError::kind) {
     println!("Parsing failed at index: {}", i);
}

Is nicer than this code

if let Err(error) = "1a0".parse::<i32>() {
    if let Err(InvalidDigit) = error.kind() {
        let i = error.index().unwrap(); // Forced to unwrap
        println!("Parsing failed at index: {}", i);
    }
}

Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll go ahead and close this, since this API addition doesn't strike me as the most appropriate solution to either of those use cases.

An example use case would be to print an arrow pointing to the exact location of parsing failure as mentioned in #22639 (comment) where this feature was requested.

For parsing, ordinarily you need some variation of a lexer to decide what part of the input you are suppose to parse as a number in the first place. Consider input like {"id":987654321,"username":"ethanboxx"}. Some piece of logic needs to determine that the substring 987654321 is the correct thing to pass to u32::from_str, which it does by scanning ahead valid digits. Having from_str report an invalid digit position is only useful if you are planning to pass an entire suffix of the input to from_str i.e. 987654321,"username":"ethanboxx"}, then collect the invalid digit position, and run from_str a second time up to that position. This seems more confusing than just lexing the valid digits yourself up front.

The standard library already exposes the concept of "valid digit" by way of:

  • u8::is_ascii_digit / char::is_ascii_digit (for base 10)
  • u8::is_ascii_hexdigit / char::is_ascii_hexdigit (for base 16)
  • char::is_digit (for arbitrary radix)

Another example could be writing a parser that ignores invalid characters.

Again I don't think an invalid digit position reported by from_str is really appropriate here. It sounds like you are planning to take an input containing a mix of valid and invalid digits, and run from_str on it in a loop, removing one invalid digit at a time. That seems needlessly complicated and confusing compared to filtering for valid digits up front (using the above set of methods) and running from_str a single time on only valid digits.

@dtolnay dtolnay closed this Apr 11, 2021
@camelid
Copy link
Member

camelid commented Apr 11, 2021

For parsing, ordinarily you need some variation of a lexer to decide what part of the input you are suppose to parse as a number in the first place. Consider input like {"id":987654321,"username":"ethanboxx"}. Some piece of logic needs to determine that the substring 987654321 is the correct thing to pass to u32::from_str, which it does by scanning ahead valid digits. Having from_str report an invalid digit position is only useful if you are planning to pass an entire suffix of the input to from_str i.e. 987654321,"username":"ethanboxx"}, then collect the invalid digit position, and run from_str a second time up to that position. This seems more confusing than just lexing the valid digits yourself up front.

For IntErrorKind this is true, but when lexing floats you may accept any number of .s and enforce that there's only one . when using from_str() (e.g., I do this in a compiler I'm working on). If f{32,64}::from_str() returned an error that included the location with an invalid character, you could easily show a more specific error location. So, I could see a feature like in this PR but for floats being more useful.

bors added a commit to rust-lang-ci/rust that referenced this pull request Jun 22, 2021
…yaahc

stabilize `int_error_matching`

closes rust-lang#22639

> It has been over half a year since rust-lang#77640 (review), and the indexing question is rejected in rust-lang#79728 (review), so I guess we can submit another stabilization attempt? 😉

_Originally posted by `@kennytm` in rust-lang#22639 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants