Tracking Issue for str::{floor, ceil}_char_boundary #93743

clarfonthey · 2022-02-07T17:33:22Z

Feature gate: #![feature(round_char_boundary)]

This is a tracking issue for str::{floor, ceil}_char_boundary.

Public API

impl str {
    // Returns the character boundary at or immediately before `index`
    fn floor_char_boundary(&self, index: usize) -> usize;

    // Returns the character boundary at or immediately after `index`
    fn ceil_char_boundary(&self, index: usize) -> usize;
}
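To make the two signatures above concrete, here is a minimal sketch of the described semantics written against the stable `str::is_char_boundary` method. The free functions `floor_boundary` and `ceil_boundary` are hypothetical stand-ins, not the standard library implementation. The `ceil` sketch asserts that the index is in range, mirroring the panic raised under Unresolved Questions below.

```rust
/// Hypothetical stand-in for `floor_char_boundary`:
/// the largest char boundary that is <= `index`.
/// An index at or past the end rounds down to `s.len()`, so this never panics.
fn floor_boundary(s: &str, index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    let mut i = index;
    // Index 0 is always a char boundary, so the loop terminates without underflow.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}

/// Hypothetical stand-in for `ceil_char_boundary`:
/// the smallest char boundary that is >= `index`.
/// Panics when `index > s.len()` (the behavior questioned in this issue).
fn ceil_boundary(s: &str, index: usize) -> usize {
    assert!(index <= s.len(), "index out of bounds");
    let mut i = index;
    // `s.len()` is always a char boundary, so the loop terminates.
    while !s.is_char_boundary(i) {
        i += 1;
    }
    i
}
```

For the string "a€b", where "€" occupies bytes 1..4, `floor_boundary(s, 2)` is 1 and `ceil_boundary(s, 2)` is 4.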

Steps / History

Implementation: Add {floor,ceil}_char_boundary methods to str #86497
Final comment period (FCP)
Stabilization PR

Unresolved Questions

Should ceil_char_boundary panic when the index is out of bounds? PR #112387 ("Don't panic in ceil_char_boundary") is an existing PR that changes this to return the length of the string instead.


nagisa · 2022-02-12T23:37:56Z

Have methods that return str slices instead of indices been considered as an alternative? I didn't really see any discussion to that effect in the issue. Given that these indices are only really usable for slicing into the containing string these were calculated from, it isn't all that obvious to me if this is the right API, as opposed to split_at_???_char_boundary_{,mut}.

clarfonthey · 2022-02-13T00:41:51Z

So, that would work for purely code-based splitting, but if a user were to perform more-complicated splitting by something like graphemes or words, they'd want to be able to look at the string both before and after the truncation point, not just at that point.

Since some cases required the index specifically, I decided to keep the API surface small. But adding extra variants that directly slice the string could be very useful too, like the ones you offered.

Stargateur · 2022-07-20T11:18:28Z

I don't see why this chooses to panic instead of returning Option<usize> or Result<usize, ()>.

clarfonthey · 2022-07-20T15:55:13Z

The main reason I chose that was that floor_char_boundary can never panic, and I wanted parity for the API with ceil_char_boundary. Do you have a case that would benefit from such an API?

Stargateur · 2022-07-20T17:05:04Z

One method panicking while the other does not is not parity either. I advise avoiding panics when possible. While there are already a number of methods that panic instead of returning an Option or a Result, new items should probably follow better guidelines, especially since Rust could be used in the Linux kernel, where a method that panics on a primitive type would automatically be banned from use.

In this case I wonder whether floor_char_boundary makes sense as it stands: char_indices doesn't include the "last" boundary because there is no character there. I would advise changing floor_char_boundary to return None when the index would result in len, and removing the panic from the other method too by using Option instead.

Or something else that removes the panic.
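One possible reading of the Option-based suggestion, applied to the ceiling variant only, is sketched below. The name `checked_ceil_boundary` is hypothetical and is not an API proposed in this issue; the sketch only shows what "return None instead of panicking" could look like using stable methods.

```rust
/// Hypothetical non-panicking variant: returns `None` when `index` is past
/// the end of the string, otherwise the smallest char boundary >= `index`.
fn checked_ceil_boundary(s: &str, index: usize) -> Option<usize> {
    if index > s.len() {
        return None;
    }
    // `s.len()` is always a char boundary, so `find` yields `Some` here.
    (index..=s.len()).find(|&i| s.is_char_boundary(i))
}
```

With this shape the caller has to decide what an out-of-range index means, for example by writing `checked_ceil_boundary(s, i).unwrap_or(s.len())`.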

clarfonthey · 2022-07-20T20:26:10Z

So, the reason why one can panic and one can't is basically because of the fact that indexes can't go below zero, and so you can always round down to 0, which is always a valid char boundary. Here, the context is that if you want to limit the size of a string in bytes, you can "round down" from that to a valid index, such that you can slice up to that.

For ceil... the general idea, if I'm being honest, is to round "up" the recommended number of bytes. So, for example, if you want to limit to 1000 bytes, but are fine allowing up to 1004 to include the last character. I guess in this case, it would be okay to return len instead of panicking, but I don't know if this would make sense in all cases.

Definitely open to suggestions on what you think is best, but IMHO the flooring version should definitely never panic, and always return a valid index.
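The byte-limit use case described above can be sketched as follows. Both helpers are hypothetical and use only the stable `str::is_char_boundary`; `truncate_floor` never exceeds the budget, while `truncate_ceil` may overshoot it slightly so that the character straddling the limit is kept whole.

```rust
/// Hypothetical helper: longest prefix of `s` that fits in `max_bytes`
/// without cutting a character in half (rounds the cut point down).
fn truncate_floor(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this cannot underflow.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Hypothetical helper: like `truncate_floor`, but rounds the cut point up,
/// so the result may be a few bytes longer than `max_bytes`.
fn truncate_ceil(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    let mut end = max_bytes;
    // `s.len()` is always a boundary, so this cannot run past the end.
    while !s.is_char_boundary(end) {
        end += 1;
    }
    &s[..end]
}
```

For "héllo" with a limit of 2 bytes, the limit falls inside "é", so `truncate_floor` yields "h" and `truncate_ceil` yields "hé".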

angelorodem · 2023-06-07T02:06:49Z

I think that removing the panic from this function is a must. Its purpose is to correctly find the bounds of a char and to prevent panics during string processing, so it seems wrong that a function meant to prevent panics can itself panic.

clarfonthey · 2023-06-07T05:38:13Z

So, I was poking around the uses of ceil_char_boundary on GitHub: https://github.com/search?q=ceil_char_boundary+lang%3Arust

and am thinking that maybe the solution to a non-panicking version of ceil_char_boundary is to just return the length of the string if the provided index is out of bounds. The semantics feel a bit muddy, but this feels like the most logical way based upon how people are using it. What do people think about this idea?

Effectively, if you pass an index past the end of the string, it's treated as the end of the string, and rounds "up" to the end of the string.
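The clamping behavior proposed here can be sketched like this. `ceil_boundary_clamped` is a hypothetical free function built on the stable `str::is_char_boundary`, not the code from the eventual PR; it only illustrates the "past the end means the end" rule.

```rust
/// Hypothetical sketch of the proposed semantics: an index past the end of
/// the string is treated as the end, so the function never panics and the
/// result is always a valid char boundary.
fn ceil_boundary_clamped(s: &str, index: usize) -> usize {
    if index >= s.len() {
        return s.len();
    }
    let mut i = index;
    // `s.len()` is always a char boundary, so the loop terminates.
    while !s.is_char_boundary(i) {
        i += 1;
    }
    i
}
```

Because the result is always a boundary no greater than `s.len()`, slicing with `&s[..ceil_boundary_clamped(s, i)]` cannot panic for any `i`.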

eddyb · 2023-07-13T22:47:24Z

> So, that would work for purely code-based splitting, but if a user were to perform more-complicated splitting by something like graphemes or words, they'd want to be able to look at the string both before and after the truncation point, not just at that point.
>
> Since some cases required the index specifically, I decided to keep the API surface small. But adding extra variants that directly slice the string could be very useful too, like the ones you offered.

I believe there is some confusion here. The "(split) index" is not lost during split_at.

s.split_at(i) returns (&s[..i], &s[i..]) and so:

s.split_at(i).0.len() == i
s.split_at(i).1.len() == s.len() - i

That is, it's strictly more information than the original i, and for the methods @nagisa suggested (which I'd call split_at_char_boundary_{before,after}), since you can keep the original s yourself, and get the split position by .0.len(), it's just more ergonomic to also have the two &str parts.

Also, because {floor,ceil}_char_boundary already do all the checks split_at requires, it should be 100% free to have split_at_char_boundary_{before,after}, which return (&str, &str) instead of usize (with the first &str containing verbatim the usize of {floor,ceil}_char_boundary, as its length, as explained above).
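A minimal sketch of the split-returning variants follows, using the names suggested in this comment. These functions do not exist in the standard library. Clamping out-of-range indices to the end of the string is an assumption made here so that the sketch never panics.

```rust
/// Hypothetical: split at the char boundary at or before `index`.
fn split_at_char_boundary_before(s: &str, index: usize) -> (&str, &str) {
    // Assumption: indices past the end are clamped to `s.len()`.
    let mut i = index.min(s.len());
    // Index 0 is always a boundary, so this cannot underflow.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    s.split_at(i)
}

/// Hypothetical: split at the char boundary at or after `index`.
fn split_at_char_boundary_after(s: &str, index: usize) -> (&str, &str) {
    // Assumption: indices past the end are clamped to `s.len()`.
    let mut i = index.min(s.len());
    // `s.len()` is always a boundary, so this cannot run past the end.
    while !s.is_char_boundary(i) {
        i += 1;
    }
    s.split_at(i)
}
```

The rounded index is still recoverable as the length of the first half: for "a€b" and index 2, `split_at_char_boundary_before` returns ("a", "€b") with `.0.len() == 1`, and `split_at_char_boundary_after` returns ("a€", "b") with `.0.len() == 4`.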

…boundary, r=m-ou-se

Don't panic in ceil_char_boundary

Implementing the alternative mentioned in this comment: rust-lang#93743 (comment)

Since `floor_char_boundary` will always work (rounding down to the length of the string is possible), it feels best for `ceil_char_boundary` to not panic either. However, the semantics of "rounding up" past the length of the string aren't very great, which is why the method originally panicked in these cases. Taking into account how people are using this method, it feels best to simply return the end of the string in these cases, so that the result is still a valid char boundary.


clarfonthey added the C-tracking-issue and T-libs-api labels Feb 7, 2022

scottmcm mentioned this issue Feb 7, 2022

Add {floor,ceil}_char_boundary methods to str #86497

Merged

remram44 mentioned this issue Jan 28, 2023

Document that this library doesn't work on stable Rust KonradHoeffner/hdt#24

Closed

clarfonthey mentioned this issue Jun 7, 2023

Don't panic in ceil_char_boundary #112387

Merged

zbuc mentioned this issue Jun 10, 2024

Enforce string limits for deserialization penumbra-zone/penumbra#4567

Merged
