RFC: new GraphemeCursor API #21
Very much work in progress. See unicode-rs#21
Sounds good to me. @alexcrichton, @kwantam, any opinion?
I'm prototyping this in a fork: https://github.com/unicode-rs/unicode-segmentation. There are some small differences from what I wrote above, but the bones should be in place.
@SimonSapin no opinion
The purpose of a cursor-oriented API is rather clear, and this proposal feels nice. (I have occasionally wished that even simpler string views supported similar methods.) If cursors make for a serviceable basis to the current iterators, all the better.
Should be fixed by #23.
Yes, I think this issue can be closed. |
The existing API is unsuitable for cursor movement in xi-editor, because (a) xi's string representation is a rope, not a contiguous `&str`, and (b) I need to be able to start at an arbitrary offset in the string and find the previous or next boundary.
I propose implementing a new cursor-flavored API. The `GraphemeCursor` struct would be little more than the state of the state machine; it would not store a reference to the string. Queries would pass in a `&str` chunk and an offset within that chunk. It is the caller's responsibility to ensure that the offset is consistent with the cursor location. The return value of a query would be one of three things: a new offset, an indication that the boundary lies beyond the extent of the provided chunk, or a request for pre-context. For a pre-context request, the caller would supply a string chunk preceding the chunk containing the cursor (the return value would probably include a negative offset), then retry the original query. For a boundary beyond the chunk, the caller would advance to the previous or next chunk, then retry.
Queries supported would include `is_boundary`, `next_boundary`, and `prev_boundary`. The latter two queries are stateful, in that the cursor is moved to the result of the query.
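A minimal sketch of the chunked query protocol, assuming a deliberately simplified rule. Everything here is hypothetical: `ToyCursor`, `Incomplete`, and `boundaries_of_rope` are illustrative names, not the crate's actual API, and a toy rule (a combining mark in U+0300..U+036F never starts a new cluster) stands in for the real grapheme cluster rules. Because that rule never looks back more than one character, the pre-context request is left out. The cursor holds only offsets; the caller switches chunks whenever a query reports that the answer lies outside the chunk it was given.

```rust
// Why a query could not be answered from the chunk it was given.
// Hypothetical names; the toy rule never needs a pre-context request.
#[derive(Debug, PartialEq)]
enum Incomplete {
    PrevChunk, // retry the same query with the preceding chunk
    NextChunk, // retry the same query with the following chunk
}

// Toy rule standing in for the real grapheme rules: a combining
// diacritical mark (U+0300..=U+036F) never starts a new cluster.
fn is_extend(c: char) -> bool {
    ('\u{300}'..='\u{36F}').contains(&c)
}

// Pure state: an absolute byte offset, the total text length, and whether a
// forward scan is in progress. The cursor never borrows the text.
struct ToyCursor {
    offset: usize,
    len: usize,
    mid_cluster: bool,
}

impl ToyCursor {
    fn new(offset: usize, len: usize) -> Self {
        ToyCursor { offset, len, mid_cluster: false }
    }

    // Each query takes a chunk plus its absolute start offset; keeping the
    // chunk consistent with the cursor position is the caller's job.
    fn is_boundary(&self, chunk: &str, chunk_start: usize) -> Result<bool, Incomplete> {
        if self.offset == 0 || self.offset == self.len {
            return Ok(true);
        }
        match chunk[self.offset - chunk_start..].chars().next() {
            Some(c) => Ok(!is_extend(c)),
            None => Err(Incomplete::NextChunk), // cursor sits at the chunk's end
        }
    }

    // Stateful: moves the cursor to the next boundary and returns it.
    fn next_boundary(&mut self, chunk: &str, chunk_start: usize) -> Result<Option<usize>, Incomplete> {
        loop {
            if self.offset == self.len {
                // End of text: report it once if a scan is in progress, then None.
                return Ok(std::mem::replace(&mut self.mid_cluster, false).then_some(self.len));
            }
            if self.offset == chunk_start + chunk.len() {
                return Err(Incomplete::NextChunk);
            }
            let c = chunk[self.offset - chunk_start..].chars().next().unwrap();
            if self.mid_cluster && !is_extend(c) {
                self.mid_cluster = false;
                return Ok(Some(self.offset));
            }
            self.offset += c.len_utf8();
            self.mid_cluster = true;
        }
    }

    // Stateful: moves the cursor to the previous boundary and returns it.
    fn prev_boundary(&mut self, chunk: &str, chunk_start: usize) -> Result<Option<usize>, Incomplete> {
        if self.offset == 0 {
            return Ok(None);
        }
        loop {
            if self.offset == chunk_start {
                return Err(Incomplete::PrevChunk);
            }
            let c = chunk[..self.offset - chunk_start].chars().next_back().unwrap();
            self.offset -= c.len_utf8();
            if self.offset == 0 || !is_extend(c) {
                return Ok(Some(self.offset));
            }
        }
    }
}

// Caller side: walk a rope (here just a slice of chunks) forward,
// moving to the next chunk whenever the cursor asks for it.
fn boundaries_of_rope(chunks: &[&str]) -> Vec<usize> {
    let len: usize = chunks.iter().map(|c| c.len()).sum();
    let mut cursor = ToyCursor::new(0, len);
    let (mut idx, mut start) = (0, 0);
    let mut out = vec![0];
    loop {
        let chunk = chunks.get(idx).copied().unwrap_or("");
        match cursor.next_boundary(chunk, start) {
            Ok(Some(b)) => out.push(b),
            Ok(None) => return out,
            Err(Incomplete::NextChunk) => {
                start += chunk.len();
                idx += 1;
            }
            Err(Incomplete::PrevChunk) => unreachable!("forward scans never look back"),
        }
    }
}
```

For example, `boundaries_of_rope(&["e", "\u{301}x"])` yields `[0, 3, 4]`, the same result as passing the unsplit string, even though the combining accent lands in a different chunk from its base letter. A real rope would look up the neighbouring leaf instead of indexing a slice.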
As an implementation detail, the existing iterator would be implemented as a relatively thin layer on top of the cursor. The implementations for supplying pre-context, and for advancing to previous and next chunks, would be trivial.
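To illustrate the layering, here is a hedged sketch of a forward iterator over a contiguous `&str` built on a hypothetical toy cursor. The cursor is defined inline so the block stands alone, and the same simplified rule (a combining mark in U+0300..U+036F never starts a cluster) replaces the real grapheme rules. `ToyCursor`, `ToyGraphemes`, and `Incomplete` are illustrative names. Passing the whole string as a single chunk starting at offset 0 is what makes the chunk handling trivial here: the cursor always reaches the end of the text before it could run out of chunk.

```rust
// Hypothetical chunk request; only the forward variant matters here.
#[derive(Debug)]
enum Incomplete {
    NextChunk,
}

// Toy rule: a combining diacritical mark (U+0300..=U+036F) never starts a cluster.
fn is_extend(c: char) -> bool {
    ('\u{300}'..='\u{36F}').contains(&c)
}

// Cursor state only; it never borrows the text.
struct ToyCursor {
    offset: usize,
    len: usize,
    mid_cluster: bool,
}

impl ToyCursor {
    // `chunk` starts at absolute byte offset `chunk_start` and contains the cursor.
    fn next_boundary(&mut self, chunk: &str, chunk_start: usize) -> Result<Option<usize>, Incomplete> {
        loop {
            if self.offset == self.len {
                // End of text: report it once if a scan is in progress, then None.
                return Ok(std::mem::replace(&mut self.mid_cluster, false).then_some(self.len));
            }
            if self.offset == chunk_start + chunk.len() {
                return Err(Incomplete::NextChunk);
            }
            let c = chunk[self.offset - chunk_start..].chars().next().unwrap();
            if self.mid_cluster && !is_extend(c) {
                self.mid_cluster = false;
                return Ok(Some(self.offset));
            }
            self.offset += c.len_utf8();
            self.mid_cluster = true;
        }
    }
}

// The iterator is a thin layer: a text reference plus a cursor.
struct ToyGraphemes<'a> {
    text: &'a str,
    cursor: ToyCursor,
}

impl<'a> ToyGraphemes<'a> {
    fn new(text: &'a str) -> Self {
        let cursor = ToyCursor { offset: 0, len: text.len(), mid_cluster: false };
        ToyGraphemes { text, cursor }
    }
}

impl<'a> Iterator for ToyGraphemes<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let start = self.cursor.offset;
        // The whole text is one chunk starting at 0, so the cursor reaches
        // the end of the text before it could run out of chunk.
        match self.cursor.next_boundary(self.text, 0) {
            Ok(Some(end)) => Some(&self.text[start..end]),
            Ok(None) => None,
            Err(Incomplete::NextChunk) => unreachable!("single chunk covers the whole text"),
        }
    }
}
```

With this toy rule, `ToyGraphemes::new("e\u{301}xa")` yields `"e\u{301}"`, `"x"`, and `"a"`. A backward iterator would follow the same pattern with a `prev_boundary` query.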
Since it is very easy to get details wrong, I would plan to do automated testing with randomly generated input strings, verifying that the next and prev boundaries are consistent with each other (which would hopefully automate detection of bugs such as #19), and that results for chunked input are consistent with those for the whole string.
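One way such randomized testing could look, sketched against a hypothetical toy cursor rather than any real crate. The toy rule (a combining mark in U+0300..U+036F never starts a cluster) replaces the real grapheme rules, and `ToyCursor`, `walk`, and `fuzz` are illustrative names. The harness generates random strings from a small alphabet mixing ASCII, combining marks, and multi-byte characters, splits each at random character boundaries (sometimes leaving an empty chunk), and asserts that a chunked forward scan, a chunked backward scan, and a whole-string forward scan report identical boundaries. A small xorshift64 generator stands in for a random-number crate so the block has no dependencies.

```rust
#[derive(Debug, PartialEq)]
enum Incomplete { PrevChunk, NextChunk }

// Toy rule: a combining mark (U+0300..=U+036F) never starts a new cluster.
fn is_extend(c: char) -> bool {
    ('\u{300}'..='\u{36F}').contains(&c)
}

// Cursor state only; queries take a chunk and its absolute start offset.
struct ToyCursor { offset: usize, len: usize, mid_cluster: bool }

impl ToyCursor {
    fn next_boundary(&mut self, chunk: &str, start: usize) -> Result<Option<usize>, Incomplete> {
        loop {
            if self.offset == self.len {
                // End of text: report it once if a scan is in progress, then None.
                return Ok(std::mem::replace(&mut self.mid_cluster, false).then_some(self.len));
            }
            if self.offset == start + chunk.len() {
                return Err(Incomplete::NextChunk);
            }
            let c = chunk[self.offset - start..].chars().next().unwrap();
            if self.mid_cluster && !is_extend(c) {
                self.mid_cluster = false;
                return Ok(Some(self.offset));
            }
            self.offset += c.len_utf8();
            self.mid_cluster = true;
        }
    }

    fn prev_boundary(&mut self, chunk: &str, start: usize) -> Result<Option<usize>, Incomplete> {
        if self.offset == 0 {
            return Ok(None);
        }
        loop {
            if self.offset == start {
                return Err(Incomplete::PrevChunk);
            }
            let c = chunk[..self.offset - start].chars().next_back().unwrap();
            self.offset -= c.len_utf8();
            if self.offset == 0 || !is_extend(c) {
                return Ok(Some(self.offset));
            }
        }
    }
}

// Ascending boundaries, scanning one way and switching chunks on request.
fn walk(chunks: &[&str], forward: bool) -> Vec<usize> {
    let len: usize = chunks.iter().map(|c| c.len()).sum();
    let mut idx = if forward { 0 } else { chunks.len().saturating_sub(1) };
    let mut start = if forward { 0 } else { len - chunks.get(idx).map_or(0, |c| c.len()) };
    let mut cur = ToyCursor { offset: if forward { 0 } else { len }, len, mid_cluster: false };
    let mut out = vec![cur.offset];
    loop {
        let chunk = chunks.get(idx).copied().unwrap_or("");
        let step = if forward { cur.next_boundary(chunk, start) } else { cur.prev_boundary(chunk, start) };
        match step {
            Ok(Some(b)) => out.push(b),
            Ok(None) => break,
            Err(Incomplete::NextChunk) => {
                start += chunk.len();
                idx += 1;
            }
            Err(Incomplete::PrevChunk) => {
                idx -= 1;
                start -= chunks[idx].len();
            }
        }
    }
    if !forward { out.reverse(); }
    out
}

// Random texts split at random char boundaries: chunked forward, chunked
// backward and whole-string forward scans must report the same boundaries.
fn fuzz(iterations: usize, seed: u64) {
    let alphabet = ['a', 'e', '\u{301}', '\u{308}', '\u{e9}', '\u{1F600}'];
    let mut state = seed; // xorshift64 state; must be nonzero
    let mut below = |n: usize| {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        (state % n as u64) as usize
    };
    for _ in 0..iterations {
        let text: String = (0..below(12)).map(|_| alphabet[below(alphabet.len())]).collect();
        // Cut before random chars; a repeated cut at 0 yields an empty first chunk.
        let mut cuts = vec![0];
        cuts.extend(text.char_indices().map(|(i, _)| i).filter(|_| below(3) == 0));
        cuts.push(text.len());
        let chunks: Vec<&str> = cuts.windows(2).map(|w| &text[w[0]..w[1]]).collect();
        let whole = walk(&[text.as_str()], true);
        assert_eq!(walk(&chunks, true), whole, "forward {:?}", chunks);
        assert_eq!(walk(&chunks, false), whole, "backward {:?}", chunks);
    }
}
```

On failure, the assertion message prints the chunk split that produced the mismatch, which is usually enough to reproduce the bug by hand.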
Having a cursor-based implementation would help implement features such as #7.
I'd love feedback on whether this general direction makes sense before diving into implementation. If this is successful, I'd probably want to do something similar for word boundaries as well.