RFC: new GraphemeCursor API #21

Closed
raphlinus opened this issue Mar 3, 2017 · 6 comments

Comments

@raphlinus
Contributor

The existing API is unsuitable for cursor movement in xi-editor, because (a) xi's string representation is a rope, not a contiguous &str, and (b) I need to be able to start at an arbitrary offset in the string and find the previous or next boundary.

I propose implementing a new cursor-flavored API. The GraphemeCursor struct would be little more than the state machine state; it would not store a reference to the string. Queries would pass in a &str chunk and an offset within that chunk, and it is the caller's responsibility to ensure that the offset is consistent with the cursor location. A query would return one of three things: a new offset, an indication that the boundary lies beyond the extent of the provided chunk, or a request for pre-context. In the third case, the caller would supply a string chunk preceding the chunk containing the cursor (the return value would probably include a negative offset), then retry the original query. In the second case, the caller would advance to the previous or next chunk, then retry.
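The pre-context part of this protocol can be sketched in Rust. This is a minimal sketch under stated assumptions, not the crate's API: `GraphemeCursor` as written here, `Incomplete`, `is_boundary`, `provide_context` and the caller helper `boundary_at` are hypothetical, and `toy_break` is a deliberately tiny stand-in for the real UAX #29 rules (it only keeps CR LF together and attaches combining diacritics U+0300..U+036F). The cursor holds no reference to the text, only the queried offset, the text length and at most one supplied context character. Instead of a negative offset, the request carries the absolute offset at which the needed chunk must end; that is a choice made for the sketch.

```rust
/// Toy break rule, a hypothetical stand-in for UAX #29 (NOT the real rules):
/// keep CR LF together, break after any other CR or LF, never break before a
/// combining diacritic in U+0300..=U+036F, and break everywhere else.
fn toy_break(prev: char, next: char) -> bool {
    match (prev, next) {
        ('\r', '\n') => false,
        ('\r', _) | ('\n', _) => true,
        (_, c) => !('\u{300}'..='\u{36f}').contains(&c),
    }
}

/// Why a query cannot be answered from the chunk it was given (hypothetical).
#[derive(Debug, PartialEq)]
enum Incomplete {
    /// Supply the chunk that ends at this absolute byte offset, then retry.
    PreContext(usize),
}

/// Hypothetical cursor: little more than state, it never borrows the text.
struct GraphemeCursor {
    offset: usize,     // absolute byte offset being queried
    len: usize,        // total length of the text
    pre: Option<char>, // last char of a chunk supplied as pre-context
}

impl GraphemeCursor {
    fn new(offset: usize, len: usize) -> Self {
        GraphemeCursor { offset, len, pre: None }
    }

    /// `chunk` holds bytes `chunk_start..chunk_start + chunk.len()` of the text.
    /// The caller must make sure it contains `self.offset`.
    fn is_boundary(&self, chunk: &str, chunk_start: usize) -> Result<bool, Incomplete> {
        if self.offset == 0 || self.offset == self.len {
            return Ok(true);
        }
        let rel = self.offset - chunk_start;
        let next = chunk[rel..].chars().next().expect("chunk must contain the offset");
        let prev = match (chunk[..rel].chars().next_back(), self.pre) {
            (Some(c), _) | (None, Some(c)) => c,
            // The previous char lives in an earlier chunk: ask the caller for it.
            (None, None) => return Err(Incomplete::PreContext(self.offset)),
        };
        Ok(toy_break(prev, next))
    }

    /// Answer a `PreContext` request with the chunk ending at the requested offset.
    fn provide_context(&mut self, chunk: &str) {
        self.pre = chunk.chars().next_back();
    }
}

/// Hypothetical caller holding a rope as a slice of non-empty chunks.
fn boundary_at(chunks: &[&str], offset: usize) -> bool {
    let len: usize = chunks.iter().map(|c| c.len()).sum();
    // Locate the chunk containing `offset` (the last chunk when offset == len).
    let (mut idx, mut start) = (0usize, 0usize);
    while idx + 1 < chunks.len() && offset >= start + chunks[idx].len() {
        start += chunks[idx].len();
        idx += 1;
    }
    let mut cursor = GraphemeCursor::new(offset, len);
    loop {
        match cursor.is_boundary(chunks[idx], start) {
            Ok(b) => return b,
            // Only raised when `offset == start`, so chunk `idx - 1` ends there.
            Err(Incomplete::PreContext(_)) => cursor.provide_context(chunks[idx - 1]),
        }
    }
}
```

With the rope `["ab\r", "\ne", "\u{301}x"]`, querying offset 3 lands at the start of the second chunk, so the cursor asks for pre-context, sees the `\r`, and reports no boundary because the toy rule keeps CR LF together.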

Queries supported would include is_boundary, next_boundary, and prev_boundary. The latter two are stateful: the cursor moves to the result of the query.
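A stateful `next_boundary` can be sketched under the same assumptions (hypothetical names, a toy break rule in place of UAX #29). When the scan runs off the end of a chunk, the cursor records how far it got as part of its own state and returns `NextChunk`; the caller passes the following chunk and retries. On success the cursor's offset moves to the boundary it returns. A `prev_boundary` would presumably mirror this, scanning backward and asking for the previous chunk; it is omitted here for brevity.

```rust
/// Toy break rule, a hypothetical stand-in for UAX #29 (NOT the real rules):
/// keep CR LF together, break after other CR/LF, never break before U+0300..=U+036F.
fn toy_break(prev: char, next: char) -> bool {
    match (prev, next) {
        ('\r', '\n') => false,
        ('\r', _) | ('\n', _) => true,
        (_, c) => !('\u{300}'..='\u{36f}').contains(&c),
    }
}

/// Why a query cannot be answered from the chunk it was given (hypothetical).
#[derive(Debug, PartialEq)]
enum Incomplete {
    /// The answer lies past the end of this chunk: retry with the next one.
    NextChunk,
}

/// Hypothetical cursor: state only, no reference to the text.
struct GraphemeCursor {
    offset: usize,
    len: usize,
    /// A forward scan interrupted at a chunk end: (position reached, last char seen).
    scan: Option<(usize, char)>,
}

impl GraphemeCursor {
    fn new(offset: usize, len: usize) -> Self {
        GraphemeCursor { offset, len, scan: None }
    }

    /// Stateful query: on success the cursor moves to the boundary it returns.
    /// `chunk` starts at byte `chunk_start` and must contain the cursor position
    /// (or, after `NextChunk`, start exactly where the previous chunk ended).
    fn next_boundary(&mut self, chunk: &str, chunk_start: usize) -> Result<Option<usize>, Incomplete> {
        if self.offset == self.len {
            return Ok(None);
        }
        let (mut pos, mut prev) = match self.scan.take() {
            Some(state) => state, // resume the interrupted scan
            None => match chunk[self.offset - chunk_start..].chars().next() {
                Some(c) => (self.offset + c.len_utf8(), c),
                None => return Err(Incomplete::NextChunk), // cursor sits at this chunk's end
            },
        };
        while pos < self.len {
            match chunk[pos - chunk_start..].chars().next() {
                Some(next) if toy_break(prev, next) => break,
                Some(next) => {
                    prev = next;
                    pos += next.len_utf8();
                }
                None => {
                    // Ran off the end of the chunk: remember progress and ask for more.
                    self.scan = Some((pos, prev));
                    return Err(Incomplete::NextChunk);
                }
            }
        }
        self.offset = pos;
        Ok(Some(pos))
    }
}

/// Hypothetical caller: collect every boundary after offset 0, switching to the
/// next chunk of the rope whenever the cursor asks for it.
fn all_boundaries(chunks: &[&str]) -> Vec<usize> {
    let len: usize = chunks.iter().map(|c| c.len()).sum();
    let mut cursor = GraphemeCursor::new(0, len);
    let (mut idx, mut start) = (0usize, 0usize);
    let mut out = Vec::new();
    loop {
        match cursor.next_boundary(chunks[idx], start) {
            Ok(Some(b)) => out.push(b),
            Ok(None) => return out,
            Err(Incomplete::NextChunk) => {
                start += chunks[idx].len();
                idx += 1;
            }
        }
    }
}
```

For `["ab\r", "\ne", "\u{301}x"]`, `all_boundaries` returns `[1, 2, 4, 7, 8]`, the same as for the unsplit string.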

As an implementation detail, the existing iterator would be implemented as a relatively thin layer on top of the cursor. The implementations for supplying pre-context, and for advancing to previous and next chunks, would be trivial.
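To illustrate why the iterator layer could be thin, here is a hypothetical `ToyGraphemes` iterator over a contiguous `&str`, built on a chunk-aware cursor of the proposed shape (again with a toy break rule, not UAX #29; all names are assumptions). Because the whole string is passed as the single chunk starting at offset 0, the cursor never needs another chunk, so the iterator only remembers where it was and slices up to the next boundary.

```rust
/// Toy break rule, a hypothetical stand-in for UAX #29 (NOT the real rules):
/// keep CR LF together, break after other CR/LF, never break before U+0300..=U+036F.
fn toy_break(prev: char, next: char) -> bool {
    match (prev, next) {
        ('\r', '\n') => false,
        ('\r', _) | ('\n', _) => true,
        (_, c) => !('\u{300}'..='\u{36f}').contains(&c),
    }
}

#[derive(Debug)]
enum Incomplete {
    NextChunk, // the answer lies past this chunk: retry with the next one
}

/// Hypothetical cursor: state only, it does not borrow the text.
struct GraphemeCursor {
    offset: usize,
    len: usize,
    scan: Option<(usize, char)>, // forward scan interrupted at a chunk end
}

impl GraphemeCursor {
    fn new(offset: usize, len: usize) -> Self {
        GraphemeCursor { offset, len, scan: None }
    }

    /// Move to the next boundary after `offset`; `chunk` starts at `chunk_start`.
    fn next_boundary(&mut self, chunk: &str, chunk_start: usize) -> Result<Option<usize>, Incomplete> {
        if self.offset == self.len {
            return Ok(None);
        }
        let (mut pos, mut prev) = match self.scan.take() {
            Some(state) => state,
            None => match chunk[self.offset - chunk_start..].chars().next() {
                Some(c) => (self.offset + c.len_utf8(), c),
                None => return Err(Incomplete::NextChunk),
            },
        };
        while pos < self.len {
            match chunk[pos - chunk_start..].chars().next() {
                Some(next) if toy_break(prev, next) => break,
                Some(next) => {
                    prev = next;
                    pos += next.len_utf8();
                }
                None => {
                    self.scan = Some((pos, prev));
                    return Err(Incomplete::NextChunk);
                }
            }
        }
        self.offset = pos;
        Ok(Some(pos))
    }
}

/// Hypothetical iterator over a contiguous `&str`: a thin layer on the cursor.
struct ToyGraphemes<'a> {
    text: &'a str,
    cursor: GraphemeCursor,
}

impl<'a> ToyGraphemes<'a> {
    fn new(text: &'a str) -> Self {
        ToyGraphemes { text, cursor: GraphemeCursor::new(0, text.len()) }
    }
}

impl<'a> Iterator for ToyGraphemes<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let text = self.text;
        let start = self.cursor.offset;
        // The whole string is the single chunk at offset 0, so the cursor can
        // never ask for another chunk; `expect` documents that invariant.
        let end = self
            .cursor
            .next_boundary(text, 0)
            .expect("one chunk covering the whole text is always enough")?;
        Some(&text[start..end])
    }
}
```

`ToyGraphemes::new("ab\r\ne\u{301}x")` yields `"a"`, `"b"`, `"\r\n"`, `"e\u{301}"` and `"x"`.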

Since it is very easy to get details wrong, I plan to do automated testing with randomly generated input strings, verifying that the next and prev boundaries are consistent with each other (which would hopefully automate detection of bugs such as #19), and that results with chunked input are consistent with whole-string input.
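The chunked-versus-whole check can be sketched as follows, assuming a hypothetical `is_boundary`-style cursor with pre-context requests and a toy break rule (not UAX #29). It draws random strings from an alphabet of characters the toy rule treats specially, splits each at random character boundaries, and asserts that every offset gets the same answer either way. A small xorshift generator keeps it free of external crates (its seed must be non-zero); the next/prev consistency check is not covered by this sketch.

```rust
use std::iter::once;

/// Toy break rule, a hypothetical stand-in for UAX #29 (NOT the real rules):
/// keep CR LF together, break after other CR/LF, never break before U+0300..=U+036F.
fn toy_break(prev: char, next: char) -> bool {
    match (prev, next) {
        ('\r', '\n') => false,
        ('\r', _) | ('\n', _) => true,
        (_, c) => !('\u{300}'..='\u{36f}').contains(&c),
    }
}

enum Incomplete {
    PreContext(usize), // hypothetical: supply the chunk ending here, then retry
}

/// Hypothetical cursor: queried offset, text length, optional context char.
struct GraphemeCursor {
    offset: usize,
    len: usize,
    pre: Option<char>,
}

impl GraphemeCursor {
    fn is_boundary(&self, chunk: &str, chunk_start: usize) -> Result<bool, Incomplete> {
        if self.offset == 0 || self.offset == self.len {
            return Ok(true);
        }
        let rel = self.offset - chunk_start;
        let next = chunk[rel..].chars().next().expect("chunk must contain the offset");
        match (chunk[..rel].chars().next_back(), self.pre) {
            (Some(prev), _) | (None, Some(prev)) => Ok(toy_break(prev, next)),
            (None, None) => Err(Incomplete::PreContext(self.offset)),
        }
    }
}

/// Query `offset` in a rope of non-empty chunks, answering pre-context requests.
fn boundary_at(chunks: &[&str], offset: usize) -> bool {
    let len: usize = chunks.iter().map(|c| c.len()).sum();
    let (mut idx, mut start) = (0usize, 0usize);
    while idx + 1 < chunks.len() && offset >= start + chunks[idx].len() {
        start += chunks[idx].len();
        idx += 1;
    }
    let mut cursor = GraphemeCursor { offset, len, pre: None };
    loop {
        match cursor.is_boundary(chunks[idx], start) {
            Ok(b) => return b,
            Err(Incomplete::PreContext(_)) => cursor.pre = chunks[idx - 1].chars().next_back(),
        }
    }
}

/// Tiny xorshift PRNG (seed must be non-zero) so no external crates are needed.
struct Rng(u64);

impl Rng {
    fn below(&mut self, n: usize) -> usize {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 % n as u64) as usize
    }
}

/// Compare chunked and whole-string answers on `cases` random inputs.
/// Panics on the first disagreement; returns how many offsets were checked.
fn fuzz(seed: u64, cases: usize) -> usize {
    const ALPHABET: [char; 5] = ['a', '\r', '\n', '\u{300}', '\u{301}'];
    let mut rng = Rng(seed);
    let mut checked = 0;
    for _ in 0..cases {
        let n = 1 + rng.below(12);
        let text: String = (0..n).map(|_| ALPHABET[rng.below(ALPHABET.len())]).collect();
        // Every char boundary, including the end of the text.
        let offsets: Vec<usize> = text.char_indices().map(|(i, _)| i).chain(once(text.len())).collect();
        // Cut at a random subset of interior char boundaries.
        let (mut chunks, mut last) = (Vec::new(), 0);
        for &i in &offsets[1..offsets.len() - 1] {
            if rng.below(2) == 0 {
                chunks.push(&text[last..i]);
                last = i;
            }
        }
        chunks.push(&text[last..]);
        for &off in &offsets {
            let whole = boundary_at(&[text.as_str()], off);
            assert_eq!(boundary_at(&chunks, off), whole, "{:?} split as {:?} at {}", text, chunks, off);
            checked += 1;
        }
    }
    checked
}
```

On a mismatch the assertion message prints the string, its chunking and the offset, which gives a ready-made regression case.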

Having a cursor-based implementation would help implement features such as #7.

I'd love feedback on whether this general direction makes sense before diving into implementation. If this is successful, I'd probably want to do something similar for word boundaries as well.

raphlinus added a commit to raphlinus/unicode-segmentation that referenced this issue Mar 4, 2017
@SimonSapin
Contributor

Sounds good to me.

@alexcrichton, @kwantam, any opinion?

@raphlinus
Contributor Author

I'm prototyping this in a fork: https://github.com/unicode-rs/unicode-segmentation . There are some small differences from what I wrote above, but the bones should be in place.

@alexcrichton

@SimonSapin no opinion

@tapeinosyne

The purpose of a cursor-oriented API is rather clear, and this proposal feels nice. (I have occasionally wished that even simpler string views supported similar methods.) If cursors make for a serviceable basis to the current iterators, all the better.

@HadrienG2

HadrienG2 commented Jan 11, 2019

Should be fixed by #23.

@raphlinus
Contributor Author

Yes, I think this issue can be closed.

5 participants