Remove support for 1-token lookahead from the lexer #62329

matklad · 2019-07-03T12:19:29Z

`StringReader` maintained `peek_token` and `peek_span_src_raw` for lookahead.

`peek_token` was used only by rustdoc syntax coloring. After moving the peeking logic into the highlighter, I was able to remove `peek_token` from the lexer. I tried to use `iter::Peekable`, but that wasn't as pretty as I hoped, due to buffered fatal errors. So I went with hand-rolled peeking.
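For readers unfamiliar with the pattern, here is a minimal sketch of what client-side, hand-rolled peeking can look like. It is not the code from this PR: `Token`, `Lexer`, and `Peeking` are hypothetical stand-ins, and the toy lexer just replays a vector of tokens instead of scanning source text.

```rust
// Hypothetical stand-ins for illustration; the real libsyntax types differ.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Ident(String),
    Punct(char),
    Eof,
}

/// A toy lexer with no lookahead of its own.
struct Lexer {
    tokens: std::vec::IntoIter<Token>,
}

impl Lexer {
    fn new(tokens: Vec<Token>) -> Self {
        Lexer { tokens: tokens.into_iter() }
    }

    /// Returns the next token, or `Eof` once the input is exhausted.
    fn next_token(&mut self) -> Token {
        self.tokens.next().unwrap_or(Token::Eof)
    }
}

/// Client-side wrapper that adds one token of lookahead on top of `Lexer`.
struct Peeking {
    lexer: Lexer,
    /// The token that was peeked but not yet consumed, if any.
    lookahead: Option<Token>,
}

impl Peeking {
    fn new(lexer: Lexer) -> Self {
        Peeking { lexer, lookahead: None }
    }

    /// Lexes one token ahead (at most once) and returns a reference to it.
    fn peek(&mut self) -> &Token {
        if self.lookahead.is_none() {
            self.lookahead = Some(self.lexer.next_token());
        }
        self.lookahead.as_ref().unwrap()
    }

    /// Consumes the buffered token if there is one, otherwise lexes a new one.
    fn next(&mut self) -> Token {
        match self.lookahead.take() {
            Some(tok) => tok,
            None => self.lexer.next_token(),
        }
    }
}
```

The point of the design is that the lexer itself stays lookahead-free, and only the one client that needs peeking pays for the extra buffer.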

After that I noticed that the only peeking behavior left was for raw tokens, to test tt jointness. I've rewritten it in terms of trivia tokens rather than just spans.
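A rough sketch of the trivia-based approach, under the assumption that the raw lexer yields whitespace and comments as ordinary tokens. The names `RawToken`, `IsJoint`, and `TokenReader` are made up for this illustration; only `real_token` and `joint_to_prev` echo names visible in the diff.

```rust
// Hypothetical types for illustration; not the actual libsyntax definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum IsJoint {
    Joint,
    NonJoint,
}

#[derive(Debug, Clone, PartialEq)]
enum RawToken {
    Whitespace,
    Comment,
    Ident(String),
    Punct(char),
    Eof,
}

impl RawToken {
    /// Trivia tokens carry no syntax, but they do separate their neighbours.
    fn is_trivia(&self) -> bool {
        matches!(self, RawToken::Whitespace | RawToken::Comment)
    }
}

struct TokenReader {
    raw: std::vec::IntoIter<RawToken>,
    /// Whether the most recently returned token directly follows its predecessor.
    joint_to_prev: IsJoint,
}

impl TokenReader {
    fn new(raw: Vec<RawToken>) -> Self {
        TokenReader { raw: raw.into_iter(), joint_to_prev: IsJoint::NonJoint }
    }

    /// Returns the next non-trivia token and records whether trivia was skipped.
    fn real_token(&mut self) -> RawToken {
        // Assume jointness until a trivia token proves otherwise.
        self.joint_to_prev = IsJoint::Joint;
        loop {
            let tok = self.raw.next().unwrap_or(RawToken::Eof);
            if tok.is_trivia() {
                self.joint_to_prev = IsJoint::NonJoint;
            } else {
                return tok;
            }
        }
    }
}
```

Compared with comparing span endpoints, this needs no lookahead at all: jointness is decided while skipping trivia on the way to the next real token.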

After that it became possible to simplify the awkward constructor of the lexer, which could return `Err` if the first peeked token contained an error.

matklad · 2019-07-03T12:19:38Z

r? @petrochenkov

matklad · 2019-07-03T12:20:18Z

cc @GuillaumeGomez for highlight.rs change I guess (first commit)

matklad · 2019-07-03T12:45:03Z

src/libsyntax/parse/lexer/tokentrees.rs

                self.real_token();
-                let is_joint = raw.hi() == self.string_reader.peek_span_src_raw.lo()
-                    && self.token.is_op();
+                let is_joint = self.joint_to_prev == Joint && self.token.is_op();


self.token.is_op() makes me think that perhaps we should check that the previous token is also an op?

That is, we currently can say Joint for an identifier followed by (, for example.

All the observable behavior is represented by the Spacing enum returned by Punct::spacing.

It has pretty specific documentation - we are "revealing" the jointness knowledge only for Punct + Punct pairs and additionally for Punct(') + Ident (for lifetimes).
Any other jointness is not revealed, Ident + ( in particular is not revealed because ( is not is_op, so the implementation looks correct.

Perhaps we should do this hiding entirely at the proc macro interface border though, and keep the full knowledge internally.
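To make the rule described above concrete, here is a small sketch of hiding jointness at an interface border. It is an illustration only, not rustc code: `Tok`, `Spacing`, and `revealed_spacing` are hypothetical, and the function assumes the caller already knows whether the two tokens are adjacent in the source.

```rust
// Hypothetical model for illustration; not the proc_macro or libsyntax types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Spacing {
    Alone,
    Joint,
}

#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Ident(String),
    Punct(char),
    OpenParen,
}

/// Decides which spacing to reveal for `current`, given the token after it.
/// `adjacent` is the full internal knowledge: no trivia between the two tokens.
fn revealed_spacing(current: &Tok, next: Option<&Tok>, adjacent: bool) -> Spacing {
    if !adjacent {
        return Spacing::Alone;
    }
    match (current, next) {
        // Punct + Punct, e.g. `+` followed by `=`.
        (Tok::Punct(_), Some(Tok::Punct(_))) => Spacing::Joint,
        // Punct(') + Ident, which forms a lifetime.
        (Tok::Punct('\''), Some(Tok::Ident(_))) => Spacing::Joint,
        // Every other adjacency is deliberately not revealed.
        _ => Spacing::Alone,
    }
}
```

Note that this sketch inspects both tokens of the pair, whereas the line under review only checks `self.token.is_op()` for one side.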

> in particular is not revealed because ( is not is_op, so the implementation looks correct.

Ah, indeed, it needs to be Ident followed by -, for example.

Yeah, I think we should handle this either completely on the proc-macro layer or completely in the TokenTreesReader, but this is unrelated to the PR at hand.

src/libsyntax/parse/lexer/mod.rs

petrochenkov · 2019-07-03T16:34:30Z

@bors r+

bors · 2019-07-03T16:34:32Z

📌 Commit f1f8def5e9f1ee88222a2dce3d4da007e0350556 has been approved by petrochenkov

matklad · 2019-07-03T16:46:12Z

@bors r=petrochenkov

bors · 2019-07-03T16:46:13Z

📌 Commit 07a9e4dcd2a518f7916a9db0487a1b950fa50e01 has been approved by petrochenkov

bors · 2019-07-04T03:21:20Z

☔ The latest upstream changes (presumably #62355) made this pull request unmergeable. Please resolve the merge conflicts.

matklad · 2019-07-04T06:26:40Z

@bors r=petrochenkov

bors · 2019-07-04T06:26:42Z

📌 Commit 3e362a4 has been approved by petrochenkov

GuillaumeGomez · 2019-07-04T12:10:23Z

A bit late but looks good to me as well (for the rustdoc part at least).

Rollup of 8 pull requests

Successful merges:

- #60260 (Add support for UWP targets)
- #62151 (Update linked OpenSSL version)
- #62245 (Miri engine: support extra function (pointer) values)
- #62257 (forward read_c_str method from Memory to Alloc)
- #62264 (Fix perf regression from Miri Machine trait changes)
- #62296 (request at least ptr-size alignment from posix_memalign)
- #62329 (Remove support for 1-token lookahead from the lexer)
- #62377 (Add test for ICE #62375)

Failed merges:

r? @ghost

Rollup of 7 pull requests

Successful merges:

- #62151 (Update linked OpenSSL version)
- #62245 (Miri engine: support extra function (pointer) values)
- #62257 (forward read_c_str method from Memory to Alloc)
- #62264 (Fix perf regression from Miri Machine trait changes)
- #62296 (request at least ptr-size alignment from posix_memalign)
- #62329 (Remove support for 1-token lookahead from the lexer)
- #62377 (Add test for ICE #62375)

Failed merges:

r? @ghost

rust-highfive assigned eddyb Jul 3, 2019

rust-highfive assigned petrochenkov and unassigned eddyb Jul 3, 2019

matklad force-pushed the no-peeking branch from a1ccb3c to 928005d Compare July 3, 2019 12:23

matklad commented Jul 3, 2019

matklad force-pushed the no-peeking branch 3 times, most recently from 8b2cfd2 to f1f8def Compare July 3, 2019 15:57

petrochenkov reviewed Jul 3, 2019

bors added the S-waiting-on-bors label Jul 3, 2019

matklad force-pushed the no-peeking branch from f1f8def to 07a9e4d Compare July 3, 2019 16:45

Centril mentioned this pull request Jul 3, 2019

Rollup of 17 pull requests #62349

Closed

Centril mentioned this pull request Jul 3, 2019

Rollup of 18 pull requests #62352

Closed

bors added the S-waiting-on-author label and removed the S-waiting-on-bors label Jul 4, 2019

matklad force-pushed the no-peeking branch from 07a9e4d to 3f76f32 Compare July 4, 2019 05:59

matklad added 5 commits July 4, 2019 09:01

remove StringReader::peek

830ff4a

The reader itself doesn't need the ability to peek tokens, so it's better if clients implement this functionality. This will hopefully become especially easy once we use an iterator interface for the lexer, but it is not too easy at the moment because of buffered errors.

remove peek_token from StringReader

e9dc95c

remove peek_span_src_raw from StringReader

256df83

cleanup lexer constructors

601bad8

move constructors to top

30fa99e

matklad added 4 commits July 4, 2019 09:08

slightly comment lexer API

1c6eb19

don't rely on spans when checking tokens for jointness

8bea334

remove unused mk_sp_and_raw

3035a05

make unwrap_or_abort non-generic again

3e362a4

matklad force-pushed the no-peeking branch from 3f76f32 to 3e362a4 Compare July 4, 2019 06:14

bors added the S-waiting-on-bors label and removed the S-waiting-on-author label Jul 4, 2019

Mark-Simulacrum mentioned this pull request Jul 4, 2019

Rollup of 8 pull requests #62372

Closed

Centril mentioned this pull request Jul 5, 2019

Rollup of 8 pull requests #62424

Closed

Centril mentioned this pull request Jul 6, 2019

Rollup of 7 pull requests #62428

Merged

bors merged commit 3e362a4 into rust-lang:master Jul 6, 2019

matklad deleted the no-peeking branch July 6, 2019 06:33
