-
Notifications
You must be signed in to change notification settings - Fork 8
Replace row/column based Location with byte-offsets. #4
Conversation
@@ -0,0 +1,22 @@ | |||
[package] | |||
name = "ruff_text_size" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to move ruff_text_size
into this repository because:
- I want to avoid git submodules in our main repository
ruff_text_size
should rarely change
29ff5d8
to
b16267a
Compare
Yeah I think the biggest question for me here is the branching strategy... What I've done in the past is had |
I don't have any experience with this but we could try to creat a new upstream branch that matches RustPython's main and we merge our changes into our |
Using merge seems like the most straightforward approach from all the approaches listed in the friendly forks blog post |
e53251a
to
9787548
Compare
I haven't fully figured out how we want to keep our branch up to date, but rebasing merges seems promising, which is used by git-windows. It's currently hard to figure out the details without having a real merge to play through. That's why I want to merge this PR and look into documenting the merge strategy later. |
b59d7ef
to
53bef1a
Compare
I'd like to share I am going to fully merge this changes into RustPython upstream. 2 motivations.
One question is how much RustPython change is required to adapt this to upstream. For now, I am waiting you to find the best design to enhance performance of ruff. Please let me know if the patch is stablized and you are satisfied. |
Oh wow, this is huge for us! I'm happy to give a hand with upstreaming these changes. Something I noticed that requires change is that
The design is finalized and I plan to merge this PR and the PR to ruff by the end of this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there more patches immediately mergable into rustpython?
Otherwise I will try to ship this patch on RustPython late this week and next week.
@@ -649,7 +649,7 @@ def write_ast_def(mod, typeinfo, f): | |||
#![allow(clippy::derive_partial_eq_without_eq)] | |||
|
|||
pub use crate::constant::*; | |||
pub use crate::Location; | |||
pub use ruff_text_size::{TextSize, TextRange}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rustpython_parser::text_size
if the only direct dependency is rustpython-parser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I understand the suggestion. Do you want to move ruff_text_size
to the rustpython-parser
or re-export the types for crates that only depend on rustpython_parser
? Having the type in a separate crate does have advantages for us, for example, ruff_diagnostics no longer depends on rustpython_parser
, which helps speed up our compile times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure it is only used to replace Location or have more usage in ruff. Thank you!
eb818e2
to
5ffa8a1
Compare
Current dependencies on/for this PR: This comment was auto-generated by Graphite. |
8f01c4f
to
767772f
Compare
767772f
to
a42e77a
Compare
Woohoo! |
- Split parser core and compiler core. Fix RustPython#14 - AST int type to `u32` - Updated asdl_rs.py and update_asdl.sh fix RustPython#6 - Use `ruff_python_ast::SourceLocation` for Python source location. Deleted our own Location. - Renamed ast::Located to ast::Attributed to distinguish terms for TextSize and SourceLocation - `ast::<Node>`s for TextSize located ast. `ast::located::<Node>` for Python source located ast. - And also strictly renaming `located` to refer only python location related interfaces. - `SourceLocator` to convert locations. - New `source-code` features of to disable python locations when unnecessary. - Also including fully merging astral-sh/RustPython#4 closes RustPython#9
- Split parser core and compiler core. Fix RustPython#14 - AST int type to `u32` - Updated asdl_rs.py and update_asdl.sh fix RustPython#6 - Use `ruff_python_ast::SourceLocation` for Python source location. Deleted our own Location. - Renamed ast::Located to ast::Attributed to distinguish terms for TextSize and SourceLocation - `ast::<Node>`s for TextSize located ast. `ast::located::<Node>` for Python source located ast. - And also strictly renaming `located` to refer only python location related interfaces. - `SourceLocator` to convert locations. - New `source-code` features of to disable python locations when unnecessary. - Also including fully merging astral-sh/RustPython#4 closes RustPython#9
- Split parser core and compiler core. Fix RustPython#14 - AST int type to `u32` - Updated asdl_rs.py and update_asdl.sh fix RustPython#6 - Use `ruff_python_ast::SourceLocation` for Python source location. Deleted our own Location. - Renamed ast::Located to ast::Attributed to distinguish terms for TextSize and SourceLocation - `ast::<Node>`s for TextSize located ast. `ast::located::<Node>` for Python source located ast. - And also strictly renaming `located` to refer only python location related interfaces. - `SourceLocator` to convert locations. - New `source-code` features of to disable python locations when unnecessary. - Also including fully merging astral-sh/RustPython#4 closes RustPython#9
- Split parser core and compiler core. Fix RustPython#14 - AST int type to `u32` - Updated asdl_rs.py and update_asdl.sh fix RustPython#6 - Use `ruff_python_ast::SourceLocation` for Python source location. Deleted our own Location. - Renamed ast::Located to ast::Attributed to distinguish terms for TextSize and SourceLocation - `ast::<Node>`s for TextSize located ast. `ast::located::<Node>` for Python source located ast. - And also strictly renaming `located` to refer only python location related interfaces. - `SourceLocator` to convert locations. - New `source-code` features of to disable python locations when unnecessary. - Also including fully merging astral-sh/RustPython#4 closes RustPython#9
- Split parser core and compiler core. Fix RustPython#14 - AST int type to `u32` - Updated asdl_rs.py and update_asdl.sh fix RustPython#6 - Use `ruff_python_ast::SourceLocation` for Python source location. Deleted our own Location. - Renamed ast::Located to ast::Attributed to distinguish terms for TextSize and SourceLocation - `ast::<Node>`s for TextSize located ast. `ast::located::<Node>` for Python source located ast. - And also strictly renaming `located` to refer only python location related interfaces. - `SourceLocator` to convert locations. - New `source-code` features of to disable python locations when unnecessary. - Also including fully merging astral-sh/RustPython#4 closes RustPython#9
This PR replaces RustPython's
Location { row: u32, column: u32 }
withTextSize
, a byte offset from the beginning of the file. The motivation of the change is that:Location
is 8 bytes, whereasTextSize
is only 4 bytes -> an 8 byte size reduction for every LexerItem
and 12 bytes forLocated
(This PR also changesOption<Location>
toTextSize
).Other notable changes
Located
now stores aTextRange
where theend_location
is no longer optional (exposed vialocated.end()
).Item
type of theLexer
from(Location, Tok, Location)
to(Tok, TextRange)
See astral-sh/ruff#3931 for more details and the upstream work necessary in ruff to use byte offsets.
Micro Benchmarks
Created running
cargo bench -p ruff_benchmark --bench parser
in the ruff repository.This change improves parsing and visiting performance by about 10%.
Downsides
The main downside of this change is that we now have to maintain our own fork of RustPython and that we would need to convert our AST to RustPython's AST if we ever decide to use RustPython to run e.g. tests or plugins.
Another downside is that this change is based on the assumption that it isn't necessary to compute row/column information for most parsed files, as is the case when linting a project that has adopted ruff.
I'm convinced that the speedups justify the change.
Target branch
This is a bit unclear to me. It could make sense to not merge to
main
or create a newupstream
branch that keepsLocation
and is where we pull upstream changes in (or branch off when contributing upstream).