Replace row/column based `Location` with byte-offsets.
#3931
Just for my own curiosity, what's the context for this? Why's it necessary?
Strictly speaking, it isn't necessary from a functional point of view, but using byte offsets helps to improve performance and reduce memory consumption. I started investigating switching to byte offsets because enabling the pycodestyle (logical line) rules (#3689) results in a 20%-50% performance regression, even though I already improved the performance of the rules themselves. A key observation is that the default-rules benchmarks regress more than the all-rules benchmarks.
This is because the pycodestyle rules are the first rules in the default set that inspect the source code (trivia). The challenge with inspecting the source code is that you can't slice a string with a row/column location. This isn't possible: …
My goal with using byte offsets is to remove the need to build a …
There are other, non-pycodestyle-specific reasons why I want to adopt byte offsets:
parser/numpy/globals.py
  time:   [65.752 µs 65.844 µs 65.973 µs]
  thrpt:  [44.725 MiB/s 44.813 MiB/s 44.876 MiB/s]
  change: time:  [-5.8033% -5.5729% -5.3542%] (p = 0.00 < 0.05)
          thrpt: [+5.6571% +5.9018% +6.1609%]
  Performance has improved.
  Found 21 outliers among 100 measurements (21.00%)
    15 (15.00%) low severe
    2 (2.00%) low mild
    3 (3.00%) high mild
    1 (1.00%) high severe
parser/pydantic/types.py
  time:   [1.4442 ms 1.4453 ms 1.4466 ms]
  thrpt:  [17.630 MiB/s 17.646 MiB/s 17.659 MiB/s]
  change: time:  [-12.393% -12.225% -11.904%] (p = 0.00 < 0.05)
          thrpt: [+13.512% +13.927% +14.146%]
  Performance has improved.
  Found 9 outliers among 100 measurements (9.00%)
    3 (3.00%) high mild
    6 (6.00%) high severe
parser/numpy/ctypeslib.py
  time:   [647.58 µs 650.18 µs 652.90 µs]
  thrpt:  [25.503 MiB/s 25.610 MiB/s 25.713 MiB/s]
  change: time:  [-14.351% -14.154% -13.948%] (p = 0.00 < 0.05)
          thrpt: [+16.209% +16.488% +16.756%]
  Performance has improved.
  Found 18 outliers among 100 measurements (18.00%)
    17 (17.00%) high mild
    1 (1.00%) high severe
parser/large/dataset.py
  time:   [3.5024 ms 3.5104 ms 3.5195 ms]
  thrpt:  [11.559 MiB/s 11.589 MiB/s 11.616 MiB/s]
  change: time:  [-11.825% -11.603% -11.374%] (p = 0.00 < 0.05)
          thrpt: [+12.834% +13.126% +13.411%]
  Performance has improved.
  Found 13 outliers among 100 measurements (13.00%)
    7 (7.00%) low severe
    1 (1.00%) low mild
    1 (1.00%) high mild
    4 (4.00%) high severe
Other compilers using byte offsets:
Thanks for the detailed explanation @MichaReiser! If you don't mind: where do the offsets come from? The column offset makes sense (it's just an offset from index 0), but how are rows represented? Or is a total byte offset calculated from what is effectively row 0, column 0? Edit: Never mind, just found
PR Check Results
Ecosystem
ℹ️ ecosystem check detected changes. (+0, -16, 0 error(s))
airflow (+0, -7)
- airflow/api_connexion/endpoints/task_instance_endpoint.py:274:12: RET504 Unnecessary variable assignment before `return` statement
- airflow/providers/amazon/aws/secrets/systems_manager.py:200:16: RET504 Unnecessary variable assignment before `return` statement
- airflow/providers/docker/operators/docker.py:479:16: RET504 Unnecessary variable assignment before `return` statement
- airflow/providers/oracle/hooks/oracle.py:42:12: RET504 Unnecessary variable assignment before `return` statement
- airflow/security/utils.py:83:12: RET504 Unnecessary variable assignment before `return` statement
- airflow/www/extensions/init_appbuilder.py:359:16: RET504 Unnecessary variable assignment before `return` statement
- tests/test_utils/gcp_system_helpers.py:65:12: RET504 Unnecessary variable assignment before `return` statement
bokeh (+0, -1)
- src/bokeh/core/property/datetime.py:165:16: RET504 Unnecessary variable assignment before `return` statement
zulip (+0, -8)
- zerver/data_import/rocketchat.py:141:12: RET504 Unnecessary variable assignment before `return` statement
- zerver/lib/message.py:186:12: RET504 Unnecessary variable assignment before `return` statement
- zerver/lib/narrow.py:891:12: RET504 Unnecessary variable assignment before `return` statement
- zerver/lib/url_preview/oembed.py:50:12: RET504 Unnecessary variable assignment before `return` statement
- zerver/models.py:184:12: RET504 Unnecessary variable assignment before `return` statement
- zerver/webhooks/basecamp/view.py:115:12: RET504 Unnecessary variable assignment before `return` statement
- zerver/webhooks/bitbucket2/view.py:436:12: RET504 Unnecessary variable assignment before `return` statement
- zerver/webhooks/zendesk/view.py:14:12: RET504 Unnecessary variable assignment before `return` statement
@evanrittenhouse, sorry for the late reply. The RustPython lexer generates the offsets. The old implementation counted rows and columns (the column counted from the start of the row): the lexer incremented the current row index and reset the column to zero for every newline character. Byte offsets don't use rows or columns. Instead, an offset is counted from the beginning of the file. Think of the string as a byte array; the byte offset is the index into that array:
    def f(): pass
    x = 20
The position of the identifier
The position of the
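To make the difference concrete, here is a minimal, std-only Rust sketch using the same two-line source. `LineStarts` and `row_column` are hypothetical names for a simplified line index, not ruff's or RustPython's actual API. The sketch only treats `\n` as a line break and counts columns in bytes, which matches a character column only for ASCII text.

```rust
/// Byte offsets at which each line of a source text starts.
/// Hypothetical, simplified stand-in for a line index (only `\n` is a line break here).
struct LineStarts {
    starts: Vec<usize>,
}

impl LineStarts {
    /// Scans the source once and records the start offset of every line.
    fn new(source: &str) -> Self {
        // The first line always starts at offset 0.
        let mut starts = vec![0];
        for (offset, byte) in source.bytes().enumerate() {
            if byte == b'\n' {
                // The next line starts right after the newline byte.
                starts.push(offset + 1);
            }
        }
        LineStarts { starts }
    }

    /// Converts a byte offset into a one-based row and a zero-based byte column.
    fn row_column(&self, offset: usize) -> (usize, usize) {
        // Binary search for the last line start that is <= offset.
        let line = match self.starts.binary_search(&offset) {
            Ok(line) => line,
            Err(next_line) => next_line - 1,
        };
        (line + 1, offset - self.starts[line])
    }
}
```

For `"def f(): pass\nx = 20"`, the identifier `f` is at byte offset 4, so `&source[4..5]` slices it out directly without any line index. `x` is at offset 14, and `LineStarts::new(source).row_column(14)` returns `(2, 0)`; that conversion is only needed when a diagnostic is rendered.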
@@ -282,6 +282,16 @@ W19.py:133:1: W191 Indentation contains tabs
137 | def test_keys(self):
    |
W19.py:136:1: W191 Indentation contains tabs
My understanding is that this was a false negative: `Indexer.strings` incorrectly suppressed this violation because it is on a line with a string. It is now correctly reported because we test whether the tab is inside a string range (rather than on such a line).
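To illustrate the difference between the two checks, a std-only sketch. `StringRange`, `line_overlaps_string`, and `inside_string` are hypothetical helpers, not ruff's `Indexer` API: the first function roughly mirrors the old line-based suppression, the second the range-based test described in this comment.

```rust
/// Half-open byte range `[start, end)` of a string token (hypothetical).
struct StringRange {
    start: usize,
    end: usize,
}

/// Line-based check: suppress if the line `[line_start, line_end)` overlaps any string.
fn line_overlaps_string(strings: &[StringRange], line_start: usize, line_end: usize) -> bool {
    strings
        .iter()
        .any(|string| string.start < line_end && line_start < string.end)
}

/// Range-based check: suppress only if the offset itself lies inside a string.
fn inside_string(strings: &[StringRange], offset: usize) -> bool {
    strings
        .iter()
        .any(|string| string.start <= offset && offset < string.end)
}
```

For `"x = 1\n\ty = \"\"\"doc\n\"\"\"\n"`, the tab at offset 6 sits on the same line as the start of the triple-quoted string (range 11 to 21), so the line-based check suppresses it while the range-based check reports it.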
I think you're right.
.. remove unnecessary `contains_line_break` calls, create non-empty range for `SyntaxErrors`
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)
This PR contains the following updates:
| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [ruff](https://togithub.com/charliermarsh/ruff) | `^0.0.263` -> `^0.0.264` | [![age](https://badges.renovateapi.com/packages/pypi/ruff/0.0.264/age-slim)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://badges.renovateapi.com/packages/pypi/ruff/0.0.264/adoption-slim)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://badges.renovateapi.com/packages/pypi/ruff/0.0.264/compatibility-slim/0.0.263)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://badges.renovateapi.com/packages/pypi/ruff/0.0.264/confidence-slim/0.0.263)](https://docs.renovatebot.com/merge-confidence/) |
---
### Release Notes
<details>
<summary>charliermarsh/ruff</summary>
### [`v0.0.264`](https://togithub.com/charliermarsh/ruff/releases/tag/v0.0.264)
[Compare Source](https://togithub.com/charliermarsh/ruff/compare/v0.0.263...v0.0.264)
#### What's Changed
##### Rules
- Autofix `EM101`, `EM102`, `EM103` if possible by [@dhruvmanila](https://togithub.com/dhruvmanila) in [https://github.com/charliermarsh/ruff/pull/4123](https://togithub.com/charliermarsh/ruff/pull/4123)
- Add bugbear immutable functions as allowed in dataclasses by [@mosauter](https://togithub.com/mosauter) in [https://github.com/charliermarsh/ruff/pull/4122](https://togithub.com/charliermarsh/ruff/pull/4122)
##### Settings
- Add support for providing command-line arguments via `argfile` by [@charliermarsh](https://togithub.com/charliermarsh) in [https://github.com/charliermarsh/ruff/pull/4087](https://togithub.com/charliermarsh/ruff/pull/4087)
##### Bug Fixes
- Make D410/D411 autofixes mutually exclusive by [@evanrittenhouse](https://togithub.com/evanrittenhouse) in [https://github.com/charliermarsh/ruff/pull/4110](https://togithub.com/charliermarsh/ruff/pull/4110)
- Remove `pyright` comment prefix from PYI033 checks by [@evanrittenhouse](https://togithub.com/evanrittenhouse) in [https://github.com/charliermarsh/ruff/pull/4152](https://togithub.com/charliermarsh/ruff/pull/4152)
- Fix F811 false positive with match by [@JonathanPlasse](https://togithub.com/JonathanPlasse) in [https://github.com/charliermarsh/ruff/pull/4161](https://togithub.com/charliermarsh/ruff/pull/4161)
- Fix `E713` and `E714` false positives for multiple comparisons by [@JonathanPlasse](https://togithub.com/JonathanPlasse) in [https://github.com/charliermarsh/ruff/pull/4083](https://togithub.com/charliermarsh/ruff/pull/4083)
- Fix B023 shadowed variables in nested functions by [@MichaReiser](https://togithub.com/MichaReiser) in [https://github.com/charliermarsh/ruff/pull/4111](https://togithub.com/charliermarsh/ruff/pull/4111)
- Preserve star-handling special-casing for force-single-line by [@charliermarsh](https://togithub.com/charliermarsh) in [https://github.com/charliermarsh/ruff/pull/4129](https://togithub.com/charliermarsh/ruff/pull/4129)
- Respect parent-scoping rules for `NamedExpr` assignments by [@charliermarsh](https://togithub.com/charliermarsh) in [https://github.com/charliermarsh/ruff/pull/4145](https://togithub.com/charliermarsh/ruff/pull/4145)
- Fix UP032 auto-fix by [@JonathanPlasse](https://togithub.com/JonathanPlasse) in [https://github.com/charliermarsh/ruff/pull/4165](https://togithub.com/charliermarsh/ruff/pull/4165)
- Allow boolean parameters for `pytest.param` by [@charliermarsh](https://togithub.com/charliermarsh) in [https://github.com/charliermarsh/ruff/pull/4176](https://togithub.com/charliermarsh/ruff/pull/4176)
##### Internal
- Replace row/column based `Location` with byte-offsets. by [@MichaReiser](https://togithub.com/MichaReiser) in [https://github.com/charliermarsh/ruff/pull/3931](https://togithub.com/charliermarsh/ruff/pull/3931)
- perf(logical-lines): Various small perf improvements by [@MichaReiser](https://togithub.com/MichaReiser) in [https://github.com/charliermarsh/ruff/pull/4022](https://togithub.com/charliermarsh/ruff/pull/4022)
- Use `memchr` to speedup newline search on x86 by [@MichaReiser](https://togithub.com/MichaReiser) in [https://github.com/charliermarsh/ruff/pull/3985](https://togithub.com/charliermarsh/ruff/pull/3985)
- Remove `ScopeStack` in favor of child-parent `ScopeId` pointers by [@charliermarsh](https://togithub.com/charliermarsh) in [https://github.com/charliermarsh/ruff/pull/4138](https://togithub.com/charliermarsh/ruff/pull/4138)
**Full Changelog**: astral-sh/ruff@v0.0.263...v0.0.264
</details>
---
### Configuration
📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 **Automerge**: Enabled.
♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.
🔕 **Ignore**: Close this PR and you won't be reminded about this update again.
---
- [ ] If you want to rebase/retry this PR, check this box
---
This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://app.renovatebot.com/dashboard#github/ixm-one/pytest-cmake-presets).
Signed-off-by: Renovate Bot <[email protected]>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Summary
This PR changes ruff to use our own fork of RustPython that replaces `Location { row: u32, column: u32 }` with `TextSize` (astral-sh/RustPython#4). The main motivation for this change is to ship the logical line rules. Enabling the logical line rules regresses performance by as much as 50% because the rules need to slice into the source string, which requires building and querying the `LineIndex`. Using byte offsets everywhere trades building the `LineIndex` to inspect the source text in lint rules for re-computing the row and column information when rendering diagnostics. This is a favourable trade because most projects using ruff have only a few diagnostics.
Notable Changes
SourceCodeFile
It is now necessary to always include the source code when passing `Message`s because the source text is needed to re-compute the row and column positions from byte offsets (`TextSize`). Previously, the source text was only included when using `--show-source`. This results in a noticeable slowdown in projects with many diagnostics (on the order of ten thousand).
Locator
The `Locator` now exposes methods to: …
The computations are performed on demand without querying the `LineIndex`.
The `Locator` still has a lazily computed `LineIndex` because a few diagnostics use a line number as part of their message.
SourceCode
`SourceCode` now provides methods to compute the `SourceLocation` (row and column information) given an offset.
UniversalNewline
The `UniversalNewline` iterator now returns `Line` items instead of `&str`. This is necessary because many lints need to know the offset of the `n`th line, and summing `text.text_len()` doesn't give the right result because `text` does not include the trailing newline character:
The text length of the first line is 6 bytes because the line does not include the trailing newline character.
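A small sketch of why each line needs to carry its own offsets. The `Line` struct and `lines` function below are hypothetical, std-only approximations of the `Line` items and `UniversalNewline` iterator described here, not ruff's implementation; like a universal-newline iterator, the sketch treats `\n`, `\r\n`, and a lone `\r` as line endings.

```rust
/// One line of a source text plus its position (hypothetical sketch).
#[derive(Debug)]
struct Line<'a> {
    /// The line's text, without the trailing newline.
    text: &'a str,
    /// Byte offset at which the line starts.
    start: usize,
    /// Length of the line ending: 0 (last line), 1 (`\n` or `\r`), or 2 (`\r\n`).
    newline_len: usize,
}

impl<'a> Line<'a> {
    /// End offset, excluding the line ending.
    fn end(&self) -> usize {
        self.start + self.text.len()
    }

    /// End offset, including the line ending (the start of the next line).
    fn full_end(&self) -> usize {
        self.end() + self.newline_len
    }
}

/// Splits `source` on `\n`, `\r\n`, and lone `\r`, keeping each line's start offset.
fn lines(source: &str) -> Vec<Line<'_>> {
    let bytes = source.as_bytes();
    let mut result = Vec::new();
    let mut start = 0;
    let mut i = 0;
    while i < bytes.len() {
        let newline_len = match bytes[i] {
            b'\n' => 1,
            b'\r' if bytes.get(i + 1) == Some(&b'\n') => 2,
            b'\r' => 1,
            _ => {
                i += 1;
                continue;
            }
        };
        // Newline bytes are ASCII, so `i` is always a valid char boundary.
        result.push(Line { text: &source[start..i], start, newline_len });
        i += newline_len;
        start = i;
    }
    // A final line without a trailing newline.
    if start < bytes.len() {
        result.push(Line { text: &source[start..], start, newline_len: 0 });
    }
    result
}
```

For `"x = 10\ny = 20\r\nz = 30"`, the first line's text is 6 bytes, yet the second line starts at offset 7 and the third at offset 15. Summing the text lengths of the first two lines gives 12, which misses the `\n` and `\r\n` bytes; `full_end()` of the second line gives the correct 15.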
The `Line` struct provides methods to get a line's start offset, end offset, range, and text. It also provides methods to get the text, end offset, and range including the trailing newline character.
Use `TextRange` for ranges
Consistently uses `TextRange` in favor of `(Location, Location)` and `start: Location, end: Location`, because `TextRange` better communicates that the two offsets are related.
Replaces all references to `Range` with `TextRange` and deletes `Range`.
Use `TextSize` instead of `Location`
Replaces all references to `Location` with `TextSize`.
Stylist
This PR removes the lazy computations for `indentation` and `quote` because slicing into the source string is now cheap.
Indexer
The `Indexer` used to store the line numbers of commented lines, lines with continuations, and lines with multiline strings. This is no longer feasible because it would require computing the line numbers. The new implementation stores the line-start offset for continuation lines and the `TextRange` for comments and multiline strings.
Storing the `TextRange` instead of line numbers helped fix a false negative where a mixed spaces/tab indent at the start of a multiline string was not reported because the analysis incorrectly assumed that it was part of the multiline string.
Noqa
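The noqa lookup described in this section boils down to: given ranges sorted by start offset that do not overlap, find the one containing a diagnostic's start offset. A minimal std-only sketch, assuming half-open `[start, end)` ranges; `ByteRange` and `find_containing` are hypothetical names, not ruff's `TextRange` API.

```rust
/// Half-open byte range `[start, end)`; a hypothetical stand-in for `TextRange`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ByteRange {
    start: usize,
    end: usize,
}

/// Returns the index of the range that contains `offset`, if any.
/// Assumes `ranges` is sorted by `start` and the ranges do not overlap.
fn find_containing(ranges: &[ByteRange], offset: usize) -> Option<usize> {
    // Binary search: number of ranges that start at or before `offset`.
    let candidates = ranges.partition_point(|range| range.start <= offset);
    // Only the last of those ranges can contain `offset`.
    let index = candidates.checked_sub(1)?;
    (offset < ranges[index].end).then_some(index)
}
```

With ranges `[0, 10)`, `[20, 35)`, and `[50, 60)`, offset 20 maps to the second range, while offsets 15 and 35 fall into no range. `partition_point` does the binary search, so each lookup costs O(log n) in the number of stored ranges instead of a linear scan.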
This PR now stores the `TextRange` of the line for each noqa comment, sorted in ascending order by start position. Testing whether a diagnostic is suppressed requires a binary search over the ranges to test if any range contains the diagnostic's start location.
This PR further replaces the mapping to suppress some syntaxes on other lines with a `TextRange` vector, where every entry means that a position falling into that range should be remapped to the end of the range.
isort directives
Similar to noqa, it now stores `TextRange`s instead of line numbers for the areas where sorting is disabled. This PR now stores only the `TextSize` for split positions, as this proves to be sufficient.
Benchmark
TL;DR: a 10% performance improvement for projects with few diagnostics, and identical performance or a small regression for projects with many diagnostics. The new implementation with logical lines enabled outperforms `main` with logical lines disabled.
Micro Benchmarks
This PR improves the default-rules benchmark by 6-15% and the all-rules benchmark by 4-8%. More importantly, ruff with logical lines enabled is as fast as or even faster than `main`. This should allow us to ship logical lines without causing a runtime regression.
It's worth pointing out that the relative slowdown introduced by enabling the logical-lines lint rules remains unchanged. I'm surprised by this because it doesn't show the improvement I expected from removing the `LineIndex` computation from the linting path.
CPython
This benchmark measures the worst-case performance: a project with many violations.
… (`--show-source`). This is expected because printing diagnostics now always requires storing the source text, and computing the source locations adds some overhead as well.
… `--no-cache` … as seen in the micro benchmarks.
… (`-s`). This shows the potential of the refactor for projects with few or no diagnostics. Silent mode still pays the overhead of storing the source text for every diagnostic, but the implementation doesn't compute the `LineIndex`.
Benchmark results
./ruff-bytes ./crates/ruff/resources/test/cpython/ -e
./ruff-main ./crates/ruff/resources/test/cpython/ -e
./ruff-bytes ./crates/ruff/resources/test/cpython/ -e --no-cache
./ruff-main ./crates/ruff/resources/test/cpython/ -e --no-cache
./ruff-bytes ./crates/ruff/resources/test/cpython/ -e --select=ALL
./ruff-main ./crates/ruff/resources/test/cpython/ -e --select=ALL
./ruff-bytes ./crates/ruff/resources/test/cpython/ -e --no-cache --select=ALL
./ruff-main ./crates/ruff/resources/test/cpython/ -e --no-cache --select=ALL
./ruff-bytes ./crates/ruff/resources/test/cpython/ -e --show-source
./ruff-main ./crates/ruff/resources/test/cpython/ -e --show-source
./ruff-bytes ./crates/ruff/resources/test/cpython/ -e --no-cache --show-source
./ruff-main ./crates/ruff/resources/test/cpython/ -e --no-cache --show-source
./ruff-bytes ./crates/ruff/resources/test/cpython/ -e --select=ALL --show-source
./ruff-main ./crates/ruff/resources/test/cpython/ -e --select=ALL --show-source
./ruff-bytes ./crates/ruff/resources/test/cpython/ -e --no-cache --select=ALL --show-source
./ruff-main ./crates/ruff/resources/test/cpython/ -e --no-cache --select=ALL --show-source
Home Assistant
Best case benchmark: A project with very few diagnostics (10).
../ruff/ruff-bytes . -e
../ruff/ruff-main . -e
../ruff/ruff-bytes . -e --no-cache
../ruff/ruff-main . -e --no-cache
Enabling logical lines introduces many new errors (2000), so this no longer shows the best case. But with logical lines enabled, the new implementation still outperforms the old one and remains about 10% faster.
../ruff/ruff-bytes-logical . -e
../ruff/ruff-main-logical . -e
../ruff/ruff-bytes-logical . -e --no-cache
../ruff/ruff-main-logical . -e --no-cache
Test Plan
- `--add-noqa` with the airflow repository
- `--fix` with the airflow repository
Breaking Changes
This PR changes the column numbers of fixes in the JSON output to be one-indexed, aligning them with the `Diagnostic` start and end columns. I can undo this change, but I got it "for free" by using `SourceLocation` consistently.
I reverted the change in this PR and extracted it into #4007.