Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[flake8-use-pathlib] Recommend Path.iterdir() over os.scandir() (PTH209) #14623

Closed
wants to merge 2 commits into from

Conversation

InSyncWithFoo
Copy link
Contributor

Summary

Per discussion at #14509.

Test Plan

cargo nextest run and cargo insta test.

Copy link
Contributor

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+4 -0 violations, +0 -0 fixes in 3 projects; 52 projects unchanged)

apache/airflow (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ dev/breeze/src/airflow_breeze/commands/release_candidate_command.py:315:18: PTH209 Use `pathlib.Path.iterdir()` instead.

bokeh/bokeh (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

+ tests/unit/bokeh/test_ext.py:69:19: PTH209 Use `pathlib.Path.iterdir()` instead.

scikit-build/scikit-build-core (+2 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ src/scikit_build_core/_shutil.py:93:10: PTH209 Use `pathlib.Path.iterdir()` instead.
+ src/scikit_build_core/build/_pathutil.py:23:18: PTH209 Use `pathlib.Path.iterdir()` instead.

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
PTH209 4 4 0 0 0

@dhruvmanila dhruvmanila added rule Implementing or modifying a lint rule preview Related to preview mode features labels Nov 27, 2024
@AlexWaygood
Copy link
Member

How does Path.iterdir compare in terms of performance? I realise that most of the flake8-use-pathlib rules risk suggesting changes that might make your code slower, but I believe this may be particularly bad. See the docs for Path.scandir, which has been added in Python 3.14 (which is still in the alpha period):

Using scandir() instead of iterdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory.

It might at least be worth mentioning in the docs for this rule, if Path.iterdir is significantly slower.

@InSyncWithFoo
Copy link
Contributor Author

InSyncWithFoo commented Nov 27, 2024

@AlexWaygood Quite bad, in fact:

# s.py
from pathlib import Path

d = Path(__file__).parent / 'd'
d.mkdir(exist_ok = True)

for i in range(10_000):
	(d / f'd{i}').mkdir()

	with (d / f'f{i}.txt').open('w'):
		pass
# t.py
from timeit import timeit

N = 1000
SETUP = '''
from pathlib import Path
import os

d = '/workspaces/test/d'
p = Path(d)
'''

a = timeit(stmt = '''for _ in p.iterdir(): ...''', setup = SETUP, number = N)
print(a)

b = timeit(stmt = '''for _ in os.listdir(d): ...''', setup = SETUP, number = N)
print(b)

Results on a 4-core, 16GB RAM GitHub Codespace:

@InSyncWithFoo ➜ /workspaces/test (main) $ uv run -p 3.13 python s.py
@InSyncWithFoo ➜ /workspaces/test (main) $ uv run -p 3.13 python t.py
28.847444689999975
10.691104289999885

That's a ~180% increase.

@AlexWaygood
Copy link
Member

@InSyncWithFoo it looks like you're comparing Path.iterdir() with os.listdir() there rather than os.scandir(). But yeah, I'd expect a comparison with scandir() to be similarly unfavourable. Please add a ## Known issues to the docs for this rule saying that we expect this rule to result in slower code, and that you shouldn't enable it in performance-critical places.

If #14632 is accepted, we can also have this rule suggest Path.scandir() rather than Path.iterdir() if the user's target version is Python 3.14 or higher. That should address some of the performance concerns.

@MichaReiser
Copy link
Member

I'm concerned about adding a rule that is a considerable foot gun for the majority of users when its only benefit is consistency. Especially because the use of os.scandir might be exactly because of its performance characteristics.

I suggest we wait to add this rule until Python 3.14 is closer to stabilizing and only then suggest using Path.scandir.

@MichaReiser MichaReiser added the needs-decision Awaiting a decision from a maintainer label Nov 27, 2024
@AlexWaygood
Copy link
Member

@MichaReiser and I discussed this offline. There's a general issue with our flake8-pathlib rules that they could impact performance, but this case seems especially bad.

Our decision is that we'll keep PTH208 (because Path.iterdir() is a drop-in replacement for os.listdir()), but we'll unfortunately reject this PR (because Path.iterdir() is not a drop-in replacement for os.scandir(), it does something slightly different, and the likely reason they're using os.scandir() rather than os.listdir() is precisely because it's the option you want to use for maximum performance).

When Python 3.14 enters the beta period, we can consider adding a rule suggesting to replace os.scandir with Path.scandir, assuming the new API is still present when the beta period starts.

Sorry about this @InSyncWithFoo -- thanks for the PR anyway! Cc. @sbrugman also, since you were involved in the discussion in #14509

@AlexWaygood AlexWaygood removed the needs-decision Awaiting a decision from a maintainer label Nov 27, 2024
@InSyncWithFoo InSyncWithFoo deleted the PTH209 branch November 27, 2024 12:53
@InSyncWithFoo
Copy link
Contributor Author

I guess I'll just have to wait for 3.14.0 beta 1, to be released sometime in May. Hopefully I won't forget.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Related to preview mode features rule Implementing or modifying a lint rule
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants