Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove hot uses of pathlib #14110

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

bonzini
Copy link
Collaborator

@bonzini bonzini commented Jan 10, 2025

pathlib's implementation of Path iteration is expensive; "foo.parents" has quadratic complexity when building it:

        return self._path._from_parsed_parts(self._drv, self._root,
                                             self._tail[:-idx - 1])

and comparisons performed by "path in file.parents" potentially have the same issue. Introduce a new function that checks whether a file is under a path in the same way, removing usage of Path from the biggest hotspot.

@bonzini bonzini requested a review from jpakkane as a code owner January 10, 2025 09:06
@bonzini bonzini force-pushed the speedup-pathlib branch 2 times, most recently from 0b2ef7f to bf03ed2 Compare January 10, 2025 12:11
@bonzini bonzini added this to the 1.8 milestone Jan 10, 2025
@eli-schwartz
Copy link
Member

Introduce a new function that checks whether a file is under a path in the same way, removing usage of Path from the biggest hotspot.

Isn't this just os.path.commonpath? This is a well-known problem with pathlib -- it loves to reimplement os.path via complicated and fiddly data structures with questionable correctness characteristics. And apparently questionable performance characteristics too...

@bonzini
Copy link
Collaborator Author

bonzini commented Jan 10, 2025

It's similar to commonpath indeed, but 1) it returns a boolean 2) it only operates on two paths 3) it is not symmetrical (one argument is the path and the other is a file) 4) it is faster.

And apparently questionable performance characteristics too...

Yeah, I like pathlib's ergonomics but I draw the line at quadratic implementations...

@eli-schwartz
Copy link
Member

It's similar to commonpath indeed, but 1) it returns a boolean 2) it only operates on two paths 3) it is not symmetrical (one argument is the path and the other is a file) 4) it is faster.

Faster, maybe. I haven't timed it. But with regard to returning a boolean etc. the point of commonpath isn't "this function returns whether a path is an ancestor" but rather that it's the trivial building block for such a check.

It's code we already use in several places:

if os.path.commonpath(x, y) == x:
    do_thing_with(y)

@bonzini
Copy link
Collaborator Author

bonzini commented Jan 10, 2025

Yes, it is faster: :)

>>> timeit.timeit('path_starts_with("/usr/bin/", "/usr/bin/foo/bar/baz")',  
     globals={'path_starts_with':path_starts_with, 'is_windows': 'windows', 'os': os})
5.237490472005447

>>> timeit.timeit('path_starts_with("/usr/bin/foo/bar/baz", "/usr/bin/")',  
     globals={'path_starts_with':path_starts_with, 'is_windows': 'windows', 'os': os})
6.680338967334726

>>> timeit.timeit('os.path.commonpath(["/usr/bin/", "/usr/bin/foo/bar/baz"]) == "/usr/bin"',
     globals={'path_starts_with':path_starts_with, 'is_windows': 'windows', 'os': os})
13.45890446798876

Also it doesn't crash and burn if you have different drives on Windows. :) Should I adjust other uses of commonpath?

@eli-schwartz
Copy link
Member

Also it doesn't crash and burn if you have different drives on Windows. :)

Yup, we catch ValueError for those. So I suppose it may anyways be worth extracting into a dedicated function. :P

Should I adjust other uses of commonpath?

Yes, please. If we need this function for performance in a hot path anyway, then we might as well use it everywhere the same code pattern is used.

@bonzini bonzini force-pushed the speedup-pathlib branch 2 times, most recently from d1eaebd to 7d781c8 Compare January 10, 2025 15:37
mesonbuild/utils/universal.py Outdated Show resolved Hide resolved
mesonbuild/interpreter/interpreter.py Outdated Show resolved Hide resolved
Comment on lines 3138 to 3141
if InterpreterRuleRelaxation.ALLOW_BUILD_DIR_FILE_REFERENCES in self.relaxations and path_starts_with(norm, self.environment.build_dir):
return
if srcdir not in norm.parents:

if not path_starts_with(norm, self.environment.source_dir):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message for this is no longer accurate. :D

"Introduce a new function" -> use our recently added function?

mesonbuild/interpreter/interpreter.py Outdated Show resolved Hide resolved
Introduce an alternative to os.path.commonpath(name, path) == path,
which is a common idiom throughout Meson.  Call it is_parent_path
just like the existing static method in Generator.
It is a bit faster and handles drives on Windows without the need
for an exception handler.

Signed-off-by: Paolo Bonzini <[email protected]>
This saves about 1% of execution time; path manipulation is expensive.
About 1% more could be saved by avoiding usage of "foo in bar.parents",
which is an expensive way to say "bar.startswith(foo)".

Signed-off-by: Paolo Bonzini <[email protected]>
pathlib's implementation of Path iteration is expensive;
"foo.parents" has quadratic complexity when building it:

        return self._path._from_parsed_parts(self._drv, self._root,
                                             self._tail[:-idx - 1])

and comparisons performed by "path in file.parents" potentially
have the same issue.  Introduce a new function that checks whether
a file is under a path in the same way, removing usage of Path
from the biggest hotspot.

Signed-off-by: Paolo Bonzini <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants