Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize makedirs on Windows #101358

Open
DefaultRyan opened this issue Jan 26, 2023 · 0 comments
Open

Optimize makedirs on Windows #101358

DefaultRyan opened this issue Jan 26, 2023 · 0 comments
Assignees
Labels
OS-windows performance Performance or resource usage type-feature A feature request or enhancement

Comments

@DefaultRyan
Copy link

I've been working to improve the performance of some Python workflows on Windows, and I found that a noticeable amount of time was being spent in makedirs.

I some more profiling on the venv create scenario, and learned that roughly 1/3rd of the stat cost can be solely attributed to the exists check in makedirs (src). (Which is being called by compile here, and from my cursory glance at that code, it should simply be passing exist_ok=True rather than catching FileExistsError, right?). What makes this especially wasteful is that this run contained 1451 calls to makedirs and 1455 calls to makedir, implying that the exists check was making things worse, rather than helping. Put another way, exists is taking nearly as much time as mkdir here.

While it's true that if/when we get GetFileInformationByPath(), it should make stat calls much faster, that will only benefit the minority of users running a new version of Windows containing this new API. Therefore, it makes sense for us to improve the state of things for users running an older Windows, when practical.

And in this case, we should allow makedirs on Windows to execute in an idiomatically performant way. And that is to simply call CreateDirectoryW() and selectively ignore ERROR_ALREADY_EXISTS.

CreateDirectoryW() isn't cheap, but in the case of an existing directory, I don't think it's any worse than Windows' stat implementation that uses CreateFileW(). And in the case of when the directory doesn't exist and needs to be created, it will be significantly faster to make the single call.

It's probably worth splitting off a Windows-specific implementation of makedirs into the native side (posixmodule.c ?) so that we don't depend on raising a relatively expensive FileNotFoundError.
The non-Windows implementation should still use the cheap and reliable exists() and isdir().

(Forked from discussion on #101196)

@DefaultRyan DefaultRyan added the type-feature A feature request or enhancement label Jan 26, 2023
@AlexWaygood AlexWaygood added performance Performance or resource usage OS-windows labels Jan 26, 2023
@mdboom mdboom self-assigned this Feb 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OS-windows performance Performance or resource usage type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants