I've been working to improve the performance of some Python workflows on Windows, and I found that a noticeable amount of time was being spent in makedirs.
I did some more profiling on the venv create scenario and learned that roughly one third of the stat cost can be attributed solely to the exists check in makedirs (src). (makedirs is being called by compile here, and from my cursory glance at that code, it should simply be passing exist_ok=True rather than catching FileExistsError, right?) What makes this especially wasteful is that this run contained 1451 calls to makedirs and 1455 calls to mkdir, implying that the exists check was making things worse rather than helping. Put another way, exists is taking nearly as much time as mkdir here.
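To make the exist_ok suggestion concrete, here is a minimal sketch of the two calling patterns. The helper names are hypothetical and are not taken from the compile code referenced above; this only illustrates the difference at the call site.

```python
import os
import tempfile


def ensure_dir_catching(path):
    """Hypothetical helper: call makedirs and swallow FileExistsError."""
    try:
        os.makedirs(path)
    except FileExistsError:
        # Also swallowed when `path` exists but is a regular file.
        pass


def ensure_dir_exist_ok(path):
    """Hypothetical helper: let makedirs tolerate an existing directory."""
    # Still raises FileExistsError if `path` exists and is not a directory.
    os.makedirs(path, exist_ok=True)


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "pkg", "__pycache__")
    ensure_dir_exist_ok(target)   # creates pkg/ and pkg/__pycache__/
    ensure_dir_exist_ok(target)   # second call is a no-op, no exception
    ensure_dir_catching(target)   # same outcome via the try/except pattern
    both_ok = os.path.isdir(target)
```

Note that this only tidies the caller. Neither pattern removes the exists check inside makedirs itself, which is the cost being measured here.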
While it's true that GetFileInformationByPath(), if/when we get it, should make stat calls much faster, that will only benefit the minority of users running a new version of Windows that contains the API. It therefore makes sense to improve things for users on older versions of Windows where practical.
In this case, we should allow makedirs on Windows to execute in an idiomatically performant way: simply call CreateDirectoryW() and selectively ignore ERROR_ALREADY_EXISTS.
CreateDirectoryW() isn't cheap, but for an existing directory I don't think it's any worse than Windows' stat implementation, which uses CreateFileW(). And when the directory doesn't exist and needs to be created, making the single call will be significantly faster.
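As a rough illustration of the "attempt the create first" control flow, here is a pure-Python sketch. The function name makedirs_mkdir_first is hypothetical, and this is not the native implementation being proposed: it goes through os.mkdir rather than calling CreateDirectoryW() directly, and it still relies on Python exceptions for the fallback paths. It also normalizes the path with os.path.normpath, which is a simplification.

```python
import os
import tempfile


def makedirs_mkdir_first(path, mode=0o777, exist_ok=False):
    """Hypothetical sketch: try mkdir first, walk up only when it fails."""
    # Simplification: normpath strips trailing separators and "." parts.
    path = os.path.normpath(os.fspath(path))
    try:
        os.mkdir(path, mode)
        return
    except FileExistsError:
        # Analogous to selectively ignoring ERROR_ALREADY_EXISTS.
        if not exist_ok or not os.path.isdir(path):
            raise
        return
    except FileNotFoundError:
        # A parent is missing; find it, or give up at the top of the path.
        parent = os.path.dirname(path)
        if not parent or parent == path:
            raise
    # Create the missing parents (default mode in this sketch), then retry.
    makedirs_mkdir_first(parent, exist_ok=True)
    try:
        os.mkdir(path, mode)
    except FileExistsError:
        # Another process may have created the leaf in the meantime.
        if not exist_ok or not os.path.isdir(path):
            raise


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "a", "b", "c")
    makedirs_mkdir_first(target)                 # creates a/, a/b/, a/b/c/
    makedirs_mkdir_first(target, exist_ok=True)  # existing dir: one mkdir attempt
    created = os.path.isdir(target)
```

In the common case where the parent already exists, this makes a single mkdir call and no separate exists check. The exception-based fallback is only reached when a parent is missing or the leaf already exists.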
It's probably worth splitting off a Windows-specific implementation of makedirs into the native side (posixmodule.c ?) so that we don't depend on raising a relatively expensive FileNotFoundError.
The non-Windows implementation should still use the cheap and reliable exists() and isdir().
(Forked from discussion on #101196)