Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optimized versions of isdir / isfile on Windows #101196

Closed
mdboom opened this issue Jan 20, 2023 · 15 comments
Closed

Add optimized versions of isdir / isfile on Windows #101196

mdboom opened this issue Jan 20, 2023 · 15 comments
Assignees
Labels
3.12 bugs and security fixes OS-windows type-feature A feature request or enhancement

Comments

@mdboom
Copy link
Contributor

mdboom commented Jan 20, 2023

I went down this rabbit hole when someone mentioned that isfile/isdir/exists all make a rather expensive os.stat call on Windows (which is actually a long wrapper around a number of system calls on Windows), rather than the simpler and more direct call to GetFileAttributeW.

I noticed that at one point there was a version of isdir that does exactly this. At the time, this claimed a 2x speedup.

However, this C implementation of isdir was removed as part of a large set of changes in df2d4a6, and as a result, isdir got faster.

With the following benchmark:

isdir benchmark
import os.path
import timeit


for i in range(100):
    os.makedirs(f"exists{i}", exist_ok=True)


def test_exists():
    for i in range(100):
        os.path.isdir(f"exists{i}")


def test_extinct():
    for i in range(100):
        os.path.isdir(f"extinct{i}")


print(timeit.timeit(test_exists, number=100))
print(timeit.timeit(test_extinct, number=100))


for i in range(100):
    os.rmdir(f"exists{i}")

I get the following with df2d4a6:

exists: 0.18694799999957468
doesn't exist: 0.08418370000072173

and with the prior commit:

exists: 0.25393609999991895
doesn't exist: 0.08511730000009265

So, from this, I'd conclude that the idea of replacing calls to os.stat with calls to GetFileAttributeW would not bear fruit, but @zooba should probably confirm I'm benchmarking the right thing and making sense.

In any event, we should probably remove the little vestige that imports this fast path that was removed:

try:
    # The genericpath.isdir implementation uses os.stat and checks the mode
    # attribute to tell whether or not the path is a directory.
    # This is overkill on Windows - just pass the path to GetFileAttributes
    # and check the attribute from there.
    from nt import _isdir as isdir
except ImportError:
    # Use genericpath.isdir as imported above.
    pass

Linked PRs

@mdboom mdboom added type-bug An unexpected behavior, bug, or error OS-windows labels Jan 20, 2023
@zooba
Copy link
Member

zooba commented Jan 20, 2023

Yeah, this is very strange. Perhaps @eryksun has a better idea of what's happening here?

By my read, stat should definitely be slower, because CreateFileW should fail on a directory (IIRC - that might be outdated) and then we fall through to GetFileAttributesW anyway.

But yes, the native implementation is gone, so we should remove the attempt to use it from ntpath.py.

@eryksun
Copy link
Contributor

eryksun commented Jan 20, 2023

By my read, stat should definitely be slower, because CreateFileW should fail on a directory (IIRC - that might be outdated) and then we fall through to GetFileAttributesW anyway.

The CreateFileW() call uses the flag FILE_FLAG_BACKUP_SEMANTICS, which allows opening directories.

The reason that nt._isdir() was removed in 3.8 is that os.path.isdir() needs to return false if the target path of a directory symlink or mount point (junction) doesn't exist, but GetFileAttributesW() doesn't traverse reparse points1.

In principle, we could use GetFileAttributesW() as a quick check. Return false if the path doesn't have the attribute FILE_ATTRIBUTE_DIRECTORY. Else return true if it doesn't have the attribute FILE_ATTRIBUTE_REPARSE_POINT. Otherwise call CreateFileW() to traverse the reparse point, and get the FileBasicInfo via GetFileInformationByHandleEx().


Calling GetFileAttributesW() or GetFileAttributesExW() is less work than getting the file stat via CreateFileW(), GetFileInformationByHandleEx(), and GetFileInformationByHandle(). However, it's a question of relative work. In both cases the object manager, I/O manager, and one or more filesystems have to parse the opened path to get at the file/directory control block. Once the filesystem has the FCB/DCB, getting the file and volume information should be relatively fast.

A possible exception is if a redirector for a remote filesystem locally caches file information (e.g. SMB uses caching). GetFileAttributesW() calls NtQueryAttributesFile(), and GetFileAttributesExW() calls NtQueryFullAttributesFile(). These system calls are implemented, respectively, using a special query-only open that uses a dummy file object. The FileBasicInformation or FileNetworkOpenInformation is returned directly in the open packet that's passed to the object manager call ObOpenObjectByName()2. A filesystem redirector doesn't have to open the file normally for a query-only open if it already has the file basic information or network-open information in its local cache.

Footnotes

  1. GetFileAttributesW() traverses placeholder reparse points (e.g. OneDrive reparse points) if the process compatibility mode is set to disguise them. Refer to RtlSetProcessPlaceholderCompatibilityMode() and RtlQueryProcessPlaceholderCompatibilityMode(). The last time I checked, placeholders are disguised by default for a "python.exe" process. This may be because our old implementation of os.stat() incorrectly assumed that all reparse points are symbolic links instead of just name-surrogate reparse points. That behavior may have put us on a list of troublesome applications.

  2. Refer to the implementation of IopQueryAttributesFile() in ReactOS, which aims for NT 5.x compatibility.

@eryksun
Copy link
Contributor

eryksun commented Jan 20, 2023

  1. This may be because our old implementation of os.stat() incorrectly assumed that all reparse points are symbolic links instead of just name-surrogate reparse points. That behavior may have put us on a list of troublesome applications.

Actually, according to the documentation on supporting placeholders in a sync engine, most applications default to disguising placeholder reparse points. The documentation of RtlSetProcessPlaceholderCompatibilityMode() is thus incorrect when it claims that "[m]ost Windows applications see exposed placeholders by default".

The cloud files API implements the placeholder system using reparse points. A common misconception about reparse points is that they are the same as symbolic links. This misconception is occasionally reflected in application implementations, and as a result, many existing applications hit errors when encountering any reparse point.

To mitigate this compatibility issue, the cloud files API always hides its reparse points from all applications except for sync engines and processes whose main image resides under %systemroot%. Applications that understand reparse points correctly can force the platform to expose cloud files API reparse points using RtlSetProcessPlaceholderCompatibilityMode or RtlSetThreadProcessPlaceholderCompatibilityMode.

To confirm the "%SystemRoot%" behavior, I built a program that calls GetFileAttributesW() to check whether a given path is a reparse point. Sure enough, a copy of the executable in "SystemRoot%" defaults to exposing OneDrive placeholders as reparse points.

@zooba
Copy link
Member

zooba commented Jan 21, 2023

The OneDrive placeholders are... special... because I think traditional reparse points were used in an earlier implementation and it was found that they caused the MFT to fill up for some users. I suspect they virtualise all the files in a filter driver, and "generate" reparse points on the fly for anything inside the top-level directory (or possible all directories?)

Though since we install the all-users py.exe launcher into %SystemRoot%,

The CreateFileW() call uses the flag FILE_FLAG_BACKUP_SEMANTICS, which allows opening directories.

Okay, good. I thought we should've been, but glanced quickly at the sources and didn't see it. In that case, perhaps backup semantics + relative path + no reparse points work out faster than GetFileAttributes? So this particular benchmark might be the ideal case, and things balance out (hopefully don't get worse) in more complex cases?

Hopefully we get GetFileInformationByPath finalised soon (still some design discussions going on internally before it ships), and then it won't matter. That's significantly faster than either approach.

@zooba zooba added easy 3.12 bugs and security fixes labels Jan 21, 2023
@eryksun
Copy link
Contributor

eryksun commented Jan 21, 2023

So this particular benchmark might be the ideal case, and things balance out (hopefully don't get worse) in more complex cases?

I get the result that I would expect when I compare versions 3.6 to 3.11. Starting with 3.8 it became significantly slower. In the following table, each row is normalized by the version that ran the test in the shortest time:

3.6 3.7 3.8 3.9 3.10 3.11
exist 1.0 1.03 1.62 1.65 1.64 1.60
not exist 1.0 1.02 1.31 1.28 1.32 1.27

I'm not very concerned about this result because normally the work of checking thousands of files and directories would use os.scandir(), which is fast on Windows as long as the DirEntry.is_dir() method doesn't have to traverse a symlink or junction. That said, there's room for improvement in os.path.isdir() if you want to restore nt._isdir(). The new implementation would normally calls GetFileAttributesW(), except for a directory reparse point it would have to call CreateFileW() and GetFileInformationByHandleEx().


perhaps backup semantics

Using backup semantics in this case typically means that NtCreateFile() is called without the option FILE_NON_DIRECTORY_FILE, which is the option that CreateFileW() uses normally in order to disallow opening a directory. Generally this shouldn't have any affect on performance.

If SeBackupPrivilege or SeRestorePrivilege is enabled, backup semantics also affects the nature of and outcome of the discretionary access check. If both privileges are enabled, backup semantics allows ACCESS_SYSTEM_SECURITY | STANDARD_RIGHTS_ALL | GENERIC_READ | GENERIC_WRITE | GENERIC_EXECUTE discretionary access, without having to check the DACL. However, the file security still has to be checked for mandatory access control (e.g. an administrator at high integrity level cannot use backup semantics to access a file at system integrity level, which exceeds high integrity level, if the mandatory access control denies read, write, and execute access to lower levels).

  • relative path

Using a relative path avoids most of the path parsing in the object namespace and filesystem(s) because the directory is opened relative to the handle for the current working directory. However, this would be a common factor across all tested versions.

  • no reparse points work out faster than GetFileAttributes?

Yes, if the directories were symlinks or junctions, that would be to the advantage of GetFileAttributesW() since it doesn't traverse them to the target path. In such cases, the result of ntpath.isdir() is potentially wrong on Windows prior to 3.8.


The OneDrive placeholders are... special... because I think traditional reparse points were used in an earlier implementation and it was found that they caused the MFT to fill up for some users. I suspect they virtualise all the files in a filter driver, and "generate" reparse points on the fly for anything inside the top-level directory (or possible all directories?)

Either way, whether they're real or virtual reparse points, placeholders are seen as normal files by default unless either the process or thread is manually configured to expose them, or unless the executable is located in "%SystemRoot%".

Though since we install the all-users py.exe launcher into %SystemRoot%,

The "py.exe" launcher doesn't matter. What matters is the location of the process executable (e.g. "python[w].exe", "pythonservice.exe"), which is unlikely to be installed in "%SystemRoot%".

I copied "python.exe" to "%SystemRoot%", with the installation directory set in PATH. It's the same result as the test program that I tried earlier. In this case, OneDrive placeholders are exposed as reparse points by default. Yay? It's not exactly practical, but it does demonstrate that the system isn't singling out "python.exe" processes.

@mdboom mdboom self-assigned this Jan 23, 2023
@mdboom
Copy link
Contributor Author

mdboom commented Jan 23, 2023

I get the result that I would expect when I compare versions 3.6 to 3.11.

Your result is certainly less surprising than mine -- I'll have to figure out why my own benchmark runs show the opposite, but my best guess is my lack of Windows experience at the moment.

normally the work of checking thousands of files and directories would use os.scandir()

Agreed, but the use case that inspired this investigation was timing venv create (which actually spends most of its time doing pip install). pip doesn't seem to make effective use of scandir, and actually calls exists, isfile and isdir directly (not on DirEntry objects) around 8000 times, contributing to about 5% of the overall runtime.

@lpereira
Copy link
Contributor

I wonder if an API like GLib's g_file_test() could be added to os.path so that we could choose the best way of performing a path test in a particular platform. For instance, if we had something like os.path.test(path, flags), with flags being something like EXISTS, IS_DIR, IS_FILE, etc, something like this could be done:

def test(path, flags):
    methods = {
          EXISTS | NOT_IS_DIR: _test_exists_is_not_dir,
          EXISTS: _test_exists,
          (...)
    }
    return methods.get(flags, _test_fallback)(path, flags)

So, for instance, on Unix, _test_exists_is_not_dir() would call stat and check for the file mode, and _test_exists() would call access(F_OK). flags is passed to these functions because _test_fallback might want to perform each test individually.

@DefaultRyan
Copy link

I get the result that I would expect when I compare versions 3.6 to 3.11.

Your result is certainly less surprising than mine -- I'll have to figure out why my own benchmark runs show the opposite, but my best guess is my lack of Windows experience at the moment.

normally the work of checking thousands of files and directories would use os.scandir()

Agreed, but the use case that inspired this investigation was timing venv create (which actually spends most of its time doing pip install). pip doesn't seem to make effective use of scandir, and actually calls exists, isfile and isdir directly (not on DirEntry objects) around 8000 times, contributing to about 5% of the overall runtime.

I performed some more profiling on the venv create scenario, and learned that roughly 1/3rd of the stat cost can be solely attributed to the exists check in makedirs (src). (Which is being called by compile here, and from my cursory glance at that code, it should simply be passing exist_ok=True rather than catching FileExistsError, right?). What makes this especially wasteful is that this run contained 1451 calls to makedirs and 1455 calls to makedir, implying that the exists check was making things worse, rather than helping. Put another way, exists is taking nearly as much time as mkdir here.

While it's true that if/when we get GetFileInformationByPath, it should make stat calls much faster, that will only benefit the minority of users running a new version of Windows containing this new API. Therefore, it makes sense for us to improve the state of things for users running an older Windows, when practical.

And in this case, we should allow makedirs on Windows to execute in an idiomatically performant way. And that is to simply call CreateDirectoryW and selectively ignore ERROR_ALREADY_EXISTS.

@zooba
Copy link
Member

zooba commented Jan 25, 2023

nd in this case, we should allow makedirs on Windows to execute in an idiomatically performant way. And that is to simply call CreateDirectoryW and selectively ignore ERROR_ALREADY_EXISTS.

That's a reasonable (and new) suggestion. Probably deserves a new issue, as removing the comment referenced earlier is still a legitimate PR someone can create.

Eventually a majority of Windows users will have the new API. Our timescales for this stuff are long enough to allow for that. Apps can modify their algorithms for faster speedup, but the runtime doesn't have to be in such a hurry.

@eryksun
Copy link
Contributor

eryksun commented Jan 25, 2023

Here's an implementation of nt._isdir() that uses CreateFileW() and GetFileInformationByHandleEx(): FileBasicInfo.

/*[clinic input]
os._isdir
    path: path_t(allow_fd=True)
    /
Return true if the pathname refers to an existing directory.
[clinic start generated code]*/

static PyObject *
os__isdir_impl(PyObject *module, path_t *path)
/*[clinic end generated code: output=75f56f32720836cb input=4aef81031b54999b]*/
{
    HANDLE hfile;
    BOOL close_file = TRUE;
    FILE_BASIC_INFO info = {0};

    Py_BEGIN_ALLOW_THREADS
    if (path->fd != -1) {
        hfile = _Py_get_osfhandle_noraise(path->fd);
        close_file = FALSE;
    } else {
        hfile = CreateFileW(path->wide, FILE_READ_ATTRIBUTES, 0, NULL, 
                    OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, NULL); 
    }
    if (hfile != INVALID_HANDLE_VALUE) { 
        GetFileInformationByHandleEx(hfile, FileBasicInfo, &info, 
            sizeof(info));
        if (close_file) {
            CloseHandle(hfile);
        }
    }
    Py_END_ALLOW_THREADS

    if (info.FileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
        Py_RETURN_TRUE;
    } else {
        Py_RETURN_FALSE;
    }
}

Remember to run clinic and add OS__ISDIR_METHODDEF to the posix_methods array.

This version supports testing a file descriptor, since we've had that capability on Windows since 3.8.

It isn't a drop-in replacement for genericpath.isdir() because the path_t argument parsing can raise ValueError due to a bad argument (e.g. an embedded null character or a bytes decoding error).

For example:

>>> import os
>>> os.path.isdir
<built-in function _isdir>
>>> os.path.isdir(0)
False
>>> O_OBTAIN_DIR = 0x2000
>>> fd = os.open('C:\\', O_OBTAIN_DIR)
>>> os.path.isdir(fd)
True
>>> os.close(fd)
>>> os.path.isdir(fd)
False
>>> os.symlink('spamdir', 'spamlink', target_is_directory=True)
>>> os.path.isdir('spamlink')
False
>>> os.mkdir('spamdir')
>>> os.path.isdir('spamlink')
True
>>> os.rmdir('spamdir')
>>> os.path.isdir('spamlink')
False
>>> genericpath.isdir('spam\0')
False
>>> os.path.isdir('spam\0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: _isdir: embedded null character in path

For me, this implementation takes about about a third less time than using os.stat(). It takes about 10% more time than using GetFileAttributesW(), but using GetFileAttributesW() is significantly more expensive for a reparse point (e.g. symlink, junction) because CreateFileW() has to be called to traverse it, which means the file is opened and queried twice.

A similar function could be implemented for exists() and isfile(), but implementing a flexible function like os.statx() is a better idea.


A new GetFileInformationByName() function is planned for Windows 11. It's probably based on NtQueryInformationByName(), or a new system call that's similar.

Somehow NtQueryInformationByName() is significantly faster than NtQueryAttributesFile(), even though both are implemented as query-only opens, and even though the former returns much more information in the FILE_STAT_INFORMATION record. Of the two, only NtQueryInformationByName() is flagged as a filter-query open, which suggests that a filter driver may be a factor in the speedup.

I compared the performance of nt._isdir when it's implemented directly with NtQueryAttributesFile() vs NtQueryInformationByName(). Relative to the above CreateFileW() implementation, the NtQueryAttributesFile() version was a bit faster for regular directories, while the NtQueryInformationByName() version executed in half the time for regular directories.

These functions always open a reparse point, except for disguised placeholders. Traversing a reparse point requires a normal open such as CreateFileW(), or NTAPI NtCreateFile() or NtOpenFile(). Also, currently NtQueryInformationByName() isn't supported by FAT filesystems such as exFAT and FAT32. So maybe there's still a case for a flexible statx() implementation to improve the performance when traversing reparse points or with file systems other than NTFS.

@mdboom
Copy link
Contributor Author

mdboom commented Jan 25, 2023

@eryksun: I agree that some form of this optimized isdir based on the "legacy" Windows APIs makes sense for now. I was working on that, and I plan to merge it with your post above probably sometime today or tomorrow.

And in this case, we should allow makedirs on Windows to execute in an idiomatically performant way. And that is to simply call CreateDirectoryW and selectively ignore ERROR_ALREADY_EXISTS.

@DefaultRyan: I agree we should also do this (and maybe there are other instances of this elsewhere). Do you want to file a separate issue for that?

@mdboom mdboom changed the title Remove obsolete reference to faster os.path.isdir on Windows Add optimized versions of isdir / isfile on Windows Jan 25, 2023
@eryksun
Copy link
Contributor

eryksun commented Jan 25, 2023

In os.makedirs() the purpose of the exists() call is to determine whether it needs to recur. For os.makedirs(r'C:\d1\d2\...\dn'), it shouldn't recur all the way to the root directory to try to create an arbitrary number of directories that already exist. It could be redesigned to recur based on FileNotFoundError. That would eliminate the exists() check, but it still depends on a relatively expensive NtCreateFile() system call for the CreateDirectoryW() call, which is the most significant cost in os.stat(). Also, a cross-platform implementation still needs the isdir() check when exist_ok is true because the error code from mkdir() isn't consistent.

@zooba zooba added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error easy labels Jan 25, 2023
@DefaultRyan
Copy link

I performed some more profiling on the venv create scenario, and learned that roughly 1/3rd of the stat cost can be solely attributed to the exists check in makedirs (src). (Which is being called by compile here, and from my cursory glance at that code, it should simply be passing exist_ok=True rather than catching FileExistsError, right?). What makes this especially wasteful is that this run contained 1451 calls to makedirs and 1455 calls to makedir, implying that the exists check was making things worse, rather than helping. Put another way, exists is taking nearly as much time as mkdir here.

While it's true that if/when we get GetFileInformationByPath, it should make stat calls much faster, that will only benefit the minority of users running a new version of Windows containing this new API. Therefore, it makes sense for us to improve the state of things for users running an older Windows, when practical.

And in this case, we should allow makedirs on Windows to execute in an idiomatically performant way. And that is to simply call CreateDirectoryW and selectively ignore ERROR_ALREADY_EXISTS.

@mdboom and @zooba I've created the new issue here: #101358.

zooba pushed a commit that referenced this issue Feb 8, 2023
carljm added a commit to carljm/cpython that referenced this issue Feb 9, 2023
* main: (82 commits)
  pythongh-101670: typo fix in PyImport_ExtendInittab() (python#101723)
  pythonGH-99293: Document that `Py_TPFLAGS_VALID_VERSION_TAG` shouldn't be used. (#pythonGH-101736)
  no-issue: Add Dong-hee Na as the cjkcodecs codeowner (pythongh-101731)
  pythongh-101678: Merge math_1_to_whatever() and math_1() (python#101730)
  pythongh-101678: refactor the math module to use special functions from c11 (pythonGH-101679)
  pythongh-85984: Remove legacy Lib/pty.py code. (python#92365)
  pythongh-98831: Use opcode metadata for stack_effect() (python#101704)
  pythongh-101283: Version was just released, so should be changed in 3.11.3 (pythonGH-101719)
  pythongh-101283: Fix use of unbound variable (pythonGH-101712)
  pythongh-101283: Improved fallback logic for subprocess with shell=True on Windows (pythonGH-101286)
  pythongh-101277: Port more itertools static types to heap types (python#101304)
  pythongh-98831: Modernize CALL and family (python#101508)
  pythonGH-101696: invalidate type version tag in `_PyStaticType_Dealloc` (python#101697)
  pythongh-100221: Fix creating dirs in `make sharedinstall` (pythonGH-100329)
  pythongh-101670: typo fix in PyImport_AppendInittab() (pythonGH-101672)
  pythongh-101196: Make isdir/isfile/exists faster on Windows (pythonGH-101324)
  pythongh-101614: Don't treat python3_d.dll as a Python DLL when checking extension modules for incompatibility (pythonGH-101615)
  pythongh-100933: Improve `check_element` helper in `test_xml_etree` (python#100934)
  pythonGH-101578: Normalize the current exception (pythonGH-101607)
  pythongh-47937: Note that Popen attributes are read-only (python#93070)
  ...
@hauntsaninja
Copy link
Contributor

It looks like this has been completed? The discussion on makedirs has a dedicated issue now. There was some discussion on the PR about whether to backport, but it's rare for us to backport a performance improvement (particularly so when the merge isn't trivial)

@zooba
Copy link
Member

zooba commented Feb 27, 2023

Yeah, just forgot to hit close.

barneygale added a commit to barneygale/cpython that referenced this issue Apr 24, 2024
…`is_*()`

Suppress all `OSError` exceptions from `pathlib.Path.exists()` and `is_*()`
rather than a selection of more common errors as we do presently. Also
adjust the implementations to call `os.path.exists()` etc, which are much
faster on Windows thanks to pythonGH-101196.
barneygale added a commit that referenced this issue May 14, 2024
…)` (#118243)

Suppress all `OSError` exceptions from `pathlib.Path.exists()` and `is_*()`
rather than a selection of more common errors as we do presently. Also
adjust the implementations to call `os.path.exists()` etc, which are much
faster on Windows thanks to GH-101196.
estyxx pushed a commit to estyxx/cpython that referenced this issue Jul 17, 2024
…`is_*()` (python#118243)

Suppress all `OSError` exceptions from `pathlib.Path.exists()` and `is_*()`
rather than a selection of more common errors as we do presently. Also
adjust the implementations to call `os.path.exists()` etc, which are much
faster on Windows thanks to pythonGH-101196.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes OS-windows type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

6 participants