`spawnWindows`: Improve worst-case performance considerably + tests #13993
Seems to happen if the command trying to be executed has the extension .exe and it's an invalid executable.
The name of the game here is to avoid CreateProcessW calls at all costs, and only ever try calling it when we have a real candidate for execution. Secondarily, we want to minimize the number of syscalls used when checking for each PATHEXT-appended version of the app name.

An overview of the technique used:
- Open the search directory for iteration (either cwd or a path from PATH)
- Use NtQueryDirectoryFile with a wildcard filename of `<app name>*` to check if anything that could possibly match either the unappended version of the app name or any of the versions with a PATHEXT value appended exists.
- If the wildcard NtQueryDirectoryFile call found nothing, we can exit early without needing to use PATHEXT at all.

This allows us to use a <open dir, NtQueryDirectoryFile, close dir> sequence for any directory that doesn't contain any possible matches, instead of having to use a separate look up for each individual filename combination (unappended + each PATHEXT appended). For directories where the wildcard *does* match something, we only need to do a maximum of <number of supported PATHEXT extensions> more NtQueryDirectoryFile calls.

---

In addition, we now only evaluate the extensions in PATHEXT that we know we can handle (.COM, .EXE, .BAT, .CMD) and ignore the rest.

---

This commit also makes two edge cases match Windows behavior:
- If an app name has the extension .exe and it is attempted to be executed, that is now treated as unrecoverable and InvalidExe is immediately returned no matter where the .exe is (cwd or in the PATH). This matches the behavior of the Windows cmd.exe.
- If the app name contains more than just a filename (e.g. it has path separators), then it is excluded from PATH searching and only does a cwd search. This matches the behavior of Windows cmd.exe.
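To make the early-exit idea concrete, here is a minimal sketch that models a directory as a plain list of UTF-8 file names instead of calling NtQueryDirectoryFile. The names `wildcardMatchesAny` and `maxLookupsForDir` are hypothetical and do not come from the PR; the real implementation works on UTF-16 and real directory handles.

```zig
const std = @import("std");

// The extensions the commit message says are handled; the rest of PATHEXT is ignored.
const supported_exts = [_][]const u8{ ".COM", ".EXE", ".BAT", ".CMD" };

/// Hypothetical stand-in for the `<app name>*` wildcard query: returns true when
/// any entry starts with `app_name` (ASCII case-insensitive comparison).
fn wildcardMatchesAny(dir_entries: []const []const u8, app_name: []const u8) bool {
    for (dir_entries) |entry| {
        if (std.ascii.startsWithIgnoreCase(entry, app_name)) return true;
    }
    return false;
}

/// Upper bound on directory queries for one search directory under this scheme:
/// one wildcard query, plus at most one query per supported extension, and only
/// when the wildcard matched something.
fn maxLookupsForDir(dir_entries: []const []const u8, app_name: []const u8) usize {
    if (!wildcardMatchesAny(dir_entries, app_name)) return 1; // early exit, PATHEXT never consulted
    return 1 + supported_exts.len;
}
```

In this model, a directory with no candidate costs a single query regardless of how many extensions are supported, which is where the worst-case savings over many PATH entries would come from.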
An addendum about some tradeoffs: Out of curiosity, I tried comparing against a Rust version of the benchmark, and I noticed that the Rust version performs considerably better (this is the benchmark with 211 search paths and the command being found after 197 searched paths):
In looking into why this is, I noticed that Rust makes some tradeoffs in terms of matching Windows behavior and what it thinks is the most likely scenario:
So, my conclusion is that performance could be increased if these sorts of tradeoffs are acceptable, but I think I personally prefer matching the Windows behavior even if it comes at a cost. For completeness, here's the benchmark with only the directory that the command is found in on the `PATH`:
Tests a decent amount of edge cases dealing with how PATH and PATHEXT searching is handled.
Force-pushed from 2c46323 to 0cbc59f.
some nits.
@@ -1126,6 +1351,64 @@ fn windowsCreateProcess(app_name: [*:0]u16, cmd_line: [*:0]u16, envp_ptr: ?[*]u1
);
}

/// Case-insenstive UTF-16 lookup
nit: The origin of this table (how these known extensions were chosen) should be documented here.
@@ -1094,6 +1090,235 @@ pub const ChildProcess = struct {
}
};

/// Expects `app_buf` to contain exactly the app name, and `dir_buf` to contain exactly the dir path.
/// After return, `app_buf` will always contain exactly the app name and `dir_buf` will always contain exactly the dir path.
/// Note: `app_buf` should not contain any leading path separators.
try app_buf.append(allocator, 0);
const app_name_wildcard = app_buf.items[0 .. app_buf.items.len - 1 :0];

// Enough for the FILE_DIRECTORY_INFORMATION + (NAME_MAX UTF-16 code units [2 bytes each]).
This doesn't explain why 2*NAME_MAX. Context: https://github.com/ziglang/zig/blob/master/lib/std/os/windows.zig#L3142
Aside, this has very thorough test coverage.
If the app name is absolute (e.g.
Not sure exactly what you're asking, but I'll try to clarify how this PR will impact normal use cases:
If you're asking about if there would be a benefit to taking some of the sorts of tradeoffs that Rust makes, they would have some effect, but again (in terms of 'normal' use) probably only in the situations in the two points above. The most noticeable differences are in the outlier cases where there are a ton of entries in the The usage of the wildcard with |
…nsion Previously, the implementation would essentially check `startsWith` instead of `eql` (e.g. it would return true for `.exec` because it erroneously 'matched' `.exe`). Follow up to ziglang#13993
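The difference between the two checks can be sketched as follows. This is a simplified UTF-8 illustration with hypothetical function names (`isSupportedPrefix`, `isSupportedExact`); the real lookup in the PR operates on UTF-16 and is not reproduced here.

```zig
const std = @import("std");

const supported_exts = [_][]const u8{ ".COM", ".EXE", ".BAT", ".CMD" };

/// Shape of the bug described above: a prefix comparison, so `.exec` is
/// accepted because it begins with `.exe`.
fn isSupportedPrefix(ext: []const u8) bool {
    for (supported_exts) |s| {
        if (std.ascii.startsWithIgnoreCase(ext, s)) return true;
    }
    return false;
}

/// Shape of the fix: the whole extension must be equal (case-insensitively),
/// so `.exec` is rejected while `.exe` and `.EXE` are accepted.
fn isSupportedExact(ext: []const u8) bool {
    for (supported_exts) |s| {
        if (std.ascii.eqlIgnoreCase(ext, s)) return true;
    }
    return false;
}
```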
This fixes a regression caused by ziglang#13993

As an optimization, the first call to `NtQueryDirectoryFile` would only ask for a single result and assume that if the result returned did not match the app_name exactly, then the unappended app_name did not exist. However, this relied on the assumption that the unappended app_name would always be returned first, but that only seems to be the case on NTFS. On FAT filesystems, the order of returned files can be different, which meant that it could assume the unappended file doesn't exist when it actually does.

This commit fixes that by fully iterating the wildcard matches via `NtQueryDirectoryFile` and taking note of any unappended/PATHEXT-appended filenames it finds. In practice, this strategy does not introduce a speed regression compared to the previous (buggy) implementation.

Benchmark 1 (10 runs): winpathbench-master.exe

| measurement | mean ± σ | min … max | outliers | delta |
| --- | --- | --- | --- | --- |
| wall_time | 508ms ± 4.08ms | 502ms … 517ms | 1 (10%) | 0% |
| peak_rss | 3.62MB ± 2.76KB | 3.62MB … 3.63MB | 0 ( 0%) | 0% |

Benchmark 2 (10 runs): winpathbench-fat32-fix.exe

| measurement | mean ± σ | min … max | outliers | delta |
| --- | --- | --- | --- | --- |
| wall_time | 500ms ± 21.4ms | 480ms … 535ms | 0 ( 0%) | - 1.5% ± 2.8% |
| peak_rss | 3.62MB ± 2.76KB | 3.62MB … 3.63MB | 0 ( 0%) | - 0.0% ± 0.1% |

---

Partially addresses ziglang#16374 (it fixes `zig build` on FAT32 when no `zig-cache` is present)
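A rough sketch of the order-independent strategy, assuming the wildcard results are already available as a list of UTF-8 names. The `Seen` struct and `scanMatches` function are hypothetical illustrations, not the code from the commit, which iterates `NtQueryDirectoryFile` results in UTF-16.

```zig
const std = @import("std");

const supported_exts = [_][]const u8{ ".COM", ".EXE", ".BAT", ".CMD" };

/// Records which candidate file names were seen among the wildcard matches.
const Seen = struct {
    unappended: bool = false,
    exts: [supported_exts.len]bool = [_]bool{false} ** supported_exts.len,
};

/// Walks every match instead of trusting the first one, so the result does not
/// depend on the order in which the filesystem returns entries.
fn scanMatches(matches: []const []const u8, app_name: []const u8) Seen {
    var seen = Seen{};
    for (matches) |name| {
        if (!std.ascii.startsWithIgnoreCase(name, app_name)) continue;
        const rest = name[app_name.len..];
        if (rest.len == 0) {
            seen.unappended = true; // exact, unappended app name
            continue;
        }
        var i: usize = 0;
        while (i < supported_exts.len) : (i += 1) {
            if (std.ascii.eqlIgnoreCase(rest, supported_exts[i])) seen.exts[i] = true;
        }
    }
    return seen;
}
```

With this shape, a listing of `foo.exe, foo` and a listing of `foo, foo.exe` produce the same `Seen` value, which is the property the single-result optimization lacked.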
Follow up to #13983.
First, the eye-popping benchmark:
Benchmark code

Compiled with `-lc -OReleaseFast`:

The .bat file for comparison:
However, this is not representative, and in fact I think it's basically impossible to get a 'representative' benchmark, as the benchmark mostly depends on the number of search paths in the `PATH` env var that must be searched before finding the command, which varies hugely (order of PATH, number of values in PATH, the particular command, etc).

The benchmark above is using a `PATH` with 211 entries (artificially inflated), and it finds the command after searching through 197 of them. Here's a benchmark with only the directory that the command is found in on the `PATH` (so it only checks the cwd and then the PATH entry where it finds it):

Still slightly faster, but it should demonstrate that this PR greatly improves the worst-case performance but not necessarily the performance of the happy path.
With that out of the way, here are some details:
The name of the game here is to avoid CreateProcessW calls at all costs, and only ever try calling it when we have a real candidate for execution. Secondarily, we want to minimize the number of syscalls used when checking for each PATHEXT-appended version of the app name.
An overview of the technique used (it was inspired by running `NtTrace` on the `winpathbench.bat` file and finding that this is what it was doing):

- Open the search directory for iteration (either cwd or a path from PATH)
- Use NtQueryDirectoryFile with a wildcard filename of `<app name>*` to check if anything that could possibly match either the unappended version of the app name or any of the versions with a PATHEXT value appended exists.
- If the wildcard NtQueryDirectoryFile call found nothing, we can exit early without needing to use PATHEXT at all.

This allows us to use a <open dir, NtQueryDirectoryFile, close dir> sequence for any directory that doesn't contain any possible matches, instead of having to use a separate look up for each individual filename combination (unappended + each PATHEXT appended). For directories where the wildcard does match something, we only need to do a maximum of `<number of supported PATHEXT extensions>` more NtQueryDirectoryFile calls.

In addition, we now only evaluate the extensions in PATHEXT that we know we can handle (.COM, .EXE, .BAT, .CMD) and ignore the rest (see #13983 (comment) for more details about this).
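As an illustration of restricting PATHEXT to the handled extensions, the sketch below walks a `;`-separated PATHEXT value and counts only the entries in the supported set. It assumes a UTF-8 string for simplicity, and `isSupported` and `countSupported` are hypothetical names rather than functions from this PR.

```zig
const std = @import("std");

const supported_exts = [_][]const u8{ ".COM", ".EXE", ".BAT", ".CMD" };

/// True when `ext` equals one of the supported extensions, ignoring ASCII case.
fn isSupported(ext: []const u8) bool {
    for (supported_exts) |s| {
        if (std.ascii.eqlIgnoreCase(ext, s)) return true;
    }
    return false;
}

/// Counts the entries of a `;`-separated PATHEXT value that would be evaluated;
/// unsupported entries (for example `.VBS`) and empty entries are skipped.
fn countSupported(pathext: []const u8) usize {
    var count: usize = 0;
    var start: usize = 0;
    while (start <= pathext.len) {
        const end = std.mem.indexOfScalarPos(u8, pathext, start, ';') orelse pathext.len;
        if (isSupported(pathext[start..end])) count += 1;
        start = end + 1;
    }
    return count;
}
```

Under this model, a PATHEXT with many script extensions still yields at most four extra lookups per matching directory.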
This also adds a standalone test and makes two edge cases match Windows behavior: