-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Boost subshell performance by avoiding expensive pwd related syscalls #805
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
In ksh93v- a sh.pwdfd variable was introduced, ostensibly for implementing 'cd -f', 'pwd -f' and '${.sh.pwdfd}' as new features. I might backport these in the future, but those aren't the focus of this commit. Rather, what caught my interest was the effect of this change on subshell performance (results from shbench with 20 iterations, compiled with CCFLAGS='-O2 -fPIC'): ---------------------------------------------------------------------------------------------------------------------------------- name /tmp/ksh93u+2012-08-01 /tmp/ksh93u+m-dev /tmp/ksh93v-2012-08-24 /tmp/ksh93v-2014-12-24 /tmp/ksh93u+redhat ---------------------------------------------------------------------------------------------------------------------------------- fibonacci.ksh 0.230 [0.227-0.235] 0.216 [0.211-0.222] 0.269 [0.266-0.273] 0.247 [0.242-0.251] 0.264 [0.260-0.269] parens.ksh 0.208 [0.204-0.212] 0.173 [0.170-0.182] 0.147 [0.145-0.149] 0.125 [0.121-0.127] 0.140 [0.137-0.144] ---------------------------------------------------------------------------------------------------------------------------------- As can be seen above, 93u+ and 93u+m perform worse in the parens benchmark than the ksh93v- release which implements the change, as well as a copy of ksh93u+ compiled with Red Hat's patches for cd (fdstatus, rmdirfix, cdfix and cdfix2 applied in that order). However, 93u+ and 93u+m perform better in the fibonacci benchmark. 93v- is faster in the parens benchmark because it obtains the file descriptor for the current working directory via openat(2) in sh_diropenat, then caches it in sh.pwdfd. This fd is then reused in sh_subshell(), which allows 93v- to avoid taxing open(2) and fcntl(2) syscalls. However, the 93v- implementation has a flaw. Unlike in 93u+, it always duplicates sh.pwdfd with fcntl, even when it's a superfluous operation. This is the cause of the performance regression in the fibonacci benchmark. To increase ksh93u+m's performance beyond both 93u+ and 93v-, I have backported 93v-'s code, but with a crucial change: - sh_subshell() no longer duplicates sh.pwdfd with fcntl during virtual subshell scope initialization. Rather, a sh_pwdupdate function is used in tandem with sh_diropenat in b_cd() to do all of this when the directory is being changed. This fixes the regression in the fibonacci benchmark. sh_pwdupdate will be called whenever sh.pwdfd changes to prevent file descriptor leaks while also avoiding erroneous close operations on the parent shell's file descriptor for the PWD. Other changes applied include: - The removal of support for systems which lack fchdir. ksh93v- did this twelve years ago in the oldest archived alpha release, which I think is a more than long enough deprecation window. - If sh.pwdfd is -1 (i.e., the shell was initialized in a nonexistent directory), don't fork preemptively. It's more efficient to let cd check sp->pwdfd via a short function before attempting to change the directory. This allows subshells run in nonexistent directories to avoid forking, provided cd is never invoked. - Removed <=1995 hackery from nv_open and env_init. During SH_INIT we should simply always invoke path_pwd(); giving $PWD NV_TAGGED is not just unnecessary, but possibly harmful when one attempts to use 'typeset -t' with the PWD variable. - With the removal of that code, sh.pwd is guaranteed to get set during shell initialization (if it wasn't already). As such, the code no longer needs to cope with a NULL pwd and many such checks have been removed. - sh.pwd no longer has a pointless and misleading const modifier. This allows for the elimination of many casts involving sh.pwd. - sh.pwdfd is reset every time the pwd is (re)initialized in path_pwd(). - libast/features/fcntl.c: Account for systems which don't implement O_SEARCH and/or O_DIRECTORY. (I avoided backporting the ksh93v- feature test code because it's ginormous and filled with far too much compatibility hackery. If anyone wants to backport the mess that are the libast syscall intercepts, be my guest. In any case ksh doesn't need any of that to function correctly.) - Goat sacrifices are hereby banned from 93u+m (for the curious, please see att#567). With these changes ksh93u+m now handily beats the previous ksh releases in the fibonacci and parens benchmarks (ksh2020 included for comparison): ---------------------------------------------------------------------------------------------------------------------------------- name /tmp/ksh93u+2012-08-01 /tmp/ksh93v-2014-12-24 /tmp/ksh2020 /tmp/ksh93u+m-dev /tmp/ksh93u+m-patch ---------------------------------------------------------------------------------------------------------------------------------- fibonacci.ksh 0.230 [0.224-0.236] 0.247 [0.244-0.252] 0.369 [0.364-0.375] 0.217 [0.213-0.222] 0.202 [0.198-0.206] parens.ksh 0.207 [0.203-0.213] 0.124 [0.121-0.127] 0.165 [0.162-0.169] 0.173 [0.171-0.176] 0.082 [0.080-0.085] ----------------------------------------------------------------------------------------------------------------------------------
Thanks for this PR, this is great. On macOS (12.7.6 M1/arm64), the performance improvement on the parens.ksh benchmark is quite spectacular. Columns are: 93u+m 2012-08-01; ksh2020; dev branch without this PR; dev branch with this PR.
|
The PR looks good to me. I don't know if it's all going to work on the more obscure systems like QNX or AIX, but that is more easily tested after merging it. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
In ksh93v- a
sh.pwdfd
variable was introduced, ostensibly for implementingcd -f
,pwd -f
and${.sh.pwdfd}
as new features. I might backport these in the future, but those aren't the focus of this commit. Rather, what caught my interest was the effect of this change on subshell performance (results fromshbench
with 20 iterations, compiled withCCFLAGS='-O2 -fPIC'
):As can be seen above, 93u+ and 93u+m perform worse in the parens benchmark than the ksh93v- release which implements the change, as well as a copy of ksh93u+ compiled with Red Hat's patches for cd (fdstatus, rmdirfix, cdfix and cdfix2 applied in that order). However, 93u+ and 93u+m perform better in the fibonacci benchmark.
93v- is faster in the parens benchmark because it obtains the file descriptor for the current working directory via
openat(2)
insh_diropenat()
, then caches it in sh.pwdfd. This fd is then reused insh_subshell()
, which allows 93v- to avoid taxingopen(2)
andfcntl(2)
syscalls.However, the 93v- implementation has a flaw. Unlike in 93u+, it always duplicates
sh.pwdfd
withfcntl(2)
, even when it's a superfluous operation. This is the cause of the performance regression in the fibonacci benchmark.To increase ksh93u+m's performance beyond both 93u+ and 93v-, I have backported 93v-'s code, but with a crucial change:
sh_subshell()
no longer duplicatessh.pwdfd
withfcntl()
during virtual subshell scope initialization. Rather, ash_pwdupdate()
function is used in tandem withsh_diropenat()
inb_cd()
to do all of this when the directory is being changed. This fixes the regression in the fibonacci benchmark.sh_pwdupdate()
will be called wheneversh.pwdfd
changes to prevent file descriptor leaks while also avoiding erroneous close operations on the parent shell's file descriptor for the PWD.Other changes applied include:
fchdir(2)
. ksh93v- did this twelve years ago in the oldest archived alpha release, which I think is a more than long enough deprecation window.sh.pwdfd
is -1 (i.e., the shell was initialized in a nonexistent directory), don't fork preemptively. It's more efficient to letcd
checksp->pwdfd
via a short function (sh_validate_subpwdfd()
) before attempting to change the directory. This allows subshells run in nonexistent directories to avoid forking, providedcd
is never invoked.nv_open()
andenv_init()
DuringSH_INIT
we should simply always invokepath_pwd()
; giving$PWD
NV_TAGGED
is not just unnecessary, but possibly harmful when one attempts to usetypeset -t
with thePWD
variable.sh.pwd
is guaranteed to get set during shell initialization (if it wasn't already). As such, the code no longer needs to cope with a NULL pwd and many such checks have been removed.sh.pwd
no longer has a pointless and misleadingconst
modifier. This allows for the elimination of many casts involving it.sh.pwdfd
is reset every time the pwd is (re)initialized inpath_pwd()
.O_SEARCH
and/orO_DIRECTORY
. (I avoided backporting the ksh93v- feature test code because it's ginormous and filled with far too much compatibility hackery. If anyone wants to backport the mess that are the libast syscall intercepts, be my guest. In any case ksh doesn't need any of that to function correctly.)With these changes ksh93u+m now handily beats the previous ksh releases in the fibonacci and parens benchmarks (ksh2020 included for comparison):