-
-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replace awk for perl in the bash history widget #3313
replace awk for perl in the bash history widget #3313
Conversation
While awk is POSIX, perl isn't pre-installed on all *nix flavors. This commit replaces perl with awk to eliminate the dependency on perl. Related: junegunn#3295, junegunn#3309, junegunn#3310. Test suite passed: * `make error` all test sections 'PASS' * `make docker-test` 215 runs, 1884 assertions, 0 failures, 0 errors, 0 skips. Manually tested in the following environments: * Linux amd64 with bash 3.2, 4.4, 5.2; gawk -P, one true awk, mawk, busybox awk. * macOS Catalina, bash 3.2, macOS awk 20070501. **Performance comparison:** Mawk turned out the fastest, then perl. One true awk's implementation should be the closest to macOS awk. Test data: 230 KB history, 15102 entries, including multi-line and duplicates. Linux, bash 4.4. Times in milliseconds. | Command | Mean | Min | Max | Relative | | :--- | ---: | ---: | ---: | -------: | | `mawk 1.3.4` | 22.9 | 22.3 | 25.6 | **1.00** | | `perl 5.26.1` | 34.3 | 33.6 | 35.1 | 1.49 | | `one true awk 20221215` | 41.9 | 40.6 | 46.3 | 1.83 | | `gawk 5.1.0` | 46.1 | 44.4 | 50.3 | 2.01 | | `busybox awk 1.27.0` | 64.8 | 63.2 | 70.0 | 2.82 | **Other Notes** A bug affects bash, which fails restoring a saved multi-line history entry as a single entry. Bug fixed in version 5.0.[^1] While developing this PR I discovered two unsubmitted issues affecting the current perl script. The output stream ends with `$'\n\0000'` instead of `$'\0000'`. Because of this, the script does not deduplicate a duplicated entry located at the end of the history list; therefore fzf displays two identical (not necessarily adjacent) entries. A minor point about the first issue is that the top fzf entry ends with a dangling line feed symbol, which is visible in the terminal. [^1]: https://github.com/bminor/bash/blob/ec8113b9861375e4e17b3307372569d429dec814/CHANGES#L1511 To enable: `shopt -s cmdhist lithist; HISTTIMEFORMAT='%F %T '`.
Alright, take 3... action! |
I do get the same number of entries. Below, left is perl, right is awk:
Here both versions show the same set of numbers and entries. Your bash version? MacOS? I'm testing on Linux mainly using bash 4.4 but after your comment I tested again with bash 5.2. What's your history setup in .bashrc? Mine is shopt -sq histverify histreedit
shopt -s cmdhist lithist
HISTTIMEFORMAT='%F %T '
HISTSIZE=9999999999 HISTFILESIZE=9999999999 At the moment, I don't have a hypothesis about what's going on. Do you? Anyway, in summary, the two scripts appear to deduplicate differently, so I will scrutinize and stress that part with a larger data set. Earlier I wrote:
I'll take that back. The awk script is also storing all unique records as array keys! That could conceivably hit a size limit. I'm going to look at One True Awk's source code... No, array keys don't seem to be a limiting factor. MacOS awk can create an array of 10 million 32-byte keys without errors. Later... I still can't reproduce the difference you're seeing. I have tested with 300K entries. fzf bottom: latest entries - perl (left) - one true awk (right)fzf top: earliest entries - perl (left) - one true awk (right)There's only one difference: entry number 1. That's expected as explained in the Other Notes section of my message |
Okay, here's what I've found. I've noticed awk is printing an error message because of a multi-byte character in an entry it cannot process.
(redacted the command) And fzf only receives commands after the one with the error.
95534 is the first command listed. |
Good, at least now I've got a concrete case I can work on. |
Google found several references to that error message. Please try this change; add if command -v iconv > /dev/null; then iconv -c -t utf-8; else cat; fi | immediately after line 61 |
Yep, that fixes the problem. I hope this was faster though as the difference is quite noticeable. master branch (with perl)
with awk
|
Good. Thanks.
Yes, it is. I wonder which is the bottleneck, As I mentioned, the net shows several reports of that same error message. This one1 in particular was very informative and led to the suggested change. Footnotes
|
Updated benchmarks for 300,000 entries deduplicated down to 3413Linux amd64, milliseconds, all
300,000 unique entries (no deduplication)Linux amd64, milliseconds, all
|
Thanks for the report. Unfortunately, as mentioned above, I'm observing a much larger performance difference on macOS (4 times slower; 100ms is barely noticeable, but 400ms is not). So the problem is, Mac users who already have Perl installed on their system by default will benefit nothing from this change, but they'll only experience a noticeable lag every time they use CTRL-R binding. A small number of users using minimal systems will benefit from this, but the question is, can't they just install Perl if they really want this? Is it impossible? Also, the code is a bit longer, which is a minus from the maintainer's standpoint. Even if we're going to do this, we should keep the original Perl version. if command -v perl > /dev/null; then
__fzf_history__() {
...
}
else
...
fi |
Turning perl into an optional dependency like your example would be helpful for me. When I install fzf through my package manager it has to download a 300MB+ perl package even though I don't need it since my shell is zsh (and I currently work around this by removing the perl package manually, but it'd be nice if perl could be an optional dep for packaging's sake. |
@junegunn, I understand your viewpoint: macOS users will not gain anything from this PR because perl is native in macOS, and it is noticeably faster than awk to assist fzf bash I will add that you positively represent the segment of fzf users who will feel the pain: macOS user, bash¹ user, fast¹ machine, huge history file. You already have, and you will, notice a significant slow down every time you press Other segments, like bash users with a medium size history file, will probably not notice a difference, more so on Linux. If this PR won't be merged, I still can package it for my distribution and fix my use case. (I will have to deal with users who upgrade fzf from this repo - overwriting the patched script - but I guess those will be power users who already installed perl to begin with). |
Another approach (which might have been discussed in one of these threads) is to code this up in |
Replied #3077 (comment) |
No. Making CTRL-R binding which is already pretty fast even faster by increasing the complexity of the core program can never be a goal. I'm suggesting we shouldn't make it noticeably slower. Regardless of Perl or awk, the implementation of CTRL-R binding should be just a few lines of code. Why can't you just copy the code to a system that doesn't have Perl installed? Having said that, I commented above that I'm willing to accept this version if we keep the original version and keep using it on systems that have Perl installed. |
Oh, I had interpreted your comment differently, therefore froze this PR. I can certainly look into keeping the existing perl script and only invoking the proposed awk script if perl isn't installed. |
@junegunn, I'm working to update this PR. Given that mawk is the fastest awk for this use case, and that mawk isn't POSIX, I'm thinking to change |
@step- Not a fan of adding configuration knobs. Why don't we just check if it's on the system when loading the script? if command -v perl > /dev/null; then
# Use the current version
else
# __ prefix for private variables
local __fzf_awk=awk
command -v mawk > /dev/null && __fzf_awk=mawk
# Use awk version referring to $__fzf_awk
fi Is there any reason one would prefer awk over mawk when both are available? |
I didn't realize that the load script sets some shell variables prefixed by if command -v perl > /dev/null; then
__fzf_history__() {
...
}
else # awk
local __fzf_awk=awk
command -v mawk > /dev/null && __fzf_awk=mawk
__fzf_history__() {
...
$__fzf_awk "$script"
...
}
fi
Yes, if their mawk is old, which I have often encountered in other projects, and they're unwilling to upgrade it for whatever reason. That's why I proposed a configuration variable |
Can we suggest |
For the whole shell session? No. mawk scripts and command line options aren't compatible with POSIX awk, busybox awk and gawk. |
I see. What about checking the version of mawk and see if it meets the criteria? If this is getting too complicated, I would just support awk only and tell the users to install Perl if they're concerned about the performance. |
I checked mawk source files to see how it prints the version number.
The following command verifies that at least version 1.3.4 is installed mawk --version | awk 'NR == 1 { split($2, a, "."); v=(a[1]*1000000+ a[2]*1000+ a[3]*1); exit !(v >= 1003004) }' All together: if command -v perl > /dev/null; then
__fzf_history__() {
...
}
else # awk
local __fzf_awk=awk
# if available, mawk is faster
command -v mawk > /dev/null && # at least version 1.3.4
mawk --version | awk 'NR == 1 { split($2, a, "."); v=(a[1]*1000000+ a[2]*1000+ a[3]*1); exit !(v >= 1003004) }' &&
__fzf_awk=mawk
__fzf_history__() {
...
$__fzf_awk "$script"
...
}
fi |
SGTM. Since this involves running two extra commands during loading, we can do it later when the user first runs the awk version of CTRL-R. __fzf_history__() {
if [[ -z $__fzf_awk ]]; then
__fzf_awk=awk
# Check for mawk
fi
...
} |
Good. Then I'll do the changes and push the update today.
This also allows a savvy user to preset Side note: I'm about to conclude a journey of pull requests on this topic that started back in the Spring. I feel like an airplane pilot who flies over the ocean mostly by himself then, for the final landing, proceeds to a safe landing in the watchful hands of the Control tower :) |
Thank you for your efforts and patience. I have installed mawk and tested the awk version and I can confirm that it's fast enough for me. |
Thank you. |
Merged, thanks again! |
@step- Nice work, but is the version check: IMO that belongs into package management (i.e. the The dynamic check at runtime just costs an additional Apart from that, 1.3.4 is from 2009... I mean except for museums, is anyone even still shipping previous versions? Looking at https://repology.org/project/mawk/versions, there seem only very few that have <= 1.3.3, most notably Debian buster (which is out of regular support and LTS ends 2024 (but it's version of |
@calestyo Thank you.
Yes.
The dynamic check only happens once, when .bashrc loads.
My distro is. I don't know about all other distributions. I think there are too many to be able to tell.
|
@step- One thing I've just noted: On Debian (mawk 1.3.4.20200120-3.1):
Instead:
with:
|
And it's apparently not enough to just |
Good catch, thank you. This can be fixed, with some attention for details:
@junegunn do you want me to push a commit to this branch or do you prefer a new PR referencing this comment?
I'm sorry, in which context? I can't see a calestyo reported:
|
A new one.
|
Well, as said previously... I anyway think it's a wrong approach to do software version checks at runtime in such a case.
That was with Debian stable (i.e. 12, bookworm) and the package version is 1.3.4.20200120-3.1. As of the version in Debian unstable (
btw: If you adapt the code, can you make it so that at: Lines 75 to 78 in 488a236
any empty or better said non-matching input also causes the test to fail? The reason is that as part of #3458 I'm preparing a commit that would change the check about like so: command -v mawk > /dev/null &&
command mawk --version | # at least 1.3.4
command awk 'NR == 1 { split($2, a, "."); v=(a[1]*1000000+ a[2]*1000+ a[3]*1); exit !(v >= 1003004) }' &&
__fzf_awk='command mawk' That has the following consequences:
That's btw also what I've meant with |
What exactly do you mean? |
I mean I thought you could please confirm that on Debian stable (i.e. 12, bookworm) and package version 1.3.4.20200120-3.1 the mawk script works with fzf bash history. I don't have access to bookworm. If perl is installed you will need to comment it out temporarily. As regards your request about when mawk doesn't exist: bash -c "command mawk-not-exist -W version 2> /dev/null | awk 'NR==1{ } END{exit !(v>1)}'"; echo $? 1 bash -c "command mawk -W version 2> /dev/null | awk 'NR==1{v=2; exit} END{exit !(v>1)}'"; echo $? 0 |
AFAICS, it works (apart from printing the btw, one further idea:
That seems to work (I'm not really good in Though - and just for keeping the commits more focused on a specific change - I'd perhaps suggest that you only adapt the check, and my commit add the |
* Use the all-compatible mawk `-W version` option. junegunn#3313 (comment). * Do not remap the history key if no dependent commands is installed (perl, awk or mawk in this order). * Run the command and not a function consistently with junegunn#3462. The version check bash code relies on the following mawk source code, extracted from mawk 1.3.4 20230322. ``` version.c: 18- #include "init.h" 19- #include "patchlev.h" 20- 21: #define VERSION_STRING \ 22- "mawk %d.%d%s %s\n\ 23- Copyright 2008-2022,2023, Thomas E. Dickey\n\ 24- Copyright 1991-1996,2014, Michael D. Brennan\n\n" .... 30- void 31- print_version(FILE *fp) 32- { 33: fprintf(fp, VERSION_STRING, PATCH_BASE, PATCH_LEVEL, PATCH_STRING, DATE_STRING); 34- fflush(fp); 35- 36- #define SHOW_RANDOM "random-funcs:" patchlev.h: 13- /* 14- * $MawkId: patchlev.h,v 1.128 2023/03/23 00:23:57 tom Exp $ 15- */ 16: #define PATCH_BASE 1 17- #define PATCH_LEVEL 3 18- #define PATCH_STRING ".4" 19- #define DATE_STRING "20230322" ```
@calestyo thank you for confirming that the mawk script works on Debian stable (i.e. 12, bookworm) and package version 1.3.4.20200120-3.1, AFAYCS.
Hmm, no, not really. We want I opened #3463 with my changes. It also addresses, in a slightly different way, the concern of #3462 for this piece of code. |
If someone wants really that deep control, shouldn't it be okay to assume that he can just override
Well sure, but the more we have the bigger the "pollution" gets. ;-) I mean in principle most of them should be simply in one |
Please note that associative arrays are not available on bash 3. |
sigh... |
* Use the all-compatible mawk `-W version` option. #3313 (comment). * Run the command and not a function consistently with #3462. The version check bash code relies on the following mawk source code, extracted from mawk 1.3.4 20230322. ``` version.c: 18- #include "init.h" 19- #include "patchlev.h" 20- 21: #define VERSION_STRING \ 22- "mawk %d.%d%s %s\n\ 23- Copyright 2008-2022,2023, Thomas E. Dickey\n\ 24- Copyright 1991-1996,2014, Michael D. Brennan\n\n" .... 30- void 31- print_version(FILE *fp) 32- { 33: fprintf(fp, VERSION_STRING, PATCH_BASE, PATCH_LEVEL, PATCH_STRING, DATE_STRING); 34- fflush(fp); 35- 36- #define SHOW_RANDOM "random-funcs:" patchlev.h: 13- /* 14- * $MawkId: patchlev.h,v 1.128 2023/03/23 00:23:57 tom Exp $ 15- */ 16: #define PATCH_BASE 1 17- #define PATCH_LEVEL 3 18- #define PATCH_STRING ".4" 19- #define DATE_STRING "20230322" ``` Co-authored-by: Junegunn Choi <[email protected]>
if the speed is an issue? why not writte logic in C, or go/rust/...? |
@glensc, TLDR; #3313 (comment) |
While awk is POSIX, perl isn't pre-installed on all *nix flavors. This commit replaces perl with awk to eliminate the dependency on perl.
Related: #3295, #3309, #3310.
Test suite passed:
make error
all test sections 'PASS'make docker-test
215 runs, 1884 assertions, 0 failures, 0 errors, 0 skips.Manually tested in the following environments:
Performance comparison:
Mawk turned out the fastest, then perl.
One true awk's implementation should be the closest to macOS awk.
Test data: 230 KB history, 15102 entries, including multi-line and duplicates.
Linux, bash 4.4. Times in milliseconds.
mawk 1.3.4
perl 5.26.1
one true awk 20221215
gawk 5.1.0
busybox awk 1.27.0
Other Notes
A bug affects bash, which fails restoring a saved multi-line history entry as a single entry. Bug fixed in version 5.0.1
While developing this PR I discovered two unsubmitted issues affecting the current perl script. The output stream ends with
$'\n\0000'
instead of$'\0000'
. Because of this, the script does not deduplicate a duplicated entry located at the end of the history list; therefore fzf displays two identical (not necessarily adjacent) entries. A minor point about the first issue is that the top fzf entry ends with a dangling line feed symbol, which is visible in the terminal.Footnotes
https://github.com/bminor/bash/blob/ec8113b9861375e4e17b3307372569d429dec814/CHANGES#L1511
To enable:
shopt -s cmdhist lithist; HISTTIMEFORMAT='%F %T '
. ↩