-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
serious performance breakdown of ksh2020 in comparison to ksh93u+ #1449
Comments
LOL! Yes, a lot of micro benchmarks are stupid in as much as they are irrelevant with respect to any real world program or shell script. Having said that, I am disturbed by the factor of three change in run time that I am also seeing on my primary platform.
I disagree, notwithstanding being a fan of ksh as the first UNIX shell I considered usable. The korn shell can not be a "serious programming" language due to the POSIX shell standard. Having said that I agree this slow down is not acceptable. |
Weird. I use it as a serious programming language all the time. |
yes, that is a very strange and very wrong, but no longer surprising, assessment by krader1961. it also makes me wonder where he sees any useful application for ksh: he does not like it for interactive use ("you should use fish"). he does not believe one should use it for substantial programming. so what? 10-50 lines shell scripts for stupid tasks? is that all, really? no. it is "just another useful scripting language". Korn himself at least viewed it on par with perl, Tcl, etc.. while I not fully agree, he sure was nearer to the truth than krader1961... @krader1961: at least I am happy to see that you agree to label the slowdown correctly as unacceptable. it totally is. see above: people do use ksh for programming. and speed matters a lot. always. do you have any idea what broke performance so massively? 10% is one thing. 300% is something else. |
to be on the save side I ran the test with Version ABIJM 93v- 2014-12-24, too. result: 93v- about 10% slower than 93u+. that I can understand: 93v- was in beta state and would have seen further optimization in due course, if development had not stopped then... so in any case the slowdown is not the fault of the code base you started with. |
@jghub Your script does not depend on any change introduced by ksh93v-, and should not be affected by any change introduced by that experimental code. So I don't understand why you think that 10% slow down is understandable. Especially since even a 10% slow down between adjacent minor releases is huge. |
300% is huge. 10% is 10%. 93v would have been a major release relative to 93u, not a minor one, in the view of D. Korn as far as I can tell. understandable/possible explanations:
and maybe we should talk about the remaining 10% only after you have found the other 290% that have crept in relative to 93v-? |
@jghub I suggest you read #42. You seem to think I'm actively trying to destroy I've done some carefully controlled tests using my Ubuntu 16.04 VM. The /bin/ksh on that distro runs the test script in 1.7 user seconds (not real-time since that is an unreliable metric on a VM). The ksh93v- I built took 2.5 user seconds. So already we're looking at a 47% increase before the current team has made any changes. The source at commit 02ca6c6 (which introduced support for Meson), and where very few changes had been made since ksh93v-, shows the same execution time. Bisecting the code shows there are two big bumps (i.e., > 1s). The first is commit bb57b0b that removed the hard-coded
No. The AST team did not include any performance regression tests in the test suite we inherited, AFAICT. Such tests should be added; however, doing so is non-trivial to avoid so many false negatives that the tests are pointless.
@jghub I welcome any issue you open, including this one, which documents an apparent bug, performance regression, or any other problem. However, your concerns would be better received if your comments didn't have so much anger. You seem to think I'm trying to destroy the usability of the Korn shell. I started using ksh88 somewhere around 1990. It was far better than sh, csh, and tcsh when its interactive behavior and scripting capabilities were considered as a whole. That is the sole reason I am contributing to this project. |
I have seen that thread, among others. all confirm that you received the same feedback from several people from the very beginning: have more respect for the code base, proceed with care. more like a neuro surgeon than like a warrior taking no prisoners. you never listened, let alone changed your attitude and approach. your reactions were showing a consistent inability or lack of interest to appreciate the details of what others said and an uncompromising desire to interfere heavily with the code base to make a "tidy up" no matter what the consequences and without understanding the code structurally and in its entirety. add to this an objectionable loudly voiced disregard for the capabilities of the creator of ksh and adolescent showing off in your bashing of his coding style. so, no I do not believe you are actively (in the sense of intentionally) trying to destroy ksh. I do believe you have actively (in the sense of being the causal factor) and severely damaged ksh already: and the results after two years of this activity are as they are and speak for themselves for everybody to judge by himself: run ksh93u+ or ksh93v- against ksh2020 and compare: features, user-facing/visible buggy behaviour in interactive and scripting use, performance. ksh2020 loses that comparison on all fronts AFAICS (even taking into account that, indeed, you fixed some bugs in TAB completion and such). this is a shame and a pity. but all of the above reiterates only what you have been told so many times before. so this really makes no sense. I still hope for a chance to see in this project or elsewhere a "reset" to make a carefully curated bug-fix release of ksh93u+ (or the original ksh93v-): that would be a much better shell than ksh2020 in my view.
I can not confirm this on OSX (not the least important unix-like platform out there...). here are the results for a couple more micro benchmarks ` ` 93u+ and 93v- are practically equally fast. no way is 93v- 47% slower on OSX than 93u+ (I don't see a good reason why it should be on ubuntu, but cannot test that right now). if at all Korn did make 93v- faster not slower than 93u+, mostly. "numfor.ksh" is the previously reported behaviour incrementing a counter in a numeric for loop: ksh2020 slower by a factor or 3. as already reported. "numloop.ksh" does the same but with a while loop. observation: 93u and 93v do both loops nearly equally fast. ksh2020 breaks that and makes, paradoxically, the numeric for loop slower by a factor of two. so: ksh2020 slower by a factor of 2 in this case in comparison to 93u and v. "braces.ksh" reads function f { ` and tests efficiency of doing command substitution via ksh93's important comparison with "numloop.ksh" (which is the same thing w/o the command substitution) was slower by a factor of two. and doing the `${ command; }' substitution slows it down by a further factor 1.5 nearly: ksh2020 here is overall slower by nearly a factor of three again. the factor of two worse performance relative to 93v- is there in some other benchmarks as well as you can see. it is quite consistent. so for my platform I challenge your claim that 93v- was already about 50% slower than 93u+ and I maintain that ksh2020 is "guilty" of the complete slow down, not only part of it.
then something happened already before that point it seems.
see above: it seems your starting point is already somehow flawed. but if you can locate later points where damage was done and revert the changes or fix it otherwise it would be a step forward. but not the complete solution to the problem.
as with all testing, you test only what you test. but that makes it not produce false positives (that's what you mean, actually, not negatives, but anyway) and not pointless. basically your reaction here could be added to all the other instances. you tend to try to bypass the real issue: in your massive overhaul of the code base it should have been mandatory to run whatever performance benchmark you like to verify continuously that you don't slow it down by your changes. (I still hope it is "only" compiler optimisation flags and not your replacing the memory stuff for the sole reason to make it "standard"...). that you did not notice at all what was happening to performance is alarming.
"apparent bug". not "obvious"? "anger": I don't feel any of that I can assure you. concern: yes -- I just would have wished for a responsible more careful/cautious and more considerate curator of ksh93. if this were only your private playground, nobody would care (me included). but the misleading claim that this project is the official future of "ksh as a viable shell for writing scripts in the 21st century." -- which is sort of "knighted" by this happening in the ATT/AST repo rather than in your fork of that repo -- makes it a serious problem in my view. these are my concerns as well as those of quite some people before me (most of them have long ago given up posting anything here for obviously arriving at the insight that it is pointless to argue with you about these things). you don't receive any of this serious and well-intentioned feedback (for how to maintain ksh, that is) well since you confuse the project with your person. I have got that.
see somewhere above: I already answered to that. I firmly believe you are misguided in your approach, not in your intentions. you clearly overestimate your competence (at the very least in comparison to the creator of ksh93) and this causes harm. my prediction is that you will not change your attitude and will proceed with the same self-righteous disregard for the code base and voiced concerns you exhibited from the start. I would be happy if you could proof me wrong here, of course, and all would be good in due time: ksh2020 really restoring the level of stability and performance ksh93u+ is known for. ksh2020 currently is not a step forward but a step back, unfortunately (to reiterate: despite my appreciation for actually having fixed a couple of minor bugs in TAB completion and output of `typeset -f' -- although breaking the latter completely in the release itself). I also propose to stop this discussion ("how should ksh93 really be maintained? has harm been done or not?") in this issue. let's go back to just addressing the reported bug: ksh2020 is slower by a factor of 2-3 in critical areas than ksh93u+ and ksh93v-. there must be reasons for this. it definitely is not the fault of 93v- vs. 93u+. so it is your fault. you should accept that. whether you fix it, is your decision, obviously. |
repeated benchmarks under ` ` /bin/ksh is ksh93u+ here as well. compared to what I see under OSX, the reported performance breakdowns are basically identical. a strange difference (this time in advantage of ksh2020) is seen take home message: in the cases reported previously (benchmarks numfor, numloop, braces) for OSX, the situation is basically the same for linux/ubuntu: ksh2020 slow down by a factor of 2-3 (but slightly less pronounced than under OSX (this of course could be due to apple doing their own minor optimizations to ksh93u+...). |
I realize my comment you were replying to wasn't as clear is it could have been about the main cause of the performance decrease of your micro-benchmark. But I thought it was reasonably clear that not hardcoding the The implicit change in build policy from production to debug was reasonable, and useful, during the past two years. But when the 2020.0.0 release occurred we didn't appreciate that the "debug" default build type would be an issue since there are no performance tests. So, as I stated above, we should change the default build type to "minsize" and expect environments where we need more debugging info to explicitly use that build type. |
just tested this. looks better but definitely not good enough: `name /bin/ksh ksh2020def ksh2020minsize braces.ksh 0m3.87s 0m9.17s 0m6.11s numfor.ksh 0m0.70s 0m2.18s 0m1.36s numloop.ksh 0m0.62s 0m1.24s 0m0.61s ` the ${ ...;} command substitution (first benchmark, using a while loop for iteration, so the numbers now -- since "while" back to normal -- are purely related to the command substitution call) still needs about 160% more time. an unacceptable "regression" in my view. (I reiterate that 93v- is not slower than ksh93u+: these are ksh2020 issues) the 2nd one (incrementing for loop) needs 200% more time. an unacceptable "regression" in my view. so the situation seems improved by changing compiler flags although this is only responsible for about "one half" of the observed performance breakdown. regarding "your micro benchmark": note that there are different ones. at least two different problems (numeric for and ${... ;} command substitution) have surfaced. I also now have benchmarked a real world ksh script/program (800 LOC) doing a mixture of data i/o, numerics and string processing: slow down by 140% with the "minsize" compiled variant, 175% with the default compile. that special program does neither use numeric for nor much ${ ...;} so the performance reduction is probably a rather generic problem. but please accept that even the micro benchmarks capture the essence of it: you are still (with minsize) facing a typical performance reduction of about 150%, possibly more. |
@jghub Where are these scripts published; i.e., available for download:
I can't figure out how the few code snippets you have included in comments relate to those script names or run times. I'd bet that some of the discrepancy in performance is due to a deliberate decision to not enable P.S., Enabling |
as explained, these are really very simple benchmarks, each testing essentially a single property:
save the above script to the directory where you want the benchmark files. save it as, say, "benchmarks". than do time them as "time ksh2020 numfor.ksh" etc. or write a small wrapper to run them in turn. adjust the loop counts to sensible values (the above are too low, probably) so that the run time is not too small (order of 1 sec for each benchmark, say). on linux you might need to additionally install "jot" and "rs" since they are not available by default (OSX is fine, though) and are used to generate input data in one benchmark. |
update: as of commit g5abcbd06 (current head) performance of ksh2020 remains down by a factor of 2-3 compared to ksh93u+ and ksh93v- in several critical features like incrementing numeric for and while loops (but other things like braced command substitution or string concatenation also see a slow down by a factor of 1.5-1.7). remarkably, this is now is true again for incrementing while loops as well which intermittently seemed to have profited from changed default compiler flags (buildtype "minsize"). in any case, this rather generalized substantial performance breakdown is not cured by the "minsize" flag (my understanding is, that it is now on by default, right?). so there is something severely broken, probably quite early in this project. why is this not bisected more thoroughly than what krader did a couple of weeks ago (and being done, it seems, with changing the buildtype...)? @siteshwar: when will this issue be addressed? it is paramount to maintain ksh93's performance. it is one of the most obvious and most relevant advantages of ksh over other shells. much easier to understand for potential new users than advantages of the additional features of KornShell language. performance should be restored fully before going on in my view. it will not become easier further down the road. |
To pipe in here, I do remember there being several changes to what looked to me like critical tight loops to simplify the code with I assume the expectation that the compiler would optimise it. Perhaps these could be investigated and tested with these benchmarks. And is any compiler optimisation actually happening currently? |
FYI information, I did some further benchmarking, also adding one for recursive function calls. result:
explanation:
the following can be concluded
apart from everything else that is going wrong in this project (and that's quite a lot, unfortunately): I do hope that at least the existence of an underperformance issue can now be acknowledged by everyone involved and hopefully be addressed by looking for the root cause(s) rather than declaring it "solved" by using |
I scaled the "braces" micro-benchmark by 50 so it has a meaningful run time on my system. I then ran it five times at the following commit points and picked the best (fastest) user mode CPU time from each run. This is the result:
Notice the big increase between HEAD~2400 and HEAD~2500. It turns out the increase is due to commit 489c672 to "Replace calls to AST Vmalloc with system malloc". See issue #396. We're certainly not going to reinstate AST Vmalloc. There is simply too much value in being able to use tools like Valgrind and ASAN which is only possible if we're using the system malloc. And in theory there shouldn't be any reason why using the system malloc is noticeably slower so this needs further investigation. For comparison, on the same system and using the same benchmark script the distro ksh93u+ binary clocks in at 4.668 CPU seconds and ksh93v- (which uses AST Vmalloc) at 5.031 CPU seconds. A 8% increase. P.S., I never "declared it "solved" by using --buildtype=minsize." I said that addressed most of the performance decrease. A debug build runs the benchmark in 11.458 CPU seconds. Which means a debug build accounts for ~4.5 seconds of the increased run time versus ~1.7 seconds due to switching to the system malloc. That is, a debug build causes a slow down 2.6x that of switching to the system malloc. |
A preliminary comparison of the execution profile of the two commits shows that |
The |
first of all: I welcome any initiative to finally start looking into the issue at a technical level (instead of dodging the issue and acting silly when faced with criticism). if, in the process, (at leat part of) the damage done to the code can be reverted, this would be progress. now to the details (in separate posts to avoid it becoming too unwieldy). |
see above...
your claim above:
fact: your statement I referred to when saying that was verbatim:
this is a paraphrase of "issue solved" for me (since it allegedly has "almost nothing" to do with the code). looking at the numbers for the whole set of benchmarks (see the table I provided #1449 (comment)) I think the conclusions I have drawn from the table in that post are correct. specifically 20-30% (and up to 85%) slow down of ksh2020ms vs. 93v- is not adequately described by "almost nothing". |
memory management: you say
this was my suspicion and that of knowledgable colleagues right from the start: "that has to have to do with changing the malloc stuff". you also say
that should not be decided by you alone, hopefully. if it does turn out that the malloc stuff is the cause of all (partly massive) slow down, this would need reassessment. there is no justification whatsoever to sacrifice the excellent performance achieved by Korn et al over 20 years or so of work at ksh93. of course I also would expect that other changes you have made to the code also hurt performance -- but this is a conjecture and it would be necessary to very very thoroughly look into each single benchmark, notably those which behave especially bad, and to restore performance to what is was before you started (at least: then there would still be the task to achieve a performance comparable to that of 93u+, which frequently, but not always, is noticeably faster than 93v-) |
since I do care very much for the performance of ksh (as, I am sure, many ksh users will), I wrote a few further benchmarks. a typical comparison (same conditions as here: #1449 (comment)) looks like this (ksh2020 w/o minsize, ksh2020ms w/ minsize, yesterday's tip of master)
everybody can do his own ratios but the "lowlights" are:
note that these are totally "random" benchmarks not selected/constructed to demonstrate ksh2020 deficiencies. there might be better benchmarks, so proposals would be welcome. the source code of most of these benchmarks was already listed here: #1449 (comment) the code for the newer ones follows:
how to use:
I do hope these additional benchmarks demonstrate that the performance reduction of ksh2020 vs. ksh93u+ can be seen in many places (and where it is not seen, it is the merit of ksh93v-, not ksh2020). my best guess is that what is seen here is a combination of the malloc replacement and "code tidy up" in places that better had been left alone. the especially strange behaviour of but |
Wow! That anyone thinks ksh is the appropriate tool for tasks such as computing the value of pi, other than in a "can this be done" way, blows my mind. I have to wonder how many languages that person has mastered. |
@jghub You wrote
Care to provide some proof of that statement? Had that been your "suspicion", and that of your "knowledgable colleagues", why didn't you mention it when you opened this issue? You wrote "consider a stupid micro benchmark" in your original problem statement and I'm inclined to agree. |
need I explain to you the meaning of the word "benchmark"? do you object against the benchmark computing something where the correct result is known a priori and can be asserted? what is your problem except that ksh2020 is overall much slower than ksh93u+ (although"only" 13% in this special floating point benchmark)? it is quite obvious that you don't want to admit that the slowdown is your fault (you haven't). you would prefer to deny existence of the problem (you have). you also might consider to deny it's relevance: actually, I am a bit disappointed that you have not yet done that ("ksh2020 is much slower than ksh93u+, but nobody wants a fast shell anyway since it is only a command interpreter like fish and ksh is not a serious scripting language etc etc etc pp").
care to adjust your tone? care to just read through this very thread? #1449 (comment). hint: search for the secret word "memory". also: care to explain to the world, why you managed to plow ahead for two years without ever caring to write your own small scripts for performance monitoring? and then making a "stable" release (build(ing) w/o the minsize flag) that was/is about a factor of 2-3 slower across the board than ksh93u+? care to give this engineering feat a grade from A to F? so far you maintained (do you, still?) that the problem has "almost nothing to do with [code] changes made in the past two years". that is obviously completely (and wittingly) untrue. all of the remaining issues (assuming minsize as a future default build option) we are now talking about are due to code changes. whether your changes to the memory management or your "code tidy up" and your "optimisations" takes most of the responsibility for the slow down or both are equally the culprit remains to be seen.
you do agree with what? just let the readers know. I (and several other people) have received a clear message from you again and again: you don't want real feedback. you don't want being pointed to real problems. you don't understand them. you deny them. you don't like being caught in the act of telling falsehoods to the readers here and on the mailing list. you don't correct your attitude. you don't listen. your reactions reliably are rude and arrogant. you are fighting a fight against the code (and its creators) and against the users of ksh who are noting that something wrong is happening due to your interference and are trying to make you realise that you need to correct your approach. you are short-sighted and rather clueless in all things ksh. that will not change. I know that by now. really. so please understand that my posts here are just documenting the problems of ksh2020 for the future. whether you acknowledge or deny them or ignore them is of secondary concern. I posted the benchmark tables and the code for people to copy+paste and see for themselves. somewhere down the road this might help to repair the damage you have done. |
Just a little note about VMALLOC (performance and otherwise): I would just like to point out that VMALLOC was not removed for no reason at all. Now about any possible performance loss due to replacing VMALLOC w/ regular system MALLOC: The writer of VMALLOC is now at Google (still is as far as I know), but I have never heard of him fixing up VMALLOC to be properly thread-safe (from whatever deficiency it currently suffers from in that regard). So I would prefer that some real system MALLOC (of whatever variety) continue to used as the default for KSH, as long as it is thread-safe (which they normally all will totally be). If somehow VMALLOC does become the default again, then I will take it upon myself to replace it (again) with a regular -- correctly operating and thread-safe -- system MALLOC (to other MALLOC implementation known to work). But before using VMALLOC, I would suggest exploring the use of a super fast regular MALLOC replacement -- that is out there - first. VMALLOC may have had other bugs which we did not know about also (who knows). |
@DavidMorano but as you know, very (very!) few people will ever consider to write their own builtins. so for the overwhelming majority it is simply more relevant how fast their ksh scripts execute. the default built of ksh in my view should serve this majority. possibly, built options can be added to select the malloc system to use? I don't know much of these things... the whole discussion here only revolves about a performance comparison of what /bin/ksh gives me and what ksh2020 provides. it is a description of the state of affairs. nothing more. it is not even a proven fact, that memory management is the main cause (it might be other code changes by krader). conclusions to be drawn might differ, depending on perspective. I for one firmly believe that performance (once achieved...) should not be given up lightly. for my purposes I am much better served by ksh93u+, therefore. as for "VMALLOC may have had other bugs": sure, but I very rarely, if ever, encounter segfaults (or other misbehaviour that would hint at VMALLOC bugs) of ksh93u+ (and if so, they are of the stack overflow variety so far...). so it seems more of a theoretical than a concrete problem, no? |
Yes, I agree with you above. The vast majority will not be using built-ins that are multi-threaded, at least not yet. There is -- at least at this stage of the world's usage -- little point in optimizing for that. Although most people will not write their own multi-threaded builtins, at some point pre-written multi-threaded builtins might replace some of the existing common builtins. But I still object to using VMALLOC since it demonstrated one headache already (buggy non-thread-safety). I would still advocate for some other MALLOC -- possibly optimized for speed over space efficiency. Yes, perhaps build options can, at some point, be provided to select possible MALLOC implementations (although a builder should be able to easily hack this themselves also as they desire).
Yes, I have followed the discussion and the relatively dramatic loss in performance is very interesting.
Yes, you are correct. And in all fairness, although I am suspicious of VMALLOC (because it is something that has not had wide usage in the world except for being in KSH itself), I cannot claim that I know of any other bugs in it, other than it had some sort of thread-safety bug. Yes, I freely admit my bias against VMALLOC, because it caused some hard headaches for me to track down (debug) that it was causing me problems. :-) |
Removing wip/ast-ksh: - no further development is planned, due to internal collision between ksh developers [1] [2] - it contains unresolved critical bugs [1] [3] - it derives from ksh93v- branch, at some point developed by Red Hat, which is not granted to be backward compatible with the last stable ksh93u+ version from AT&T, as ksh93u+m (wip/ksh93) is. - some features providing by this branch have been already backported to the verison provided by wip/ksh93 - the present package looks stale. - it requires CMake and meson, unlike other ksh93 branches which use AST's own build framework and test suite. [1] att/ast#1449 [2] att/ast#1466 [3] https://groups.google.com/g/korn-shell/c/2VK8kS0_VhA
Description of problem:
consider a stupid micro benchmark like
`
function repeatloop {
nameref count=$1
integer i=0 buf=0
for ((i=0; i< $count; i++)); do
((buf++))
:
done
print $buf
}
count=1000000
time repeatloop count
`
Ksh version:
on my machine (OSX 10.4.6, powerbook)
93u+: 1.01s
2020 release: 2.12s
head (42a580c): 2.98s
How reproducible:
always
Steps to reproduce:
run above script
Actual results:
100-200% slow down (i.e. by a factor 2-3!) of ksh2020 in comparison to 93u+. although ksh2020 thus remains faster than bash (11s) the "order of magnitude faster than bash" is no more . if this is not fixed one might consider to copy this line from the bash manpage: "It's too big and too slow."
I find a performance breakdown by such a massive factor disconcerting. it is a big step back from ksh93u+.
Expected results:
competitive performance
Additional info:
do people see this behaviour on other platforms, too?
KornShell language is not only a command interpreter for interactive use. it also is a serious programming language. performance matters a lot. ksh always had by far the best performance of the major shells. current development in this ksh2020 project seems to severely damage ksh in this respect. I hope this can be fixed. are there performance benchmarks in the test suite at all?
The text was updated successfully, but these errors were encountered: