Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

typeset -S within functions #40

Open
marcastel opened this issue May 24, 2017 · 12 comments
Open

typeset -S within functions #40

marcastel opened this issue May 24, 2017 · 12 comments
Assignees

Comments

@marcastel
Copy link

Weird behaviour when using static function variables (typeset -S)

Consider the following script:

function alpha {
    integer -S count=0
    (( ++ count ))
    print -n " $count"
}

function beta {
    integer count=0
    (( ++ count ))
    print -n " $count"
}

print -n "Alpha:"; alpha; alpha; alpha; alpha; alpha; print
print -n "Beta: "; beta;  beta;  beta;  beta;  beta;  print

My understanding is that since the static variable is declared within a non-POSIX function its scope is that function's scope. Consequently the expected output should be:

Alpha: 1 2 3 4 5
Beta:  1 1 1 1 1

While the output of the alpha() function is consistent across tests. The behaviour of beta() is not consistent. I get various outputs, sometimes correct, but mostly incorrect. This has been tested with 93u+ on Linux and macOS.

Sample outputs:

Beta:  0 0 0 0 0
Beta:  0 0 0 1 0
Beta:  0 0 1 0 0
Beta:  1 1 0 1 1

I have not detected the pattern. Consequently I can reproduce the error (almost systematically), but not the exact output.

I tried the following without success:

  • Change the loading order of functions
  • Declare a variable with same name in the global scope (to eventually force a local scoping)
  • Replace the integer alias by its typeset equivalent
@jelmd
Copy link

jelmd commented Nov 2, 2017

At least on Solaris 11.3 SRU 24.4 it works as expected, i.e. embedded into a for (( I=0; I < 20; I++ )); do ... done it produces expected results, i.e.:

Alpha: 1 2 3 4 5
Beta:  1 1 1 1 1
Alpha: 6 7 8 9 10
Beta:  1 1 1 1 1
Alpha: 11 12 13 14 15
Beta:  1 1 1 1 1
Alpha: 16 17 18 19 20
Beta:  1 1 1 1 1
...

However, on Ubuntu 16.04 I see similar misbehavior as you:

Alpha: 1 2 3 4 5
Beta:  1 0 0 0 0
Alpha: 1 2 3 4 5
Beta:  0 0 1 1 0
Alpha: 1 2 3 4 5
Beta:  0 0 1 1 0
Alpha: 1 2 3 4 5
Beta:  0 0 1 1 0
...

Perhaps it is worth to check the patches applied to the Solaris version: https://github.com/oracle/solaris-userland/tree/master/components/ksh93/

@krader1961
Copy link
Contributor

It seems like a no brainer that we should merge fixes from other forks (e.g., Solaris). The problem is that doing that proactively may require more work than can be justified given that the changes may not apply cleanly. Furthermore, each change needs to be reviewed even if it does apply cleanly since the change may introduce problems. I think the value is worth the cost. So if you're willing to do some leg work, @jelmd, to figure out which of the Solaris patches fixes this problem we would be happy to merge them.

@jelmd
Copy link

jelmd commented Nov 5, 2017

It was just a hint, to possibly safe you some time. If you prefer troubleshooting it by your own and provide your own fix, just go ahead.

@krader1961
Copy link
Contributor

krader1961 commented Nov 7, 2017

I took at look at all of the Solaris patches. Some are only relevant to integrating ksh93 into the process of building Solaris software. Others, like 010-path_utmp.patch don't apply cleanly and it appears the fix has already been implemented in a different manner. Similarly this patch does not apply cleanly and appears to fix a wctomb() bug that may have been fixed in a different manner looking at the current code.

Then there's 015-solaris_alias.patch which introduces a src/cmd/alias/alias.c file. This patch appears to workaround a performance issue and a bug in command alias handling. The code is licensed by the CDDL. Even if we wanted to merge that patch could we given the license?

This one, 020-CR6919590.patch, looks like it can be applied but the affected block of code is now ~100 lines away from the location in the patch. And given the commit comment, "19782029 userland should be able to build from SCM repositories", it's hard to know exactly what bug is being fixed. And whether this block near the line number listed in the patch doesn't already address the issue:

                        if(sh_isoption(shp,SH_RESTRICTED) && !f && o==SH_RESTRICTED)
                                errormsg(SH_DICT,ERROR_exit(1), e_restricted, opt_info.arg);
                        break;

Patch 060-CR7065900.patch is a one line change that removes sleep builtin table. Why? The commit comment provides no clue. We probably don't want this one.

Patch 075-multi_lang_arith.patch is a one line change that looks valid and is not already in the AST code. It appears to fix a bug in S2F_function()'s handling of the thousands separator character (presumably only seen in locales that don't use the English rules). It should probably be merged but without a better understanding of what it fixes I'm hesitant to do so.

A lot of the more recent patches state "This fix is from the community, details in the following location." So presumably the fix is already in this code base.

A few say "This fix has been developed inhouse. Patch has been submitted upstream but has not been accepted yet." One such example is 250-22561374.patch which is tied to issue #7. It has not been applied and probably should be. None of the patches would obviously affect this issue but a couple in this category might.

@krader1961
Copy link
Contributor

@jelmd, Do you see the difference between your comment

It was just a hint, to possibly safe you some time. If you prefer troubleshooting it by your own and provide your own fix, just go ahead.

and my previous comment? I actually went to the trouble of trying to figure out which Solaris patches we might want to import. You just implied I'm an idiot for not wanting to import all the Solaris patches and instead do it the hard way by figuring out the fix for the issue on my own.

I'm not getting paid to work on keeping ksh relevant. Of the POSIX compatible shells it is simply my favorite such shell. If you want to help ensure ksh remains relevant you might consider actually contributing more than unhelpful comments.

@jelmd
Copy link

jelmd commented Nov 8, 2017

@krader1961: OT wrt. the Issue, but since you are asking I owe an answer:
I have a job and thus my time is limited. Even if you do not consider hints, where one may find a solution for a problem as a contribution, I do. And a lot of people are actually happy, when getting a hint, because this is all they need to get started. In contrast to this, wiping away any help, patches, ... (because of not understanding things, having not done its homework, having complete different ideas or for whatever reason) and than saying, that one is not willingly to contribute is simply not fair, to be diplomatic. So please be careful about what you are saying (I have a google translator ;-))! Just reading carefully and the will to understand, what ppl/non-native speaker have written may also help.

I think right-now no-one is getting payed to work on ksh, but spends its spare time to it. And thus it is not surprising, that at least some of them start wondering, when one, who actually hasn't even took the time to read the documentation, starts to make essential changes to the software they care about. Last but not least I haven't seen any open source project (and not even internal ones), where developers encourages others to commit aka merge back unfinished features. Maybe I've missed something, but than I think one can at least expect, that it gets explained, why such unusual things are being allowed. If it is the way you learned it - ok, that's at least an explanation, but still do not expect, that other developers change their mind about it ...

I hope, we can now concentrate on the issues instead polluting issues with discussion belonging somewhere else, e.g. ML.

@krader1961
Copy link
Contributor

I've built a ksh with every applicable patch from the aforementioned Solaris set of patches. They do not resolve this issue. However, I get this consistent output rather than random output or what is asserted to be the correct output:

Alpha: 1 2 3 4 5
Beta:  0 0 0 0 0
Alpha: 1 2 3 4 5
Beta:  0 0 0 0 0

@siteshwar siteshwar added the bug label Sep 5, 2018
@siteshwar siteshwar self-assigned this Dec 18, 2018
@siteshwar
Copy link
Contributor

siteshwar commented Dec 19, 2018

This bug relies on stale memory. It does not reproduce if I try to reproduce it under valgrind. Also, it stops reproducing if I replace this malloc with calloc or comment out this free condition. In most of the cases this malloc returns same memory address, so in a new function scope stale memory is being used, possibly by cdt code. But I could not find a way to prove it yet. At least it proves why everyone who tried to reproduce it earlier saw different outputs.

@krader1961
Copy link
Contributor

In other words yet another "use after free" bug.

@siteshwar
Copy link
Contributor

The definition of scope function seems strange. It's passing struct Namval object to nv_open function here. It's possibly relying on nv_search automatically detecting whether the first parameter is type of struct Namval, this behavior was refactored by b79bc56, so this code needs to be changed too.

@siteshwar
Copy link
Contributor

Also, this patch fixes the bug but causes other tests to fail:

diff --git a/src/cmd/ksh93/sh/arith.c b/src/cmd/ksh93/sh/arith.c
index 6f8df111..92f9d14b 100644
--- a/src/cmd/ksh93/sh/arith.c
+++ b/src/cmd/ksh93/sh/arith.c
@@ -700,7 +700,7 @@ static_fn Sfdouble_t arith(const char **ptr, struct lval *lvalue, int type, Sfdo
                             np = nv_open(*ptr, shp->namref_root,
                                          NV_NOREF | NV_VARNAME | NV_NOSCOPE | NV_NOADD | dot);
                         if (!np) {
-                            np = nv_open(*ptr, root, NV_NOREF | NV_VARNAME | dot);
+                            np = nv_open(*ptr, root, NV_NOREF | NV_VARNAME | NV_NOADD | dot);
                         }
                         if (!np || Varsubscript) {
                             np = NULL;

Arithmetic expressions gets compiled before function gets executed and they add a new variable to scope if it does not exist already. This seems to be causing issues later with name lookup.

@siteshwar
Copy link
Contributor

Also, I should point that this reproducer also leaks variable count in global scope:

function alpha {
    integer -S count=0
    (( ++ count ))
    print -n " $count"
}

function beta {
    integer count=0
    (( ++ count ))
    print -n " $count"
}

print -n "Alpha:"; alpha; alpha; alpha; alpha; alpha; print
print -n "Beta: "; beta;  beta;  beta;  beta;  beta;  print

echo "count: " $count

prints:

Alpha: 1 2 3 4 5
Beta:  0 0 1 0 1
count:  3

This issue is caused by arithmetic expression and function looking up variable in different scopes. My patch from previous comment might be correct fix for the issue. But in some cases ksh relies on adding a new variable to global scope when an arithmetic expression gets compiled, so it will also require touching other parts of code that depend on it. Fixing this bug carries risk of regressions, so I will consider fixing it in future.

citrus-it pushed a commit to citrus-it/ast that referenced this issue Apr 15, 2021
The code for handling process substitution with redirection was
never being run because IORAW is usually set when IOPROCSUB is
set. This commit fixes the problem by moving the required code
out of the !IORAW if statement. The following command now prints
'good' instead of writing 'ok' to a bizzare file:

$ ksh -c 'echo ok > >(sed s/ok/good/); wait'
good

This commit also fixes a bug that caused the process ID of the
asynchronous process to print when the shell was in interactive
mode. The following command no longer prints a process ID,
behaving like in Bash and zsh:

$ echo >(true)
/dev/fd/5

src/cmd/ksh93/sh/args.c:
 - Temporarily turn off the interactive state while in a process
   substitution to prevent the shell from printing the PID of
   the asynchronous process.

src/cmd/ksh93/sh/io.c:
 - Move the code for process substitution with redirection into
   a separate if statement.

src/cmd/ksh93/tests/io.sh:
 - Add two tests for both process substitution bugs fixed by this
   commit.

src/cmd/ksh93/tests/shtests:
 - Update shtests with a patch from Martijn Dekker to use
   pretty-printing for the output from the times builtin (if it
   is available).

Fixes illumos#2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants