Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of the local builtin with support for dynamic local scopes #703

Open
wants to merge 59 commits into
base: dev
Choose a base branch
from

Conversation

JohnoKing
Copy link

@JohnoKing JohnoKing commented Jan 17, 2024

This is the eventual result of on/off work on the local builtin planned for inclusion in 1.1. (A declare builtin has also been added because it's a simple addtion when local is present.) Initially a port of ksh93v-'s code, this builtin was rewritten to implement local scoping in POSIX functions (i.e. name() { syntax). It's still experimental and may have bugs, but it's now stable enough to run the ksh and modernish regression test suites without failing.

Changes:

  • Implemented local and declare builtins, which both default to using dynamic local scoping regardless of the type of function. local cannot be used outside of functions, unlike declare which has no such requirement.

  • Added typeset -D, which allows typeset to use dynamic local scoping.

  • Added typeset -P/local -P, which allows for the use of static local scoping in both POSIX and KornShell functions (the flag used will likely be changed).

  • Completely overhauled POSIX function handling to run them from sh_funscope() instead of b_dot_cmd(). This made the implementation of local scoping much easier and also inadvertently fixed an out-of-bounds bug without adding a costly malloc call (cf. Array 'argv[0]' accessed at index -1, which is out of bounds att/ast#1423). (Note that KornShell functions (function name { syntax) that are executed via the . command will ignore scoping.) At present POSIX functions still use sh.dot_depth for measuring function depth.

  • b_dot_cmd(): Removed DOTMAX, as it's just a duplicate of MAXDEPTH.

  • b_dot_cmd(): Removed superfluous double setting of sh.st.lineno to 1.

  • Removed the (surprisingly useless) sh.posix_fun pointer and implemented a more versatile sh.infunction variable for identifying the current running function type, alongside helper FUN_POSIX, FUN_KSH and FUN_KSHDOT macros.

  • For scoping purposes, the NV_DYNAMIC and NV_STATSCOPE bitflags have been added (there's also NV_SCOPES, which is a macro for (NV_GLOBAL|NV_DYNAMIC|NV_STATSCOPE). np->nvflag was expanded to uint32_t with the help of the new NV_OPENMASK to allow it to store NV_DYNAMIC.

  • Added some (incomplete) scoping documentation to typeset --man.

  • Various updates to the regression tests, including the addition of a large set of scoping regression tests at ./src/cmd/ksh93/tests/local.sh. (The copyright date does accurately represent the age of that script. This pull request has been stuck in development hell for a LONG time.)

TODO List:

This implementation of local doesn't have proper scoping.
The POSIX function scoping mechanism should be implemented
in b_dot_cmd, although it should be done without breaking
backward compatibility (i.e., typeset's behavior in POSIX
functions shouldn't change as a result of implementing
scoping).

Discussion: ksh93#123

src/cmd/ksh93/sh/name.c:
- Set a dynamic scope in ksh functions for the local and
  declare builtins. This currently doesn't work in POSIX
  functions since b_dot_cmd doesn't apply any scoping
  right now.

src/cmd/ksh93/bltins/typeset.c,
src/cmd/ksh93/sh/xec.c:
- Only allow the local builtin to run inside of functions. This
  is accomplished by using a new 'SH_INFUNCTION' state set when
  ksh is inside of a function.
- Set shp->st.var_local to shp->var_tree for declare and local
  to get a dynamic scope in ksh functions. For typeset use
  shp->var_base to get a static scope in ksh functions.
  This will need bug-testing for when local is used in
  namespaces.
- Add local and declare to top comment alongside other builtins.
- Add a dictionary generator entry for declare(1).

src/cmd/ksh93/data/builtins.c,
src/cmd/ksh93/include/builtins.h:
- Add local and declare to the builtin table as declaration
  builtins. Depending on how local is specified in the next
  POSIX standard, it may need to be changed to a special builtin.
  That does break backward compatibility with existing ksh
  scripts though (such as one regression test script), so for now
  it's just a regular builtin.
- Update the man pages with references to local(1). Add
  information to the man page about the intended scoping
  of declare and local, although it's not completely
  implemented in POSIX functions.

src/cmd/ksh93/tests/builtins.sh:
- Run one of the regression tests for builtins inside of a
  function to fix a test failure with the local builtin.

src/cmd/ksh93/tests/local.sh:
- Add a number of regression tests mostly targeted at the
  local builtin. Most of the bugs tested for here are present
  in the ksh93v- and ksh2020 implementations of local and
  declare.
POSIX functions should default to placing variables on the global scope,
so this should be perfectly fine.
I suspect that the only reason dynamic scoping even somewhat works is
not because an actual scope was opened, but rather because the child
functions use the parent function's scope outright. This is no good and
will need a rewrite.
See the lengthy comment in xec.c for a detailed explanation.
Bear in mind that there are still three test failures in the modernish
test suite, which are probably cause by switching to sh_funscope() for
POSIX function handling:
  001: push;set;check;send sig;unset;pop;check  - FAIL: check traps, test var=$(trap)
  007: trap stack in a subshell                 - FAIL: wrong output from 'trap' (3)
  008: 'trap' can ignore sig if no stack traps  - FAIL: not ignored while no stack traps
These will need to be fixed at some point.
List of changes:
- A -c flag has been added to typeset that allows you to use statically
  scoped variables in POSIX functions.
- A noteable caveat regarding typeset -p's interactions with the new
  scoping functionality has been documented.
- The regression tests have been overhauled to account for the special
  handling of 'command typeset' in sh_exec(). More were also added for
  the new 'typeset -c' feature.
- The two nv_scan calls have been rolled into one faster call, which
  should have equivalent functionality (cf. the code for the
  scanfilter() function).
- Increment copyright dates using the update_copyright script from the
  wiki.
@dannyweldon
Copy link

local cannot be used inside of functions

Did you actually mean "local cannot be used outside of functions"?

@JohnoKing
Copy link
Author

Whoops, that was a typo. It has been fixed.

@McDutchie
Copy link

Thank you very much for all your work on this! I will have to take some time to fully understand the code.

Unfortunately, I already found a bug in dynamic scoping. Test script:

fun1() { local foo=set_by_fun1; fun2; echo $foo; }
fun2() { foo=changed_by_fun2; }
fun1

Expected output: changed_by_fun2; actual output: set_by_fun1.

@JohnoKing
Copy link
Author

That bug is caused by a limitation in the current scoping code, which interacts rather poorly with NV_NOSCOPE (the current code just copies the variable from the parent function and otherwise lacks a viewport to that scope). I'm considering reimplementing the ksh93v- sh.st.var_local pointer as a possible fix for that, but so far all of my attempts in that direction break static scoping in one way or another.

- local.sh: Added tests for the presence of the -D flag in output.
- sh.1: Removed this comment as this behavior is subject to change.
- xec.c: Update comment (the ksh93v- code is something I'm trying to
  salvage, but it certainly does seem unsalvageable.)
McDutchie added a commit to modernish/modernish that referenced this pull request Jan 19, 2024
McDutchie added a commit to modernish/modernish that referenced this pull request Jan 19, 2024
@ormaaj
Copy link

ormaaj commented Jan 21, 2024

Dynamic local?! I must be dreaming. :D

edit: aah and even works in posix functions! nice.

On a side note, I haven't had the chance develop local(1) further
because of a lack of time. In any case the current scoping bug is
evading a fix. The current approach needs significant changes,
as it has no backward compatibility when POSIX functions
are nested in ksh functions (the normal ksh93 behavior is to have
POSIX functions inherit the ksh function's full scope and use it
directly, which necessarily includes the ksh function's statically
scoped local variables. The current approach doesn't allow for this
as POSIX functions have their own scope for the local builtin. Backward
compatibility is currently achieved by having typeset modify the global
scope directly, but that's not really the correct thing to do. I have
tried altering typeset to modify the parent ksh function's scope
directly, but such changes cannot be seen in the POSIX function itself
after the variable in the parent function was changed, as the change
only happens in the parent scope and not the current local scope (which
defeats the whole purpose). To attempt to fix that, I then tried linking
the parent scope to the local scope, but so far that just breaks the
local scope (the cdt API lends itself very poorly to dynamic scoping).
This commit refactors the local builtin to take advantage of some
changes I intend to submit later as separate pull requests. These are:

- np->nvflags has been expanded from 16-bit to 32-bit. This fixes
  the longstanding limitation detailed in a previous ksh2020 bug:
  att#1038.
  The fix consists of masking out nv_open-centric flags with
  explicit uses of the new NV_OPENMASK, rather than implicitly and
  obtusely abusing the limitations of unsigned short types.
  Here, I take advantage of it for storing NV_DYNAMIC in np->nvflags
  and using it from there, which is a safer and more simple choice
  long term.

- A bugfix from ksh93v- 2012-08-24 has been backported to fix
  a potential segfault that is triggered under ASan when changing
  the sizes of np->nvsize and np->nvflags to uint32_t. This cannot
  be triggered on the dev branch, but I will submit this individually
  soon enough. (I would have already submitted it had I not been
  forced to deal with a broken PSU shortly after submitting ksh93#805;
  0 out of 10 experience don't recommend).

- Updated comment regarding pitfalls in the ksh93v- and current
  implementation of local.

In any case I intend to separate out various fixes from this branch and
submit them as separate pull requests, if only to eventually reduce the
scope of this one to just the local builtin and nothing else (the
sh_funscope refactor for instance fixes an out of bounds bug that has
been on the dev branch for far too long). Unfortunately my life has
been quite hellish lately, so there is no ETA yet.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants