Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add -z|--zero options to basename and dirname builtins and fix support for PATH_LEADING_SLASHES (plus small buffer overflow fix) #800

Merged
merged 10 commits into from
Dec 26, 2024

Conversation

dnewhall
Copy link

arith.c:

  • Fixed small buffer overflow: The check for null termination needs to happen after ensuring the name matches because the strncmp will return false if the name is shorter than fsize.

basename.c:

  • Added support for -z|--zero argument for compatiblity with GNU basename.
  • Renamed local helper function from namebase to l_basename
  • Made l_basename match the behavior in the usage string regarding PATH_LEADING_SLASHES (which it had not supported at all)
  • Added some comments

dirname.c:

  • Added support for -z|--zero argument for compatiblity with GNU basename.
  • Fixed bug in l_dirname where PATH_LEADING_SLASHES wasn't adhered to if argument only contained slashes
  • Added some comments

The check for null termination needs to happen after ensuring the name matches because the strncmp will return false if the name is shorter than fsize.

Also added some comments.
…t for PATH_LEADING_SLASHES

basename:
- Added support for -z|--zero argument for compatiblity with GNU basename.
- Renamed local helper function from namebase to l_basename
- Made l_basename match the behavior in the usage string regarding PATH_LEADING_SLASHES (which it had not supported at all)
- Added some comments

dirname:
- Added support for -z|--zero argument for compatiblity with GNU basename.
- Fixed bug in l_dirname where PATH_LEADING_SLASHES wasn't adhered to if argument only contained slashes
- Added some comments
@McDutchie
Copy link

Thanks for the PR! But I would like it to apply the minimum change necessary, without all the code style changes.

@McDutchie
Copy link

Also, after applying the PR, a bin/shtests -p run shows 131 regression test failures due to the changes in arith.c.

for(tp=shtab_math; *tp->fname; tp++)
char firstc = fname[0];
/* first byte of tp->fname is num. args and return type, unless empty */
for(tp=shtab_math; tp->fname[0]=='\0'; tp++)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is incorrect; it inverts the sense of the second for expression, so that no math function is ever found. That for statement should not change.

{
if(*tp->fname > c)
/* shtab_math is in alphabetic order - if first character is greater, we're done */
if(tp->fname[1] > firstc)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is incorrect, because it assumes that shtab_math is in alphabetical order. Unfortunately, a look at the generated shtab_math in arch/*/src/cmd/ksh93/FEATURE/math shows that it is not sorted, alphabetically or otherwise.

What on earth AT&T meant to accomplish with the original if(*tp->fname > c) is a mystery. In practice, it's a no-op. The first byte of fname is non-printable and never greater than octal 12, i.e. 10 (going by the generated shtab_math on my machine). It is compared against a printable ASCII character which will always be greater than that, so the break is never reached. Those two lines can be deleted.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment provided for historical interest; feel free to ignore.

I was curious why the if(*tp->fname > c) break; nonsense referenced above was there, so I did a little research in the ast-open-archive repo.

In the earliest available version (1995), shtab_math was in fact hardcoded and sorted alphabetically, and the fname field was a simple name without a preceding byte with option bits.

By the next available version from 1999, shtab_math had transitioned to its current form, but that if was still there, now nonsensical, and by pure luck a no-op. So it looks like they simply forgot to delete it, and there it remained until now.

@McDutchie
Copy link

McDutchie commented Dec 25, 2024

So, the minimum necessary change to arith.c (while keeping your new comments), as a patch against the dev branch, is:

diff --git a/src/cmd/ksh93/sh/arith.c b/src/cmd/ksh93/sh/arith.c
index 788f9553b..60c410f99 100644
--- a/src/cmd/ksh93/sh/arith.c
+++ b/src/cmd/ksh93/sh/arith.c
@@ -198,15 +198,15 @@ static Namval_t *scope(Namval_t *np,struct lval *lvalue,int assign)
 	return np;
 }
 
+/* look up a function in the standard math function table */
 static Math_f sh_mathstdfun(const char *fname, size_t fsize, short * nargs)
 {
 	const struct mathtab *tp;
 	char c = fname[0];
+	/* first byte of tp->fname is num. args and return type, unless empty */
 	for(tp=shtab_math; *tp->fname; tp++)
 	{
-		if(*tp->fname > c)
-			break;
-		if(tp->fname[1]==c && tp->fname[fsize+1]==0 && strncmp(&tp->fname[1],fname,fsize)==0)
+		if(tp->fname[1]==c && strncmp(&tp->fname[1],fname,fsize)==0 && tp->fname[fsize+1]==0)
 		{
 			if(nargs)
 				*nargs = *tp->fname;

@McDutchie
Copy link

The basename.c changes are good, except for all the style changes. That bit of AT&T nonsense (checking both *last=='/' and *first=='/' when it is known that last==first) was well spotted.

@McDutchie McDutchie dismissed their stale review December 25, 2024 02:06

Changes applied

@McDutchie
Copy link

McDutchie commented Dec 25, 2024

Fixed bug in l_dirname where PATH_LEADING_SLASHES wasn't adhered to if argument only contained slashes

There are some problems with this:

$ arch/*/bin/ksh -c '/opt/ast/bin/dirname /' 
/		# OK
$ arch/*/bin/ksh -c '/opt/ast/bin/dirname //'
		# Wrong (empty)
$ arch/*/bin/ksh -c '/opt/ast/bin/dirname //usr'
		# Wrong (empty)
$ arch/*/bin/ksh -c '/opt/ast/bin/dirname //usr/bin'
/usr		# OK
$ _AST_FEATURES='PATH_LEADING_SLASHES - 1' arch/*/bin/ksh -c '/opt/ast/bin/dirname /'
/		# OK
$ _AST_FEATURES='PATH_LEADING_SLASHES - 1' arch/*/bin/ksh -c '/opt/ast/bin/dirname //'
/		# Wrong
$ _AST_FEATURES='PATH_LEADING_SLASHES - 1' arch/*/bin/ksh -c '/opt/ast/bin/dirname //usr'
/		# Wrong
$ _AST_FEATURES='PATH_LEADING_SLASHES - 1' arch/*/bin/ksh -c '/opt/ast/bin/dirname //usr/bin'
//usr		# OK

@McDutchie
Copy link

In the fail cases, last==pathname, so in the original version, the code to preserve initial // does not get executed. Your fix was to remove the last!=pathname check, but that turns out not to be correct. A better fix is to revert that, add some code that sets last to the last of the initial slashes if the string starts with multiple slashes. This allows the original code to do the right thing.

	/* preserve leading '//' */
	if(last==pathname && pathname[0]=='/')
		while(last[1]=='/')
			last++;
	if(last!=pathname && pathname[0]=='/' && pathname[1]=='/')
	{
		/* skip any '/' until last two */
		while(pathname[2]=='/' && pathname<last)
			pathname++;
		/* skip first '/' if PATH_LEADING_SLASHES not set */
		if(last!=pathname && pathname[0]=='/' && pathname[1]=='/' && *astconf("PATH_LEADING_SLASHES",NULL,NULL)!='1')
			pathname++;
	}

@McDutchie
Copy link

If compatibility with GNU is a goal, then dirname should also accept multiple operands like the GNU (and BSD) version does, otherwise the new -z option is not very useful.

@McDutchie McDutchie merged commit c692ee1 into ksh93:dev Dec 26, 2024
McDutchie added a commit that referenced this pull request Dec 28, 2024
src/cmd/ksh93/sh/arith.c: sh_mathstdfun():
- Fixed small buffer overflow: The check for null termination needs
  to happen after ensuring the name matches because the strncmp
  will return false if the name is shorter than fsize.
- Remove a condition for breaking the loop that is never true;
  *tp->fname is a nonprintable byte that is never greater than c,
  which stores a printable character. See shtab_math in
  arch/*/src/cmd/ksh93/FEATURE/math.

src/lib/libcmd/basename.c:
- Added support for -z|--zero argument for compatiblity with GNU
  basename.
- Implement the behavior described in the usage string regarding
  PATH_LEADING_SLASHES (which it had not supported at all).

src/lib/libcmd/dirname.c:
- Fix handling of initial // where 'getconf PATH_LEADING_SLASHES'
  is 1 (i.e. Cygwin).
- Added support for -z|--zero argument for compatiblity with GNU
  basename.
- Handle multiple operands.

Co-authored-by: Martijn Dekker <[email protected]>
@dnewhall
Copy link
Author

dnewhall commented Jan 6, 2025

Thanks for reviewing and fixing my stuff. Sorry for not responding during the holidays.

The -z option is the only option that is missing compared to GNU basename and dirname on my system. I felt it was a worthwhile addition due to the advice here, which I usually try to follow when I can: https://dwheeler.com/essays/filenames-in-shell.html#null

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants