
Multiple-IO-redirection semantics ambiguous (how to swap stdout/stderr?) #733

Open
kwshi opened this issue Aug 1, 2018 · 12 comments

kwshi (Contributor) commented Aug 1, 2018

In bash, multiple IO redirections are processed as though they were variable assignments. That is, one can swap stdout/stderr by doing

some-command 3>&2 2>&1 1>&3

which amounts to

          # fd3: /dev/fd3, fd2: /dev/fd2, fd1: /dev/fd1
fd3 = fd2 # fd3: /dev/fd2, fd2: /dev/fd2, fd1: /dev/fd1
fd2 = fd1 # fd3: /dev/fd2, fd2: /dev/fd1, fd1: /dev/fd1
fd1 = fd3 # fd3: /dev/fd2, fd2: /dev/fd1, fd1: /dev/fd2
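For readers less used to fd plumbing, the same three steps can be reproduced with explicit dup()/dup2() calls (a standalone Python sketch, not part of the original report):

```python
import subprocess
import sys

# A child process that performs the `3>&2 2>&1 1>&3` swap by hand,
# then writes one line to each stream.
child = r"""
import os, sys
fd3 = os.dup(2)    # 3>&2 : remember where stderr currently points
os.dup2(1, 2)      # 2>&1 : stderr now points where stdout points
os.dup2(fd3, 1)    # 1>&3 : stdout now points where stderr used to point
os.close(fd3)      # 3>&- : drop the temporary descriptor
sys.stdout.write('stdout\n')
sys.stderr.write('stderr\n')
"""

result = subprocess.run([sys.executable, "-c", child],
                        capture_output=True, text=True)
on_stdout = result.stdout   # carries the line written to sys.stderr
on_stderr = result.stderr   # carries the line written to sys.stdout
```

After the swap, the line written to `sys.stderr` arrives on the parent's stdout pipe and vice versa, which is exactly what `3>&2 2>&1 1>&3` achieves in bash.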

In elvish, the same does not quite seem to work.

I have created two test files: write.py, which writes the line stdout to stdout and the line stderr to stderr:

# write.py
import sys
sys.stdout.write('stdout\n')
sys.stderr.write('stderr\n')

and filter.py, which processes and marks its stdin:

# filter.py
import sys
for line in sys.stdin:
  sys.stdout.write('|' + line)

By default, running python write.py | python filter.py works as expected, with only stdout being filtered and marked:

~> python write.py | python filter.py
stderr
|stdout

Now, if I try to swap stdout and stderr for write.py, I get errors:

~> python write.py 3>&2 2>&1 1>&3 | python filter.py
stdout
Exception: python exited with 1
[tty], line 1: python write.py 3>&2 2>&1 1>&3 | python filter.py
               _______________________________ (underline)

This works in bash as expected, however:

[kshi@dexi elvish-test]$ python write.py 3>&2 2>&1 1>&3 | python filter.py
stdout
|stderr
zzamboni (Contributor) commented Aug 1, 2018

This can be achieved by using run-parallel and pipe, in a manner similar to pipesplit, seen in #500, to define a function which takes a lambda and swaps its outputs. For example:

[~]─> fn outswap [f]{
        pout = (pipe)
        perr = (pipe)
        run-parallel {
          $f > $pout 2> $perr
          pwclose $pout
          pwclose $perr
        } {
          cat < $pout >&2
          prclose $pout
        } {
          cat < $perr
          prclose $perr
        }
      }
[~]─> { echo stdout; echo stderr >&2 } | echo '|' (all)
stderr
| stdout
[~]─> outswap { echo stdout; echo stderr >&2 } | echo '|' (all)
stdout
| stderr

xiaq (Member) commented Aug 2, 2018

Thanks, @zzamboni. The issue is still a bug though.

xiaq (Member) commented Aug 7, 2018

The cause is that Elvish always closes the LHS of redirects:

fm.ports[dst].Close()

So Elvish actually closes fd 2 when it sees 2>&1, and closes fd 1 when it sees 1>&3.

We will need some sort of reference counting system to make this work correctly.
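A minimal sketch of what such reference counting could look like (the `Port` class and its methods are hypothetical illustrations, not Elvish's actual internals):

```python
import os

class Port:
    # Hypothetical refcounted wrapper: the underlying fd is closed
    # only when the last table slot referencing it lets go.
    def __init__(self, fd):
        self.fd, self.refs = fd, 1
    def dup(self):
        self.refs += 1
        return self
    def close(self):
        self.refs -= 1
        if self.refs == 0:
            os.close(self.fd)

# Model `3>&2 2>&1 1>&3 3>&-` as assignments into an FD table.
orig_out, orig_err = Port(os.dup(1)), Port(os.dup(2))
table = {1: orig_out, 2: orig_err}

table[3] = table[2].dup()                               # 3>&2
old = table[2]; table[2] = table[1].dup(); old.close()  # 2>&1
old = table[1]; table[1] = table[3].dup(); old.close()  # 1>&3
table.pop(3).close()                                    # 3>&-

# fds 1 and 2 are swapped, and nothing was closed prematurely:
os.fstat(table[1].fd)   # raises OSError if the fd had been closed
```

Closing a slot here merely decrements a counter, so `2>&1` no longer destroys the descriptor that `3>&2` still refers to.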

zakukai commented Apr 8, 2020

Swapping file descriptors was never really easy to follow in Unix shells anyway, IMO - if you don't look at each redirection in the chain in terms of the dup() call the shell will perform, it all seems backwards and confusing. Maybe it's better to think of other ways of expressing it?

For instance, combine redirection with braced lists:

> cmd {stdout,stderr}>&{stderr,stdout}
> # or...
> cmd {1,2}>&{2,1}

Or alternately, instead of treating redirections as a sequence of dup()'s performed in order, make the right-hand side file descriptors always refer to the file table as it was before the redirections were performed: so that it doesn't matter in what order a set of redirections is listed:

> cmd 1>&2 2>&1       # Swap stdout and stderr.
   # Note, 1>&2 doesn't change what file &1 refers to! So the following is equivalent:
> cmd 2>&1 1>&2       # Same thing!

The file descriptors on the RHS always refer to the file table that the command would have received if there hadn't been any redirections at all. The file descriptors on the LHS always refer to the file table that will exist in the new command's process.
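Under that reading, redirect application can be modeled as resolving every RHS against a snapshot of the pre-redirect table (a hypothetical `apply_redirects` helper, written only to pin the proposed semantics down):

```python
def apply_redirects(table, redirects):
    # Every RHS is looked up in the table as it was *before* any
    # redirect ran, so listing order cannot matter.
    before = dict(table)
    new = dict(table)
    for lhs, rhs in redirects:
        new[lhs] = before[rhs]
    return new

table = {0: "tty-in", 1: "tty-out", 2: "tty-err"}

# `1>&2 2>&1` and `2>&1 1>&2` both swap, regardless of order:
a = apply_redirects(table, [(1, 2), (2, 1)])
b = apply_redirects(table, [(2, 1), (1, 2)])
```

Both orderings yield the same swapped table, unlike POSIX semantics where the second form joins rather than swaps.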

This does potentially create problems when joining streams, however: A command like this to join the stdout and stderr of a command in a pipeline would work as long as we consider the pipe to already be part of the file table (and thus, &1 is the pipe) before we do redirections:

> cmd1 2>&1 | cmd2                  # &1 refers to the pipe here, so stderr goes to cmd2 rather than the terminal

But joining stdout and stderr and redirecting to a file would be a problem:

> cmd1 >./logfile 2>&1     # stderr does not go out to the logfile
   # Note that &1 still refers to the file it would have referred to if cmd1 were run with no redirects:
   # so the redirect to ./logfile doesn't affect the meaning of &1

That might be where we'd need to look at braced lists or something again:

> cmd1 {1,2}>./logfile                   # redirect both stdout and stderr to the logfile
> cmd1 {stdout,stderr}>./logfile

Alternately, join streams inside a lambda, and redirect to the file outside the lambda:

> { cmd1 2>&1 } >./logfile   # Inside the lambda &1 refers to the logfile.

xiaq (Member) commented Apr 24, 2020

@zakukai I like the idea of using braced list to swap FDs.

The other part of your proposal - making RHS refer to the original FD table - sounds problematic, as you have pointed out yourself. I am particularly worried that this subtle difference will be very confusing for people familiar with POSIX shell.

zakukai commented Apr 24, 2020

I think that is generally the nature of this endeavour - creating a shell that is in some ways very similar to a POSIX shell but in some ways very, very different. There's always a balance to be struck between the familiar and the novel - but a shell like "elvish" is already moon-speak to someone steeped in the POSIX way of doing things (IMO anyway) and if they weren't open to trying a different approach they wouldn't be using it. :) Some old design choices are just past due to be revisited.

Personally my take is that the POSIX shell way of handling this is just plain confusing anyway. For people who understand the POSIX redirections, or at least have memorized a few idiomatic use cases like joining (2>&1) and swapping (3>&1 1>&2 2>&3 3>&-), it's true, it will confuse them that it works differently. But I think my suggestion is at least easy to understand once people understand the concept.

In proposing this I kind of had to shoot holes in it to see where it would have to lead. I think most of the "problems" I identified were solved pretty neatly, and it's really just when redirecting to named files (rather than other, already-open FDs) that it becomes challenging:

The basic problem is that redirecting to a file carries side-effects, so redirecting to the same target twice (with one-to-one redirections) wouldn't work:

cmd1 >$file_object 2>$file_object   # This would join the streams...
cmd1 >file 2>file   # Doesn't join streams, opens the file twice. Streams will probably overwrite each other in the file

And I can't open the file and then join the streams using one-to-one redirects because under my proposal, redirections can't reference each other:

cmd1 2>&1 >file    # doesn't send stdout to file - redirects can't reference one another
cmd1 >file 2>&1    # equivalent to prior line, order of redirects doesn't matter
{cmd1 2>&1} >file  # solved using lambda to establish a boundary where inner redirects CAN reference outer ones
with f as (fopen -w file) {cmd1 >$f 2>$f}   # Alternate method using hypothetical Python-esque "with" and file objects
cmd1 {1,2}>file      # solved using N-to-one redirection as a way to express joining FDs

It's a problem but I think N-to-one redirects solve it pretty well.
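The difference between the two joining cases above is visible at the syscall level: opening the target once and dup()ing the fd shares one file offset, while opening it twice yields two independent offsets whose writes clobber each other (a standalone Python sketch; the `{1,2}>file` syntax itself is only the proposal):

```python
import os
import tempfile

fd0, path = tempfile.mkstemp()
os.close(fd0)

# N-to-one: open once, dup() the fd, so both slots share one offset.
fd = os.open(path, os.O_WRONLY | os.O_TRUNC)
fd2 = os.dup(fd)
os.write(fd, b"out\n")
os.write(fd2, b"err\n")          # continues after "out\n": shared offset
os.close(fd); os.close(fd2)
joined = open(path).read()

# Two separate opens: independent offsets, the writes clobber.
fa = os.open(path, os.O_WRONLY | os.O_TRUNC)
fb = os.open(path, os.O_WRONLY)
os.write(fa, b"out\n")
os.write(fb, b"err\n")           # starts at offset 0, overwrites "out\n"
os.close(fa); os.close(fb)
clobbered = open(path).read()
os.unlink(path)
```

The first variant is what `{1,2}>file` would do; the second is what two independent `>file` redirections to the same name do.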

xiaq (Member) commented Apr 28, 2020

I think that is generally the nature of this endeavour - creating a shell that is in some ways very similar to a POSIX shell but in some ways very, very different.

True. However, my rule of thumb is: if a piece of Elvish code looks exactly like a piece of POSIX shell code, it should either do exactly the same thing, or something totally different, never something subtly different - the last case is the most confusing.

I am perplexed about cmd {1,2}> file. It reads nice ("redirecting both 1 and 2 to file"), but consider this: cmd {1,2}>&{2,1} has an obvious parallel to assignment (x1 x2 = $x2 $x1), and in fact it is assignment in the FD table. Furthermore, this works for an arbitrary number of LHS and RHS, as long as their numbers match. However, in cmd {1,2}>file, the number of LHS and RHS do not match. Instead, this must be handled as a special case, assigning all LHS to the same (single) RHS value. This is like writing x1 x2 = 1 and expect both x1 and x2 to get the value 1 (it doesn't).
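The assignment parallel can be made concrete in Python, where tuple assignment behaves exactly like the proposed N-to-N redirect and the N-to-one case fails for the same reason `x1 x2 = 1` does (a standalone analogy, not Elvish semantics):

```python
fds = {1: "pipe-out", 2: "tty-err"}

# {1,2}>&{2,1} : simultaneous N-to-N assignment, i.e. a swap.
fds[1], fds[2] = fds[2], fds[1]

# {1,2}>file would be an N-to-one assignment, which the analogy rejects:
try:
    x1, x2 = 1          # like `x1 x2 = 1`: does not give both the value 1
    err = ""
except TypeError as e:
    err = str(e)        # unpacking a single value into two names fails
```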

So to summarize, I like N-to-N redirections, but not N-to-one redirections. I find cmd >file 2>&1 bearable.

Another possibility is to "reify" the FD table as a dynamically scoped variable (#993), which provides an alternative syntax for redirections and can be used for complex situations such as swapping FDs.

zakukai commented Apr 29, 2020

Indeed, supporting N-to-1 would not fit with some of the general idioms of Elvish, which allows only N-to-N in assignments, for instance. I had thought of proposing some kind of alternate syntax for N-to-1 redirects to keep them distinct from N-to-N, but ultimately I didn't see the need. But I can understand why you would consider this inconsistent and prefer to avoid it.

Speaking of the dynamically-scoped FD table variable idea: it occurs to me that this could also be used to do away with the ampersand notation in redirects:

> cmd 2>$io:out 1>$io:err

It's more verbose perhaps but the concept appeals to me: The ampersand notation for file descriptors exists basically because the shell didn't have any other way to refer to file descriptors. But Elvish has file objects that can be stored in variables, copied, passed as arguments, etc.

So in principle if the FD table (or, rather, the standard IO streams - IMO that's all of "the FD table" that should be needed or provided) were made available as variables, and particularly if paired with a way to cleanly control the lifetime of the file object (so it doesn't have to wait for GC to close the file) it would simplify some of the scenarios like "give a command in the middle of a pipeline access to the pipeline's final stdout":

> out = $io:out { cmd1 | cmd2 -logfd=5 5>$out | cmd3 }
(here "io:out" is a file object referring to stdout. We have to bind it to another variable because if we just used "io:out" directly in the invocation of cmd2, "io:out" would refer to the pipe that feeds cmd3)

Not crazy about expressing redirections as variable assignments, at least in terms of syntax:
io:out=$io:err cmd   # as proposed in #993

But it at least fits neatly with the idea that open files should be managed as variables, not as FD numbers. I suppose one could also borrow this chestnut:

cmd {io:out}>$io:err

This style of redirect in Bash or Korn Shell would normally mean "find an available file table slot and bind the file there, and record the file descriptor number in the variable provided" - but in this case since the variable used in the redirect is "io:out", it would instead bind stderr to file descriptor 1 (stdout) when launching the command.

hanche (Contributor) commented Apr 29, 2020

@xiaq (minor nit) When you write “rectify” above, did you mean “reify”?

xiaq (Member) commented Apr 29, 2020

@hanche ah yes, I meant "reify".

@zakukai

I really like the observation that a reified FD table can be used to eliminate the ampersand forms, and if the reified FD table does get implemented, I'll consider deprecating and eventually removing the ampersand form; it's one of the more obscure parts of POSIX syntax.

Regarding how to name reified FDs: I am more inclined to expose the full FD table as a list, instead of naming the 3 standard streams separately. On a Unix syscall level, the FDs 0, 1 and 2 are not actually special in any way; it is only a user-space convention that most processes have these 3 FDs "pre-opened", and libc internalizes that convention. The story is different on Windows, which does treat the standard streams in a special way, but that shouldn't stop Unix users from doing what the OS allows.

The FD table can support in, out and err as aliases for the indices 0, 1 and 2; if people prefer, they can write $fds[out] instead of $fds[1], etc..

I also have seen some scripts that make use of higher FDs. I haven't written a lot of such scripts myself, but I feel it's an under-utilized way of doing simple text-based IPC that is worth encouraging.

Finally regarding cmd {io:out}>$io:err: it reads nice, with a hint of a parallel to variable assignment, but the LHS of > can only be a valid FD reference (io:out in your scheme, fds[out] in my scheme), so it doesn't seem to be worthwhile to keep up an appearance of generality.

zakukai commented Apr 30, 2020

Personally, when I have used higher-numbered FDs within a shell script, it has generally been because the shell does not support other ways of managing open files. For instance, if I have a function that "returns a value" (that is, by writing the value to stdout so the caller can capture it) but the function still needs to communicate information to the user via the real stdout, I will dup the "actual" stdout to another file descriptor.

When it comes to invoking another program, I agree that full control over the set of numbered FDs should be supported. (I wasn't clear on that point) It's within the script that I think higher-numbered FDs shouldn't be used - within the idioms of the language I think it makes a lot more sense, if one needs additional files open, to use file objects (managed by variable scope, passed as arguments, exchanged between different threads using "put", etc.) rather than numbered file descriptors provided by redirection syntax. At the very least it's a pattern that should be discouraged IMO.

@xiaq xiaq removed the A:Language label Oct 28, 2020
krader1961 (Contributor) commented
Is this really a bug? That is, does it warrant the "bug" label? As the subject line states, the current documented behavior is ambiguous. It also makes some potentially useful semantics hard to write. This seems to me to fall into the enhancement issue set.

@xiaq xiaq moved this to Todo in All Elvish issues Feb 25, 2022
@xiaq xiaq moved this from ❓Triage to 🧭Recon in All Elvish issues Feb 27, 2022
Projects
Status: 🧭Recon
Development

No branches or pull requests

6 participants