Support one-to-many piping in the pipeline syntax #500
@xiaq First of all, I must say that this is a great feature. I have wished for parallel stdout/stderr pipelines for a long time, and it's amazing that Elvish already supports this; I also learned about the `run-parallel` and `pipe` builtins in the process. About the syntax - honestly I think the following should be clear enough:
Lambdas are already allowed in pipes, so disambiguation could be done easily. To my eyes, the following is clear:
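The example was lost in formatting; a hypothetical reconstruction of the kind of thing being suggested (`2|` is only a proposed operator, and the exact snippet here is a guess):

```elvish
# Hypothetical: the lambda marks bar2 as a side branch consuming foo's
# stderr, while the main pipeline continues from foo's stdout to bar.
foo 2| { bar2 } | bar
```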
However, I understand your point about the linearity of pipes. If you want to make it explicit, why not introduce a new builtin (e.g. `pipesplit`)? This could internally map directly onto the structure you described with `run-parallel` and `pipe`.
It just dawned on me that this can be implemented as a function, and it works:
E.g.:
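The function itself was lost in formatting, but based on the rest of the thread it would have been built on the `pipe` and `run-parallel` builtins, roughly like this (a sketch in the Elvish syntax of that era; the name `pipesplit` is taken from a later comment in this thread):

```elvish
# Run $producer, piping its stdout into $stdout-consumer and its stderr
# into $stderr-consumer, all three running in parallel.
fn pipesplit [producer stdout-consumer stderr-consumer]{
  pout = (pipe)
  perr = (pipe)
  run-parallel {
    $producer > $pout 2> $perr
    # Close the write ends so the consumers see EOF.
    pwclose $pout
    pwclose $perr
  } {
    $stdout-consumer < $pout
    prclose $pout
  } {
    $stderr-consumer < $perr
    prclose $perr
  }
}

# Usage: pipesplit { foo } { bar } { bar2 }
```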
@zzamboni Right, it is already implementable as a function :)
@xiaq the rule could be "maximum one |
By using lambdas, the user could choose to use indentation to clarify things, as in some of your suggestions above, but the meaning would be clear to Elvish from the nesting, not from the indentation. E.g.:
I think using lambdas to disambiguate the proposed multiple-pipeline syntax would not actually achieve the desired result. The expectation in a pipe is that it connects a command immediately to the left, to a command immediately to the right. So:
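For example, with the proposed `2|` operator:

```elvish
# Read left to right, 2| attaches to B, the command immediately to its
# left - so this would pipe B's stderr (not A's) to C.
A | B 2| C
```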
This really would have to connect B's stderr, not A's, to C. Working around that with lambdas doesn't get us far IMO:
Getting around this just with lambdas in the pipeline would require preventing the undesired output of B or C from reaching the other tool that's being connected to A's other output:
...And of course that's not great either: to get B's stderr as part of the whole pipeline's combined stderr, you would need to dup stderr:
I know we already have a solution in the form of a user-defined function ("pipesplit" shown above) - and it's actually a good solution, but I want to explore the possibilities a bit with regard to syntax. Consider: the real benefit of redirecting to a process substitution, instead of piping, is that it uses redirection syntax rather than pipeline syntax. Redirections do not chain; instead they stack on the last command in the pipeline, so it's relatively straightforward to attach multiple of them to a single command. So we could do something like this:
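The example here was lost in formatting; from the surrounding description, the shape would be something like the following, where `pipe-command` is the hypothetical helper discussed below: it would start its argument command in the background and return a file object for the write end of a pipe feeding that command.

```elvish
# Hypothetical: both redirections stack on A, so A's stdout feeds B and
# A's stderr feeds C - pipes expressed with redirection syntax.
A > (pipe-command B) 2> (pipe-command C)
```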
Personally I like that better, at least expressively speaking, than "split-pipe". It doesn't use /dev/fd (because elvish supports file objects that can be used in redirections), and it's easier to expand truly to "many" pipelines rather than just one or two. Of course, there are problems with the implementation of "pipe-command" above:
Personally I think it would really be preferable to have the two ends of the pipe as separate objects. Relying on GC to clean up a pipe isn't an ideal situation, of course, since it's not immediate - but the two ends are usually handled separately, and it doesn't really make sense IMO to bind them together. So building on this idea, I think a good answer could be to add syntax that works like "pipe-command" above, except better-managed: the shell doesn't retain the read end of the pipe at all, avoiding both the deadlock that prevents the pipe from getting GC'ed and the one where A fills up the pipe buffer and hangs if "C" closes its input but doesn't terminate:
It still doesn't solve the problem of C not being part of the pipeline job, unfortunately. C will run in the background until it self-terminates. But when A terminates, the write end of the pipe to C should be closed (possibly after a GC?), so if C terminates on EOF of its input, it would terminate at that point. Alternately, if the feature were only allowed as part of a pipeline, "C" could be part of the pipeline job, and the shell would be in a better position to manage the lifetime of the pipe:
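The snippet for this variant was lost; a purely hypothetical rendering of the idea (the operator is invented for illustration):

```elvish
# Hypothetical: 2>| behaves like a redirection in that it stacks on A
# rather than chaining, but its target command joins the pipeline job.
A 2>| { C } | B
```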
Not sure about the syntax (I at least like it better than "A > >(C)"...) but you get the idea: it's a pipe, but with a syntax that behaves like a redirect, so that multiple of them would all apply to A rather than chaining.
@zakukai I really like your syntax exploration and enjoyed reading it. Thanks for sharing your insights. IIUC, the "pipe-command" idea is quite similar to process substitution. I echo your sentiment that relying on GC to clean up pipes is not ideal.
Yes, the idea is similar to process substitution. But the purpose of it wasn't as a general-purpose replacement of process substitution; rather, my observation was that the real contribution of process substitution in forming this pipeline was that it allowed us to use redirection syntax, rather than pipeline syntax, to form a pipeline. A hypothetical syntax like that could work as a general-purpose replacement for process substitution, if we added another piece to do the following:
A mechanism like that could also apply more generally to other file objects - except that /dev/fd is kind of a terrible mess of a feature (at least on Linux) and does not work generally for other file objects (particularly sockets, which on Linux simply do not work on /dev/fd - this "broke" /dev/stdout in recent versions of Korn Shell, where pipelines were implemented with socket pairs). Given the issues surrounding /dev/fd, personally I'm not too inclined to build features around it.

So why go through all this to create an alternate syntax for a problem that can already be solved with process substitution? Basically I think that process substitution is really ugly syntax when it's combined with redirection to turn it back into a pipeline:
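In Bash, for example, the idiom reads:

```sh
# Process substitution plus redirection: ">(B)" is a pipe to B dressed
# up as a file name, and "> >(B)" redirects stdout into it.
A > >(B) 2> >(C)
```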
And, more generally, I want to explore how existing shell idioms could be transformed (for the better, one would hope) in a shell whose feature set may provide fundamentally better ways of doing some of these things. I like the general applicability of
Looking at it again I think |
Would it be possible to just detect when something like |
This issue is filed from #485, which asks for the functionality of piping both the stdout and the stderr of a command to different commands. That bug was closed because the functionality is now possible with the low-level `run-parallel` and `pipe` builtins, but no new syntax was introduced.

This issue discusses the possibility of extending the pipeline syntax to support such a pipeline configuration. Citing @mqudsi's comment, it is not easy to come up with an unambiguous syntax for this:
A comment about this. I am not sure whether @mqudsi proposes that `./foo 1>| bar 2>| bar2` should mean "pipe stdout of foo to bar, and stderr of foo to bar2", but if that is the case, this is quite counter-intuitive. Traditional pipelines always work in a linear fashion, so it is tempting to interpret this as "pipe stdout of foo to bar, and stderr of bar to bar2".

The syntax for the pipeline should prioritize linear pipelines and make non-linear pipelines more explicit.
Traditionally, this functionality is implemented with process substitution:
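The example block was lost in formatting; in Bash it would read something like:

```sh
# stdout of foo is piped to bar, and stderr to bar2, via process
# substitution.
./foo > >(bar) 2> >(bar2)
```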
However, process substitution relies on support for either the `/dev/fd` filesystem or named FIFOs. This is backwards: named FIFOs or `/dev/fd` are indeed needed if the process substitution is to be used as a command argument, but when used in redirections, the same functionality is entirely implementable with plain, unnamed pipes.

In fact, in Elvish it is already possible to do this, except that you have to manage the lifecycle of pipes manually:
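The code block was lost in formatting; based on the explanation that follows (which refers to `foo > $pout 2> $perr` in the first function passed to `run-parallel`), it would have looked roughly like this, in the Elvish syntax of the time:

```elvish
pout = (pipe)
perr = (pipe)
run-parallel {
  foo > $pout 2> $perr
  # Close the write ends so the readers see EOF.
  pwclose $pout
  pwclose $perr
} {
  bar < $pout
  prclose $pout
} {
  bar2 < $perr
  prclose $perr
}
```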
Note that in the first function passed to `run-parallel`, `foo > $pout 2> $perr` resembles the process substitution version. This is expected.

Now for brainstorming a new syntax!
I think this is a bad idea, but a very intuitive syntax can look like this:
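The example was lost in formatting; from the description that follows, it presumably looked like this (hypothetical syntax):

```elvish
# Hypothetical: foo's stdout goes to bar; the 2| on the second line,
# aligned under foo, sends foo's stderr to bar2.
foo | bar
 2| bar2
```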
I have chosen to change the `2>|` proposed by @mqudsi to `2|` for terseness. Like `2>`, there must not be any space between `2` and `|`.

When you have longer pipelines you will need to align them up:
This syntax really takes whitespace-dependent syntax to the extreme. Again I don't think it's a good idea.
Another idea is supporting putting markers on commands in a pipeline, so that they can be referred to later on. Here I use `^name` both as marker and reference, but it's likely we will need separate syntax for them. The parser can work by looking beyond the pipeline on the first line and, as long as subsequent lines start with a marker, adding them to the pipeline.
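The example was lost in formatting; one possible shape of the marker idea, with `^err` as a purely illustrative marker name:

```elvish
# Hypothetical: ^err marks foo's stderr on the first line; the second
# line starts with the marker, continuing the pipeline from that stream.
foo ^err | bar
^err | bar2
```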