Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RFC for process environment blocks #384

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jborean93
Copy link

Adds the RFC to support specifying environment variable(s) that are only set on subprocesses. This is to support a feature that is currently preset on sh based shells like bash but is not possible in PowerShell without more verbose code or other limitations.

Adds the RFC to support specifying environment variable(s) that are only
set on subprocesses. This is to support a feature that is currently
preset on sh based shells like bash but is not possible in PowerShell
without more verbose code or other limitations.
@iSazonov
Copy link
Contributor

When I contributed to this area, my plan was to first add new functionality like the Start-Process -AddEnvironment $hash, which is currently missing. It would be really useful, and only then to discuss the syntactic sugar for the command line as in this RFC. I think this is the right to implement first the AddEnvironment parameter.
After that, it would be a big question for me whether this sugar is necessary, because such an syntactic extension is primarily needed for convenience, which is to type fewer characters, but all we could offer is too verbose (not shorter than Start-Process -AddEnvironment $hash).

@ninmonkey
Copy link

ninmonkey commented Jan 28, 2025

We need to support both empty values and case sensitive key conflicts

I think this only applies to key names with 1 or more instances of U+0000

Here's some cases where a bad actor could replace a key, or remove one -- depending on the keys order.

  • If the key has a suffix of \x00+, it will trim into just key
  • If the key has a \x00 in the middle, it splits the key
Key Name Env Var Used
PATH␀␀ PATH
PATH␀SomeLongName PATH
Name␄ Name␄

For example

$vars = @{ 
   "Path"              = "/foo/bar"
   "Path`u{0}`u{0}"    = ''
   "Some`u{0}LongName" = 'Trunc'
}
Remove-Item Env:Some*
[Environment]::SetEnvironmentVariable( "SomePath", '/path' )
[Environment]::SetEnvironmentVariable( "SomePath`u{0}`u{0}", 'new' )
[Environment]::SetEnvironmentVariable( "Some`u{0}LongName", 'Trunc' )

Gci "env:\Some*"  |Ft -AutoSize

Name     Value
----     -----
SomePath new
Some     Trunc

Other control chars like 0x4 and 0xa don't collide

Remove-Item Env:Some*
[Environment]::SetEnvironmentVariable( "SomePath", '/path' )
[Environment]::SetEnvironmentVariable( "SomePath`u{4}", 'new' )

Gci "env:\Some*"  |Ft -AutoSize

Name      Value
----      -----
SomePath  /path
SomePath␄ new

If the edge case is specifically U+0000 values, You could mitigate that with a -replace '\x00', ''

from: https://gist.github.com/ninmonkey/c6bb4dc838e55e4581f86f86e0989f49

Update: It's a little more than trimming

Note that it maps

SomePath`u{0}`u{0} 
    to SomePath

but 
Some`u{0}Path`u{0}`u{0} 
    to Some
remove-item env:\*some*
[Environment]::SetEnvironmentVariable( "SomePath", 'one' )
[Environment]::SetEnvironmentVariable( "SomePath`u{0}`u{0}", 'two' )
[Environment]::SetEnvironmentVariable( "Some`u{0}Path`u{0}`u{0}", 'three' )

Gci "env:\*Some*"|Ft -AutoSize

Name     Value
----     -----
SomePath two
Some     three

Environment:

os: Windows
Pwsh: 7.4.6

@jborean93
Copy link
Author

@iSazonov

When I contributed to this area, my plan was to first add new functionality like the Start-Process -AddEnvironment $hash, which is currently missing.

While I can't comment on whether a new parameter makes sense or not this to me is less pressing as you can already achieve that with

$envBlock = [Environment]::GetEnvironmentVariables()
$envBlock['NEW'] = 'foo'

Start-Process ... -Environment $envBlock

After that, it would be a big question for me whether this sugar is necessary, because such an syntactic extension is primarily needed for convenience, which is to type fewer characters, but all we could offer is too verbose (not shorter than Start-Process -AddEnvironment $hash).

See the RFC as to why this is useful for direct invocations. It does make it cleaner to type so you don't have to reset the values back but the technical issue this RFC solves is the ability to do this in parallel invocations without stepping on each others toes. Take this for example

1..5 | ForEach-Object -Parallel {
    $env:FOO = $_
    bash -c 'echo $FOO'
    $env:FOO = $null
}

Your results will differ from mine but sometimes this works but the majority of the time this fails because FOO will be clobbered by each parallel invocation. For example I get this on my first run with various other permutations

1
2
4
4

Whereas focusing on Start-Process doesn't make too much sense because:

  • There is a solution already to achieve appending of env vars that can be used today
  • You cannot easily capture output with Start-Process
  • Start-Process on non-Windows is difficult to deal with and avoid clobbering your existing terminal window

There are many things that should be improved by Start-Process but I think the native command processor is in dire need of some more improvements like this, the ability to set a var for encoding, and others. Whether that's improved by things like this RFC or rejected in favour of some other solution I'm not sure but is why I've opened this RFC for this specific feature.

@jborean93
Copy link
Author

@ninmonkey

I think this only applies to key names with 1 or more instances of U+0000

Null chars are certainly a problem because env vars are simply a single string in some memory block of the process with each entry separated by [char]0. By embedding your own null chars you are effectively splitting the values manually. It is certainly worth considering but as you've shown this is already something that can be done manually so this new syntax isn't adding any new attack vectors.

The problem I am mentioning here specifically is if someone wanted to set an env var to an empty string or use an env var where the key was the same but in a different case. For example being able to set PAGER to an empty string might be needed

& @{ PAGER = '' } ...

A practical example is AWS_PAGER where an empty string disables the pager aws/aws-cli#8351. This is now technically possible to do in PowerShell manually with $env:AWS_PAGER = '' since .NET 9 enabled empty env var values but the issue is just a demonstration of when this feature might be used.

As for duplicate env var keys it is more if you wanted to see both FOO and Foo as an env var. On Windows this doesn't make much sense because Windows just looks up env vars case insensitively but on non-Windows this is technically possible, even if not advisable. In the proposed syntax you would have to build the hashtable using the constructor so it allows a case sensitive key lookup rather than the hashtable literal which is case insensitive.

$envBlock = [Hashtable]::new()
$envBlock.FOO = 'hello'
$envBlock.Foo = 'world'

& $envBlock ...

@iSazonov
Copy link
Contributor

Whereas focusing on Start-Process doesn't make too much sense because:

  • There is a solution already to achieve appending of env vars that can be used today

You've provided a workaround and shown the flaw yourself. Start-Process -AddEnvironment does not have this disadvantage. Everything else is not relevant to the discussion, as we are essentially only talking about "less typing" syntactic sugar.

@jborean93
Copy link
Author

The Start-Process cmdlet has a few problems that are not really solved without making even more of a mess. The key problems or at least things it does not do well are;

  • You cannot capture output or pipe input without dealing with external files
  • As the above you cannot take advantage of the pwsh minishell with CLIXML to object parsing
  • There is no argument escaping helper, -ArgumentList on Windows takes in a string so it is up to you to escape things
  • On non-Windows you cannot avoid inheriting the tty without hacks no no-hup
    • This makes it even harder to capture the output even with file redirection
  • You cannot pipe processes with each other

Most of these arguments effectively revolve around that fact you cannot integrate the process input/output into the PowerShell pipeline. I personally think any attempts to do so would just make it even more confusing as now we have 2 more parameter sets, having to complicate an already complicated cmdlet, and dealing with weird defaults where you need to opt into capturing the output rather than it just being done like the native command processor.

I certainly would advocate a new cmdlet like Invoke-Process that mirrors the functionality of the native command processor but with the ability to set more advanced parameters like output encoding, env vars, rc handling, etc. I've even created some in my own module ProcessEx. But that seems to have stalled many times so this is my attempt to expand on the existing functionality that exists in PowerShell. Telling people to use a module to perform what is arguably the basic functionality of a shell is a bit hard to swallow for end users.

as we are essentially only talking about "less typing" syntactic sugar.

For a shell, being able to spawn sub processes is a critical component. This is even more important on non-Windows hosts where there are many tools out there that exist as a binary and not part of .NET. Speaking from my experience on the PowerShell Discord, this may not be the most common scenario I've heard but it is certainly a question I've seen around once a month. I don't think dismissing a common pattern and problem in PowerShell as syntactic sugar is really indicative of the need here.

In any case this is an RFC so I do appreciate your comments, hopefully we can get more input from other people around the proposal.

@MatejKafka
Copy link

I'd really welcome this feature, although I'm not entirely sold on either proposed option:

  1. The hashtable option is quite foreign-looking – if someone new sees it for the first time, I don't think they'll be able to guess what it does, not be able to search for it (since there's no obvious keyword to search for in the docs). Nevertheless, I appreciate reusing an existing syntax (which fits quite well with existing PowerShell conventions), just don't think it looks "good" (but maybe it's just a question of getting used to it).
  2. The sh-like option (VAR=1 cmd) is compatible with bash, but I don't like the ambiguous parsing, and again, it's somewhat to hard to search for (but maybe that's not that much of a problem, since bash is using it, so it's somewhat well-known).

Personally, I'd prefer having an env keyword/command similarly to the posix /usr/bin/env binary, so that you can do env VAR=1 cmd, which gives you something to search for in the docs (pwsh env), removes parsing ambiguity and is still quite compact and familiar to POSIX shell users.

The second question is what to do with pwsh cmdlets (as opposed to native commands). In an ideal world, we'd probably change the env provider to support per-runspace env vars that are only written into the environment when invoking an external project, and support the per-invocation env for all commands, but that would be a lot of work (with a lot of compatibility breaking potential). This would match what we're doing for e.g. working directory (virtualized per-runspace).

Personally, even without the provider changes, I'd still be inclined to support this for non-native commands in a kinda hackish way by setting the environment variables process-wide and then reverting the changes afterwards, and add a big disclaimer to the docs that you really shouldn't use this when using multiple runspaces inside a single process (which is still quite rare, so the chance for breakage seems acceptably low).

@HeyItsGilbert
Copy link

For the record, that would help me easily solve PowerShell/PowerShell#20804. This would also be immensely helpful in testing and in executing code where we don't want to overwrite the environment variables for end users.

@jborean93
Copy link
Author

@MatejKafka, thanks for your comments.

The hashtable option is quite foreign-looking

I definitely agree with this statement and am not 100% happy with the proposed solution but I ended up using it in the RFC because I found it was quite flexible in terms of:

  • multiline support
  • being able to build it dynamically
  • the very minute risk of it breaking existing code
  • it doesn't require any Ast or parser changes

The only saving grace IMO is that this would only work if used with the call operator. It does give a starting point for people to look for documentation by search powershell about & which leads to the about_Operators page. Still certainly not ideal.

Personally, I'd prefer having an env keyword/command similarly to the posix /usr/bin/env binary,

This is a nice idea but it does bring up a few concerns

  • it would potentially break existing scripts that have an env command defined
  • how would you define the key/values, will it still need extra parsing if you did env foo=bar
  • if a cmdlet then now it would need to re-use the native command processor code which has different argument binding behaviour to standard cmdlets

I think having it be a statement that essentially acts like the call operator (&) would be nice but it does still have the problem where it would impact existing scripts that have defined an env command. We know that on non-Windows this already exists through the /usr/bin/env binary so users wouldn't be able to call that anyway.

Maybe the end result is Windows users are out of luck and non-Windows users continue to use the env binary to achieve this in PowerShell. I'm not a fan of this as @HeyItsGilbert has shown that it can be useful for Windows users as well.

The second question is what to do with pwsh cmdlets (as opposed to native commands)

Using env vars scoped to a runspace would be interesting but I think the fact they are essentially backed by the process wide env block makes it too hard to achieve. This unfortunately is one of those things that would have had to happen in a v1/v2 release, doing so now would probably be too large of a change.

Personally, even without the provider changes, I'd still be inclined to support this for non-native commands in a kinda hackish way by setting the environment variables process-wide and then reverting the changes afterward

I think even with the disclaimer people will either still continue to ignore it and get surprised that it doesn't work well in multiple runspaces. I don't think that is quite rare now with ForEach-Object -Parallel being a thing so I think it is still a concern. I think having it error for now would be better as we can punt the decision of what to do here for the future.

Let's see what the WG thinks of this proposal, I definitely like the env statement idea but think the risk of breaking existing code might just be a bit too much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants