Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate user-provided env vars to subprocesses #65

Merged
merged 1 commit into from
Jul 17, 2023

Conversation

edmorley
Copy link
Member

@edmorley edmorley commented Jul 13, 2023

Amongst other things, this change adds support for:


Previously the buildpack used clear-env = true in buildpack.toml, which means the process env passed to the buildpack does not contain user-provided env vars (app config vars):
https://github.com/buildpacks/spec/blob/main/buildpack.md#provided-by-the-platform

When using clear-env = true the idea is that buildpack authors can then explicitly load/check the user-provided env vars when needed (eg when using env vars to control buildpack functionality, or when setting the env for subprocesses). This should then provide full control and hopefully prevent cases where incorrect/broken env vars on an app cause failures (for example from a malformed PYTHONHOME value).

At first glance this seems like a sensible thing to do, however:

  • clear-env = true doesn't actually clear the whole environment, it still allows through env vars set by the OS and env vars from previous buildpacks. This means that badly behaved previous buildpacks can still set broken/incorrect env vars, that we will need to handle regardless.
  • If an app really does have incorrect/broken user-provided env vars (app config vars), then setting clear-env = true only avoids problems for the current buildpack and not for later buildpacks or at runtime. ie: It only temporarily wallpapers over the problem, and means errors will only be shifted later (at a potentially harder to handle/debug time).
  • Quite often buildpack authors have to pass all env vars unfiltered to at least some subprocesses (eg when running pip install, for Support config vars in Pip requirements files #52). If there are incorrect or broken user-provided env vars, then those commands may fail unless they are handled some other way anyway. Plus, if a buildpack author passes user provided env vars to some subprocesses, but not to others, then it could make any errors seem more confusing to end users, since a user with a broken user provided PYTHONHOME would only see a proportion of Python invocations fail rather than all of them.
  • As of upstream RFC 109, there is now a new concept of "build config" env vars (those set in the build image at /cnb/build-config), which are also filtered out when using clear-env = true. As such, buildpack authors using clear-env = true have to add these back in explicitly too (along with user provided env vars), adding more overhead when using clear-env = true. In addition, at the moment libcnb.rs does not yet support them (Support accessing "build config" environment variables (/cnb/build-config; upstream RFC 109) libcnb.rs#597).

As such, IMO using clear-env = true at best adds overhead for minimal benefit (given other mitigations have to be put in place anyway), and at worst makes for a more confusing/harder to debug error experience (since errors can occur at runtime instead if not handled at build).

Therefore, use of clear-env = true has been removed, and instead the buildpack relies upon defensively setting known-dangerous/often-broken env vars explicitly - using the "override" (or where appropriate; "append") CNB env var feature, which means that the buildpack's env var takes precedence over any user or previous-buildpack provided env var. The buildpack already did this for all important env vars, so no other env var changes were required other than adding tests for some commonly seen broken env vars.

Env vars known to affect CPython can be found here:
https://docs.python.org/3.11/using/cmdline.html#environment-variables

And for Pip, env vars are based on their CLI option names, combined with a PIP_ prefix:
https://pip.pypa.io/en/stable/topics/configuration/#environment-variables

One set of known broken env vars seen on apps on Heroku, is these (which used to be set via bin/release many years ago, but can still be found on some very old apps):
https://github.com/heroku/heroku-buildpack-python/blob/27abdfe7d7ad104dabceb45641415251e965671c/bin/release#L11-L18

Fixes #52.
Fixes #64.
Fixes #66.
GUS-W-13648442.
GUS-W-13753676.
GUS-W-13753756.

@edmorley edmorley added enhancement New feature or request classic buildpack parity Features required for parity with the classic Heroku Python buildpack labels Jul 13, 2023
@edmorley edmorley self-assigned this Jul 13, 2023
@edmorley edmorley force-pushed the propagate-env-vars branch from 0b7497e to 981353f Compare July 13, 2023 14:45
@edmorley edmorley marked this pull request as ready for review July 13, 2023 14:54
@edmorley edmorley requested a review from a team as a code owner July 13, 2023 14:54
@edmorley edmorley force-pushed the propagate-env-vars branch 2 times, most recently from 266a394 to c701d06 Compare July 17, 2023 10:05
Amongst other things, this change adds support for:
- Environment variable interpolation in Pip requirements files (#52)
- Specifying an additional Pip package index URL using the env var
  `PIP_EXTRA_INDEX_URL` (#64)
- Passing arbitrary env vars to build backends such as setuptools (#66)

---

Previously the buildpack used `clear-env = true` in `buildpack.toml`,
which means the process env passed to the buildpack does not contain
user-provided env vars (app config vars):
https://github.com/buildpacks/spec/blob/main/buildpack.md#provided-by-the-platform

When using `clear-env = true` the idea is that buildpack authors can
then explicitly load/check the user-provided env vars when needed (eg
when using env vars to control buildpack functionality, or when setting
the env for subprocesses). This should then provide full control and
hopefully prevent cases where incorrect/broken env vars on an app cause
failures (for example from a malformed `PYTHONHOME` value).

At first glance this seems like a sensible thing to do, however:
- `clear-env = true` doesn't actually clear the whole environment, it
  still allows through env vars set by the OS and env vars from
  previous buildpacks. This means that badly behaved previous buildpacks
  can still set broken/incorrect env vars, that we will need to handle
  regardless.
- If an app really does have incorrect/broken user-provided env vars,
  then setting `clear-env = true` only avoids problems for the current
  buildpack and not for later buildpacks or at runtime. ie: It only
  temporarily wallpapers over the problem, and means errors will only be
  shifted later (at a potentially harder to handle/debug time).
- Quite often buildpack authors have to pass all env vars unfiltered to
  at least some subprocesses (eg when running `pip install`, in order to
  support the pip requirements file env var interpolation feature; #52).
  If there are incorrect or broken user-provided env vars, then those
  commands may fail unless they are handled some other way anyway. Plus,
  if a buildpack author passes user provided env vars to some
  subprocesses, but not to others, then it could make any errors seem
  more confusing to end users, since a user with a broken user provided
  `PYTHONHOME` would only see a proportion of Python invocations fail.
- As of upstream RFC 109, there is now a new concept of "build config"
  env vars (those set in the build image at `/cnb/build-config`), which
  are also filtered out when using `clear-env = true`. As such, buildpack
  authors using `clear-env = true` have to add these back in explicitly
  too (along with user provided env vars), adding more overhead when
  using `clear-env = true`. In addition, at the moment libcnb.rs
  does not yet support them (heroku/libcnb.rs#597).

As such, IMO using `clear-env = true` at best adds overhead for minimal
benefit (given other mitigations have to be put in place anyway), and at
worst makes for a more confusing/harder to debug error experience (since
errors can occur at runtime instead if not handled at build).

Therefore, use of `clear-env = true` has been removed, and instead the
buildpack relies upon defensively setting known-dangerous/often-broken
env vars explicitly - using the "override" CNB env var feature. The
buildpack mostly already did this, so few changes were required other
than adding tests for some commonly seen broken env vars.

Env vars known to affect CPython can be found here:
https://docs.python.org/3.11/using/cmdline.html#environment-variables

And for Pip, env vars are based on their CLI option names, combined
with a `PIP_` prefix:
https://pip.pypa.io/en/stable/topics/configuration/#environment-variables

One set of known broken env vars seen on apps on Heroku, is these (which
used to be set via `bin/release` many years ago, but can still be found
on some very old apps):
https://github.com/heroku/heroku-buildpack-python/blob/27abdfe7d7ad104dabceb45641415251e965671c/bin/release#L11-L18

Fixes #52.
Fixes #64.
Fixes #66.
GUS-W-13648442.
GUS-W-13753676.
GUS-W-13753756.
@edmorley edmorley force-pushed the propagate-env-vars branch from c701d06 to 10a4d08 Compare July 17, 2023 11:37
@edmorley edmorley enabled auto-merge (squash) July 17, 2023 11:38
@edmorley edmorley merged commit a8b9de6 into main Jul 17, 2023
@edmorley edmorley deleted the propagate-env-vars branch July 17, 2023 11:40
@edmorley edmorley mentioned this pull request Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
classic buildpack parity Features required for parity with the classic Heroku Python buildpack enhancement New feature or request
Projects
None yet
2 participants