[Bug]: Missing pyvenv.cfg and non-standard layout of venv #382

Closed
njlr opened this issue Aug 17, 2024 · 9 comments
Labels
bug Something isn't working

Comments

njlr commented Aug 17, 2024

What happened?

I have been comparing the structure of py_venv output to standard Python tools.

It appears that some files are different:

  • pyvenv.cfg is missing
  • The site packages are not in the standard location lib/pythonX.Y/site-packages

I was unable to find a spec for virtual envs.
This blog post describes a venv structure that matches my venv output: https://snarky.ca/how-virtual-environments-work/
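For reference, the layout the standard library produces can be inspected directly. The following is a minimal sketch, assuming a POSIX or Windows host with the stdlib venv module available; the helper name create_and_inspect is made up for illustration. The pyvenv.cfg file and its home key are described in PEP 405.

```python
import os
import tempfile
import venv
from pathlib import Path


def create_and_inspect(env_dir: Path) -> dict:
    """Create a venv with the stdlib builder and describe its layout."""
    # with_pip=False keeps this fast and offline; symlinks mirror
    # what `python -m venv` does by default on POSIX systems.
    venv.EnvBuilder(with_pip=False, symlinks=(os.name != "nt")).create(env_dir)

    # pyvenv.cfg is a flat "key = value" file at the root of the environment.
    cfg = {}
    for line in (env_dir / "pyvenv.cfg").read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip()] = value.strip()

    # POSIX: lib/pythonX.Y/site-packages; Windows: Lib/site-packages.
    site_packages = sorted(
        {
            p.relative_to(env_dir).as_posix()
            for pattern in ("lib/python*/site-packages", "Lib/site-packages")
            for p in env_dir.glob(pattern)
        }
    )
    return {
        "top_level": sorted(p.name for p in env_dir.iterdir()),
        "cfg": cfg,
        "site_packages": site_packages,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        info = create_and_inspect(Path(tmp) / "env")
        print("top level:    ", info["top_level"])
        print("site-packages:", info["site_packages"])
        print("home:         ", info["cfg"].get("home"))
```

The exact top-level entries vary by platform (lib64, for example, only appears on some Linux systems), so the sketch prints them rather than asserting a fixed list.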

Is this working as expected?

Version

Development (host) and target OS/architectures: Ubuntu 24.04

Output of bazel --version:

bazel 6.2.0

Version of the Aspect rules, or other relevant rules from your
WORKSPACE or MODULE.bazel file: 0.7.4

Language(s) and/or frameworks involved: -

How to reproduce

No response

Any other information?

No response

njlr added the bug label Aug 17, 2024
njlr changed the title from "[Bug]: Missing pyvenv.cfg from venv" to "[Bug]: Missing pyvenv.cfg and non-standard layout of venv" Aug 17, 2024

mattem (Collaborator) commented Aug 17, 2024

How are the comparisons being made? What bazel commands are being run?

njlr (Author) commented Aug 17, 2024

Sure, so the standard approach (without Bazel) for creating a venv bundle is:

# pip3 install venv-pack
python3 -m venv_pack -f -o /venv.tar.gz

With Bazel, I am attempting this:

py_venv(
  name = "venv",
  deps = [
    ":lib",
  ],
)

pkg_tar(
  name = "venv_bundle",
  out = "venv.tar.gz",
  extension = "tar.gz",
  srcs = [
    ":venv",
  ],
  include_runfiles = True,
  strip_prefix = strip_prefix.from_pkg(),
)

The top level of the archive that venv_pack produces is:

bin  include  lib  lib64  pyvenv.cfg

This matches what is described here.

The top level of the Bazel version is:

venv  venv.runfiles  venv.venv.pth
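One way to make this comparison repeatable is to list the first path component of every member of each archive. This is a minimal sketch using only the standard library. The helper names (top_level_entries, build_archive) are made up for illustration, and the demo archive contains placeholder entries, not real bundle output.

```python
import tarfile
import tempfile
from pathlib import Path, PurePosixPath


def top_level_entries(archive_path) -> list:
    """Return the sorted first path component of every archive member."""
    names = set()
    # "r:*" lets tarfile detect gzip/bz2/xz compression transparently.
    with tarfile.open(archive_path, mode="r:*") as tar:
        for member in tar.getmembers():
            # Drop a leading "./" or "/" so "./bin/python" and "bin/python" agree.
            parts = [p for p in PurePosixPath(member.name).parts if p not in (".", "/")]
            if parts:
                names.add(parts[0])
    return sorted(names)


def build_archive(archive_path, member_names) -> None:
    """Write a gzip-compressed tar of empty files (for illustration only)."""
    with tarfile.open(archive_path, mode="w:gz") as tar:
        for name in member_names:
            tar.addfile(tarfile.TarInfo(name))  # header only, size 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "demo.tar.gz"
        # Placeholder member names, loosely modelled on a venv layout.
        build_archive(demo, ["./bin/python", "./lib/placeholder", "./pyvenv.cfg"])
        print(top_level_entries(demo))
```

Running top_level_entries on both venv.tar.gz files side by side would show the difference in layout described above without unpacking either archive.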

mattem (Collaborator) commented Aug 17, 2024

The venv isn't created until runtime, i.e. on bazel run :venv; rules_py doesn't output the venv at build time.

This is partly due to pyvenv.cfg containing absolute paths. Therefore, the rules here are effectively placing the files required to make the venv into a tar, rather than the venv itself.

There's a PR open to support creating a PEX, if that's the ultimate goal here, but in its current form this is not supported, and the behavior you're seeing is expected.

njlr (Author) commented Aug 17, 2024

I understand it now; thanks for your help!

njlr closed this as completed Aug 17, 2024

njlr (Author) commented Aug 19, 2024

> This is partly due to pyvenv.cfg containing absolute paths. Therefore, the rules here are effectively placing the files required to make the venv into a tar, rather than the venv itself.

AFAICT the absolute path is the home setting. If the user were asked to supply it, would that make environments deterministic, and therefore eligible for creation at build time?

mattem (Collaborator) commented Aug 19, 2024

home points to the bin directory of the interpreter that created the virtual env, which under Bazel is generally the interpreter attached to a toolchain. It therefore contains the install directory hash, e.g. /private/var/tmp/_bazel_matt/442be3b1483cc77971c0d7bfbd105b2d/. It doesn't make sense to "ask" the user for this in a traditional sense, as supplying the toolchain is how that interpreter is supplied.

This isn't the only build-time vs. runtime trade-off; there are others. I'm not saying it's not possible, but it would take a fair amount of work to allow for it.
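To illustrate why an absolute home value gets in the way of build-time creation, here is a small sketch. It is not how rules_py generates anything; render_pyvenv_cfg is a hypothetical helper, and the two interpreter paths and the version string are invented placeholders standing in for toolchains under two different Bazel output bases.

```python
import hashlib


def render_pyvenv_cfg(home: str, version: str) -> str:
    """Render a minimal pyvenv.cfg; `home` is the interpreter's bin directory."""
    lines = [
        f"home = {home}",
        "include-system-site-packages = false",
        f"version = {version}",
    ]
    return "\n".join(lines) + "\n"


def digest(text: str) -> str:
    """SHA-256 of the rendered file, standing in for a build output hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical toolchain locations under two different output bases.
HOME_A = "/tmp/output_base_aaaa/external/python_toolchain/bin"
HOME_B = "/tmp/output_base_bbbb/external/python_toolchain/bin"
VERSION = "3.11.9"  # placeholder version string

if __name__ == "__main__":
    a = render_pyvenv_cfg(HOME_A, VERSION)
    b = render_pyvenv_cfg(HOME_B, VERSION)
    print(digest(a) == digest(b))
```

With the same home the file is byte-for-byte reproducible, but as soon as home embeds a machine-specific directory the content differs between machines, so the file could not be shared as a cached build output.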

njlr (Author) commented Aug 20, 2024

For those stumbling on this...

build.sh

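#!/usr/bin/env bash

# Fail on errors, unset variables (e.g. a missing BUILDX), and pipeline failures.
set -euo pipefail

echo "BUILDX=$BUILDX"
echo "OUTPUT_PATH=$OUTPUT_PATH"

# Copy everything to a temporary directory, dereferencing symlinks.
# Docker doesn't like symlinks...
T=$(mktemp -d)
# Remove the temporary copy even if the build fails.
trap 'rm -rf "$T"' EXIT
cp -rL . "$T"

"$BUILDX" \
  build \
  "$T" \
  --file \
  "$T/path/to/Dockerfile" \
  --platform \
  linux/amd64 \
  --output \
  "type=local,dest=$OUTPUT_PATH"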

BUILD.bazel

filegroup(
  name = "lib_files",
  srcs = [
    "README.md",
    "pyproject.toml",
  ] + glob([
    "foo/**/*.py",
  ]),
)

genrule(
  name = "bundle",
  srcs = [
    "build.sh",
    "Dockerfile",
    ":lib_files",
  ],
  local = True,
  cmd = " && ".join([
    "export OUTPUT_PATH=$@",
    "export BUILDX=$(location :buildx)",
    "$(location build.sh)",
  ]),
  outs = [ "build" ],
  tools = [ ":buildx" ],
)

Dockerfile

FROM public.ecr.aws/amazonlinux/amazonlinux:2023@sha256:60e5a4a26400041379437a65c780e2ce2981a9b50d493f628aabb1623014c25c AS build

RUN <<EOR
set -e
yum update -y
yum install -y python3 python3-pip
python3 --version
pip3 --version
EOR

RUN <<EOR
set -e
pip3 install virtualenv
pip3 install venv-pack
pip3 install pyclean
EOR

ADD . /opt/app

WORKDIR /opt/app

RUN <<EOR
set -e

export CFLAGS='-g0'
export PATH='bin:/usr/bin:/usr/local/bin'
export PYTHONPYCACHEPREFIX='/root/.cache/pycache'
export PYTHONDONTWRITEBYTECODE='1'
export PYTHONHASHSEED='0'
export SOURCE_DATE_EPOCH='315532800'

cd ./foo
python3 -m venv pyspark_venvsource
source pyspark_venvsource/bin/activate
python3 -m pip install .
pip3 install venv-pack
pip3 install pyclean
pyclean .
mkdir -p /output
python3 -m venv_pack -f -o /output/bundle.tar.gz
EOR

FROM scratch
COPY --from=build /output ./output

Not an ideal solution, but it may unblock you.

mattem (Collaborator) commented Aug 20, 2024

I'd highly recommend not calling Docker from a genrule for many reasons, especially one that then has RUN statements.
If the goal is to place a py_binary into an image, perhaps this example using rules_oci is what you are looking for: https://github.com/aspect-build/bazel-examples/blob/main/oci_python_image/hello_world/BUILD.bazel

njlr (Author) commented Aug 20, 2024

> I'd highly recommend not calling Docker from a genrule for many reasons, especially one that then has RUN statements. If the goal is to place a py_binary into an image, perhaps this example using rules_oci is what you are looking for: https://github.com/aspect-build/bazel-examples/blob/main/oci_python_image/hello_world/BUILD.bazel

I would prefer an image but I'm limited to what Spark supports: https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html#python-package-management
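For context, the Spark guide linked above describes shipping a venv-pack archive with --archives and pointing PYSPARK_PYTHON at the unpacked interpreter. The sketch below only assembles that invocation; it does not call Spark. The function name, the app.py script, and the "environment" alias are placeholders, so check the flags against the documentation for your Spark version.

```python
import shlex


def spark_submit_invocation(archive: str, app: str, alias: str = "environment"):
    """Return (env, argv) for submitting `app` with a packed venv archive."""
    # Executors unpack the archive into a directory named after the alias.
    env = {"PYSPARK_PYTHON": f"./{alias}/bin/python"}
    argv = ["spark-submit", "--archives", f"{archive}#{alias}", app]
    return env, argv


if __name__ == "__main__":
    env, argv = spark_submit_invocation("bundle.tar.gz", "app.py")
    exports = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
    print(exports, shlex.join(argv))
```

Returning the environment and argument list separately keeps the sketch easy to pass to subprocess.run(argv, env={**os.environ, **env}) if you do want to execute it.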
