Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Install directly from wheels, without unpacking into an intermediate directory #8562

Merged
merged 20 commits into from
Jul 11, 2020

Conversation

chrahunt
Copy link
Member

@chrahunt chrahunt commented Jul 9, 2020

Over the past several days, we have removed almost all code that requires a wheel to be extracted to a temporary directory before installation. In this last piece, we separate out the file-specific operations from wheel installation and then replace it with code to read directly from the wheel file.

This is a pretty big PR. I have tried to show a clear progression commit-by-commit, along with justifications for individual decisions. Please let me know if I can do anything to make this easier to review!

A few followups will be needed:

  1. We should guard against leading / and path traversals (e.g. ../..), similar to what we do in unpacking.unzip_file.
  2. We will probably need to set the permissions of installed files similar to what we do for generated files.

I think we can handle those in separate PRs, to keep this one focused, but I am also OK if we want to incorporate them in this one. I can work on those shortly regardless.

Closes #6030.

chrahunt added 3 commits July 8, 2020 17:31
Dropping the top-level directory creation allows us to make the
processing completely dependent on files to be installed, and not on the
top-level directory they happen to be installed in.

We already create the parent directory in the loop below, so this call
should be redundant for files that get installed.
By removing this dependency of the "file installation" part of `clobber`
on the "file finding" part of `clobber`, we can more easily factor out
the "file installation" part.
Hiding the file-specific implementation we currently use will let us
trade out the implementation for a zip-backed one later. We can also use
this interface to represent the other kinds of files that we have to
generate as part of wheel installation.

We use a Protocol instead of a base class because there's no need for
shared behavior right now, and using Protocol is less verbose.
"getting files" is one of the places that requires files to be on disk.
By extracting this out of `clobber` we can make it simpler and then
trade it out for a zip-based implementation.
@chrahunt chrahunt force-pushed the extract-direct-from-zip-refactoring branch 2 times, most recently from 1e32645 to c305100 Compare July 9, 2020 02:15
@chrahunt chrahunt added the type: enhancement Improvements to functionality label Jul 9, 2020
@chrahunt chrahunt marked this pull request as ready for review July 9, 2020 03:06
@pradyunsg pradyunsg changed the title Install wheels directly Install directly from wheels, without unpacking into an intermediate directory Jul 9, 2020
Copy link
Member

@xavfernandez xavfernandez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot 👍

src/pip/_internal/operations/install/wheel.py Outdated Show resolved Hide resolved
src/pip/_internal/utils/misc.py Outdated Show resolved Hide resolved
Copy link
Member

@sbidoul sbidoul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvement! On one test installing a bunch of wheels from a directory, this saves ~11 seconds out of 53, with another ~11 seconds saved by #8552. That is significant.

Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a brilliantly broken up PR that was a breeze to review. Thank you so much for taking the time and making the effort to do so! ^>^

4 minor comments, but the overall PR looks ✨

src/pip/_internal/utils/misc.py Show resolved Hide resolved
src/pip/_internal/operations/install/wheel.py Show resolved Hide resolved
src/pip/_internal/operations/install/wheel.py Outdated Show resolved Hide resolved
src/pip/_internal/operations/install/wheel.py Outdated Show resolved Hide resolved
chrahunt added 15 commits July 9, 2020 18:23
We always pass a file path to this function, so assert as much. We want
the return type to be consistent so we can assign the result to
non-Optional types.
This makes `clobber` much simpler, and aligns the interface of
root_scheme files and data_scheme files, so we can process them in the
same way.
Simplifying the file-finding function will make it easier to drive our
whole wheel installation from a single list of files later.
With this approach, we can add the rest of our generated files into the
same iterable and they can undergo the same processing.
When we start processing files directly from the wheel, all we will have
are the files with their zip path (which should match a `RECORD`
entry). Separating this from the source file path (used for copying)
and annotating it with our `RecordPath` type makes it clear what the
format of this public property is, and that it should match what is in
`RECORD`.
From https://docs.python.org/3/library/itertools.html,
adapted for Python 2 and with types added.

This will be used in the next commit.
At the beginning of our wheel processing we are going to have the list
of contained files. By splitting this into its own function, and
deriving it from disk in the same way it will appear in the zip, we can
incrementally refactor our approach using the same interface that will
be available at that time.

We start with the root-scheme paths (that end up in lib_dir) first.
Now we rely solely on the list of RECORD-like paths derived from the
filesystem, and can easily trade out the implementation for one that
comes from the wheel file directly.
One less dependency on the wheel being extracted.
@chrahunt chrahunt force-pushed the extract-direct-from-zip-refactoring branch from c305100 to 80a2a94 Compare July 9, 2020 23:11
@chrahunt chrahunt force-pushed the extract-direct-from-zip-refactoring branch from 9b7bb4b to d74d6b3 Compare July 10, 2020 01:16
@chrahunt
Copy link
Member Author

chrahunt commented Jul 10, 2020

Just to prevent anyone potentially installing a vulnerable pip when we know there's an easily-fixed issue, I added a commit to address the path traversal issue I mentioned in the original description.

@chrahunt chrahunt force-pushed the extract-direct-from-zip-refactoring branch from d74d6b3 to d13ec25 Compare July 10, 2020 01:21
@chrahunt
Copy link
Member Author

If there are no concerns with the addition I'll merge this in another 12 hours.

@chrahunt chrahunt merged commit 8bf5731 into pypa:master Jul 11, 2020
@chrahunt chrahunt deleted the extract-direct-from-zip-refactoring branch July 11, 2020 01:33
@chrahunt
Copy link
Member Author

Thanks everyone for taking a look!!

@chrahunt
Copy link
Member Author

2\. We will probably need to set the permissions of installed files similar to what we do for [generated files](https://github.com/pypa/pip/blob/2f9b50c097fa5e655cd2978b682fc79ed2e6295f/src/pip/_internal/operations/install/wheel.py#L676).

I just checked and it doesn't impact this code, since we aren't transitively using tempfile as we are for generated files (as described in #8139 (comment)). So there are no more pending items here. :)

def make_data_scheme_file(record_path):
# type: (RecordPath) -> File
normed_path = os.path.normpath(record_path)
_, scheme_key, dest_subpath = normed_path.split(os.path.sep, 2)
Copy link

@cjp256 cjp256 Jul 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This caused a bit of a regression with a dependency I have.

This dependency incorrectly installed a file to python_apt-0.0.0.data/purelib, rather than creating the purelib directory first. pip install (20.2) now fails with:

ERROR: Exception:
Traceback (most recent call last):
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 216, in _main
    status = self.run(options, args)
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/cli/req_command.py", line 182, in wrapper
    return func(self, options, args)
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 421, in run
    pycompile=options.compile,
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 90, in install_given_reqs
    pycompile=pycompile,
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 831, in install
    requested=self.user_supplied,
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/operations/install/wheel.py", line 830, in install_wheel
    requested=requested,
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/operations/install/wheel.py", line 658, in _install_wheel
    for file in files:
  File "/tmp/venv/lib/python3.6/site-packages/pip/_internal/operations/install/wheel.py", line 587, in make_data_scheme_file
    _, scheme_key, dest_subpath = normed_path.split(os.path.sep, 2)
ValueError: not enough values to unpack (expected 3, got 2)

Previously this would have been ignored. While it's the package's fault, perhaps this could use a little improvement on error handling? I'm willing to put up a PR, but am unsure how this case should be handled.

Thanks! 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the research and creating a separate issue, much appreciated. :)

I submitted #8656 to handle this case in a way that will be hopefully more clear.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 10, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
type: enhancement Improvements to functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Extract wheels directly to install location
6 participants