Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use better temporary files mechanism in install_unpacked_wheel #7703

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 63 additions & 0 deletions NEWS.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,69 @@

.. towncrier release notes start

20.1.dev0 (2020-02-05)
======================



20.1.dev0 (2020-02-05)
======================



20.1.dev0 (2020-02-05)
======================



20.1.dev0 (2020-02-05)
======================

No significant changes.


20.1.dev0 (2020-02-05)
======================

Deprecations and Removals
-------------------------

- Remove unused ``tests/scripts/test_all_pip.py`` test script and the ``tests/scripts`` folder. (`#7680 <https://github.com/pypa/pip/issues/7680>`_)

Vendored Libraries
------------------

- Update semi-supported debundling script to reflect that appdirs is vendored.


20.1.dev0 (2020-02-05)
======================

Deprecations and Removals
-------------------------

- Remove unused ``tests/scripts/test_all_pip.py`` test script and the ``tests/scripts`` folder. (`#7680 <https://github.com/pypa/pip/issues/7680>`_)

Vendored Libraries
------------------

- Update semi-supported debundling script to reflect that appdirs is vendored.


20.1.dev0 (2020-02-05)
======================

Deprecations and Removals
-------------------------

- Remove unused ``tests/scripts/test_all_pip.py`` test script and the ``tests/scripts`` folder. (`#7680 <https://github.com/pypa/pip/issues/7680>`_)

Vendored Libraries
------------------

- Update semi-supported debundling script to reflect that appdirs is vendored.


20.0.2 (2020-01-24)
===================

Expand Down
1 change: 0 additions & 1 deletion news/7680.removal

This file was deleted.

1 change: 0 additions & 1 deletion news/7690.vendor

This file was deleted.

Empty file.
Empty file.
Empty file.
6 changes: 3 additions & 3 deletions src/pip/_internal/operations/install/wheel.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@

from pip._internal.exceptions import InstallationError
from pip._internal.locations import get_major_minor_version
from pip._internal.utils.filesystem import adjacent_tmp_file, replace
from pip._internal.utils.misc import captured_stdout, ensure_dir, hash_file
from pip._internal.utils.temp_dir import TempDirectory
from pip._internal.utils.typing import MYPY_CHECK_RUNNING
Expand Down Expand Up @@ -575,9 +576,8 @@ def is_entrypoint_wrapper(name):

# Record details of all files installed
record = os.path.join(dest_info_dir, 'RECORD')
temp_record = os.path.join(dest_info_dir, 'RECORD.pip')
with open_for_csv(record, 'r') as record_in:
with open_for_csv(temp_record, 'w+') as record_out:
with adjacent_tmp_file(record, mode='w') as record_out:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to change w+ to w?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, are we sure that adjacent_tmp_file is equivalent to open_for_csv for our purpose here? (I've not investigated myself yet)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to change w+ to w?

We are not reading anything from this file so why request the extra permissions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, are we sure that adjacent_tmp_file is equivalent to open_for_csv for our purpose here? (I've not investigated myself yet)

All open_for_csv does is append some parameters to open() to account for differences between different versions of Python.

adjacent_tmp_file calls NamedTemporaryFile which in turn calls _sanitize_params, calls close() on exit (in addition to that of file) and ensures file is written to disk before it's closed.

Tests would not pass if binary mode was used even with Python 2 but they do otherwise (on both versions). So I do not see any obstacles to replacing open_for_csv.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting the open parameters correct for CSV files is fairly subtle, and having open_for_csv is important to make sure it's easy for people to get it correct even if they aren't aware of platform differences. Having text mode and newline='' is crucial for correct handling on Windows under Python 3 (and having binary mode is crucial on Python 2). So I'd be concerned that the switch to adjacent_tmp_file is generating a CSV file with incorrect line endings. Do we know for sure that we have a test that checks this?

Note: I don't actually know (without doing some research) the precise "rules"1 for the format - but I do know that the docs in the CSV module (in particular the footnote here) were arrived at after quite a lot of frustrating investigation into weird bugs.

1 CSV is not a particularly well defined format, so they aren't actually so much rules as "hard-won knowledge about what works" 🙄

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the link to the newline comment and this is addressed by the io module internally. I do not see where it says Python 2 insists on binary mode. A simple test shows:

Python 2.7.10 (default, Feb 22 2019, 21:17:52) 
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> with open('/tmp/bla', 'w') as f:
...     c = csv.writer(f)
...     c.writerow(['hello'])
... 
>>> with open('/tmp/bla', 'r') as f:
...     c = csv.reader(f)
...     print '\n'.join(','.join(l) for l in c)
... 
hello
>>> 

Binary also works:

>>> with open('/tmp/bla', 'rb') as f:
...     c = csv.reader(f)
...     print '\n'.join(','.join(l) for l in c)
... 
hello

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per documentation for open():

Thus, when opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability. (Appending 'b' is useful even on systems that don’t treat binary and text files differently, where it serves as documentation.) See below for more possible values of mode.

Since we only ever deal with text data in CSVs there is no reason why it should matter. All Python documentation uses b everywhere because in Python 2 it essentially had no adverse effect. It was required for some systems and harmless for others.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.python.org/2/library/csv.html#csv.reader - "If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference". You're running on MacOS, which is a platform where it doesn't make a difference. SO it's not surprising you can't demonstrate an issue (your specific test does work on Windows, but again, that just means I suspect your test is incomplete - I have had actual, real-life errors caused by failing to use binary mode on Windows).

Anyhow, regardless of how much we debate test cases that demonstrate one thing or another, that's irrelevant, We should follow the documented requirements of the CSV module, not waste time debating whether they are needed. If you feel sufficiently strongly, by all means raise a CPython bug that claims the requirements in the docs are obsolete and should be removed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So then I can extract the version -> mode part from open_for_csv and call it there in addition to adjacent_tmp_file(file, 'w' + get_mode()).

Copy link
Member

@chrahunt chrahunt Feb 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth I did this in our recent wheel builder helper test code, which was the easiest way I could find that was compatible.

This would also avoid needing to make changes to adjacent_tmp_file.

reader = csv.reader(record_in)
outrows = get_csv_rows_for_installed(
reader, installed=installed, changed=changed,
Expand All @@ -587,7 +587,7 @@ def is_entrypoint_wrapper(name):
# Sort to simplify testing.
for row in sorted_outrows(outrows):
writer.writerow(row)
shutil.move(temp_record, record)
replace(record_out.name, record)


def install_wheel(
Expand Down
9 changes: 5 additions & 4 deletions src/pip/_internal/utils/filesystem.py
Original file line number Diff line number Diff line change
Expand Up @@ -84,12 +84,13 @@ def is_socket(path):


@contextmanager
def adjacent_tmp_file(path):
# type: (str) -> Iterator[NamedTemporaryFileResult]
"""Given a path to a file, open a temp file next to it securely and ensure
it is written to disk after the context reaches its end.
def adjacent_tmp_file(path, mode='w+b'):
# type: (str, str) -> Iterator[NamedTemporaryFileResult]
"""Given a path to a file and a mode, open a temp file next to it securely
and ensure it is written to disk after the context reaches its end.
"""
with NamedTemporaryFile(
mode=mode,
delete=False,
dir=os.path.dirname(path),
prefix=os.path.basename(path),
Expand Down