Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do basic syntax checks on the RECORD file when installing a wheel #11762

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

SpecLad
Copy link
Contributor

@SpecLad SpecLad commented Jan 29, 2023

For now, only check requirements specified in the original RECORD specification (https://packaging.python.org/en/latest/specifications/recording-installed-packages/#the-record-file), without the added constraints from the wheel specification.

Some of the warnings ("invalid digest size" and "duplicate line") should really be errors, but I was able to find some popular projects that have these problems (any projects built with sphinx-theme-builder for the former, zeroconf for the latter). To avoid breaking stuff, keep these as warnings for now.

Fixes #6198
Fixes #5913
Un-fixes #6165 (the underlying issue in wheel has been fixed 3 years ago, so the workaround should no longer be needed)

This function made sense when it still had different logic based on
Python 2/3. Nowadays, it's just unnecessary.
@SpecLad SpecLad force-pushed the wheel-record-checks branch from 76698a8 to a521345 Compare January 29, 2023 12:27
@SpecLad
Copy link
Contributor Author

SpecLad commented Jan 29, 2023

To minimize potential breakage, I downloaded one wheel for each of the most popular 5000 PyPI packages (4399 wheels in total, because some packages had no wheels), and attempted to parse the RECORD file in each of them using the parse_record function from this PR. There were no hard errors, and the warnings were as follows:

For now, only check requirements specified in the original RECORD
specification (<https://packaging.python.org/en/latest/specifications/recording-installed-packages/#the-record-file>),
without the added constraints from the wheel specification.

Some of the warnings ("invalid digest size" and "duplicate line") should
really be errors, but I was able to find some popular projects that have
these problems (any projects built with sphinx-theme-builder for the former,
zeroconf for the latter). To avoid breaking stuff, keep these as warnings
for now.
@SpecLad SpecLad force-pushed the wheel-record-checks branch from a521345 to 6a8d153 Compare January 29, 2023 12:53
@pradyunsg
Copy link
Member

Hehe, I'm pretty sure some of this is my fault. ^>^

I'll take a closer look at this sometime next week, but I wanna explicitly say: Thank you for doing this and filing this PR!

@uranusjr
Copy link
Member

Is there similar logic (the record model part) in installer? I wonder if this is a good chance to vendor the lib in.

@pradyunsg
Copy link
Member

Is there similar logic (the record model part) in installer?

Yes.

I wonder if this is a good chance to vendor the lib in.

We should -- although that'd be blocked on pypa/installer#153 and pypa/installer#156.

@uranusjr
Copy link
Member

I’m mostly thinking we can probably reuse stuff in installer that don’t exist in pip (hash-checking etc), pip logic going into installer doesn’t need to block that.

@SpecLad
Copy link
Contributor Author

SpecLad commented Jan 30, 2023

I looked at the RECORD checking logic in installer, and it's a lot more lax than what I have implemented here. It does not reject or warn about any of these test cases:

'a.py,foobar=abcde,'
'a.py,sha256=~!@#$%,'
'a.py,sha256=abcde,'
'a.py,,-1'
'a.py,sha256=4OHi4-Tl5ufo6err7O3u7_Dx8vP09fb3-Pn6-_z9_v8=,'
'a.py,sha256=4OHi4+Tl5ufo6err7O3u7/Dx8vP09fb3+Pn6+/z9/v8,'
'a.py,sha256=abcdef,'
'a.py,,123\na.py,,456'
'a.py,,123\na.py,,123'

@pradyunsg
Copy link
Member

pradyunsg commented Jan 30, 2023

Are you calling validate_record (it's on the main branch) or record.validate(data=...)?

@SpecLad
Copy link
Contributor Author

SpecLad commented Feb 1, 2023

Are you calling validate_record (it's on the main branch) or record.validate(data=...)?

No, because my patch only checks the RECORD file itself, while validate_record checks the data in the wheel, so it wouldn't be a fair comparison.

Now, most of the aforementioned test cases would fail if you were to validate them against wheel contents, but I think it's a good idea to check the syntax before trying to validate the contents, since in that way you can fail faster and give better error messages.

@SpecLad
Copy link
Contributor Author

SpecLad commented Feb 5, 2023

So how do we move forward from here? Is there anything you'd like me to do?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants