Ambiguous version parsing in parse_sdist_filename
#527
Comments
(Another possible solution here is to normalize all of the current distribution releases hosted on PyPI, and emphasize that |
Here's the valid distribution regexp in PEP 508:
So |
I'm not sure I would consider this to be a bug in One thing we could do here is to allow the caller to optionally provide the distribution name they were expecting to be present, so this function could either use this as a hint during parsing, or just raise an exception when what it found isn't what was expected. I'm not sure how useful this would be broadly, though.
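A minimal sketch of the "caller supplies the expected name" idea, under stated assumptions: `split_with_expected_name` is a hypothetical helper rather than anything in `packaging`, it only handles `.tar.gz` files, and it compares names using PEP 503 normalization. It either splits at the dash where the name matches what the caller expected or raises.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 normalization: collapse runs of '-', '_', '.' and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def split_with_expected_name(filename: str, expected_name: str) -> tuple[str, str]:
    """Hypothetical helper: split an sdist filename using the expected project name.

    Returns (name, version) as raw strings; raises ValueError when no dash
    position yields a name that normalizes to the expected one.
    """
    suffix = ".tar.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .tar.gz sdist: {filename!r}")
    stem = filename[: -len(suffix)]
    wanted = normalize(expected_name)

    # Try every dash as the name/version separator, left to right.
    for i, ch in enumerate(stem):
        if ch == "-" and stem[i + 1:] and normalize(stem[:i]) == wanted:
            return stem[:i], stem[i + 1:]
    raise ValueError(f"{filename!r} does not look like an sdist of {expected_name!r}")


print(split_with_expected_name("cffi-1.0.2-2.tar.gz", "cffi"))
```

The version string is returned unparsed here; a real implementation would presumably hand it to a PEP 440 parser afterwards.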
Fairly unlikely we would do this, IMO. |
This seems like a reasonable workaround to me, but +1 on the general usefulness concern (it's definitely useful for our needs on |
* pip_audit/dependency_source: match candidate names against project
  See: pypa/packaging#527. Fixes #248.
* pip_audit/dependency_source: remove redundant `is_satisfied_by` test
* test: add tests for vexing sdist parses
* test: update comment
* CHANGELOG: record fixes
* setup: pin `click`
  Works around psf/black#2964
* setup: add note about pinned click
I have opened pypa/packaging.python.org#1066 to strengthen the spec around sdist file names. That implicitly updates our docs since we reference it for what the function can parse. |
For some additional information. pip is able to install the package because the Simple API also provides the project name in the URL (for cffi, the endpoint is We should probably loop some of the cffi maintainers to inform them about this. |
Yep, that's what we ended up doing: pypa/pip-audit#249 That suggestion otherwise sounds similar to what @di suggested in #527 (comment), which sounds good to me! I'm happy to make those changes in a PR if other consumers would find them useful 🙂 |
Updated the title to emphasize that this is an ambiguity, not a spec violation per se. |
Related: the ambiguous parse here could also probably be avoided by enforcing the current guidance on sdist filenames, which suggests that distribution names be normalized from |
Yep! I was just reading that PEP 🙂. Right now it says that distributions are normalized according to 503, which defers to 426, which means that they can still contain (But also NB that it's not currently an issue, since all recent sdists are being published with normalized versions. This would strictly be a "cherry on top" in terms of reducing ambiguity.) |
You missed a part:
Specifically:
So the names get normalized as per PEP 503, which will convert any runs of non alphanum characters into a single |
TBH, PEP 503 normalization really only means the name gets lowercased, since PEP 427 implements the exact same (effectively anyways) normalization as PEP 503 does, just uses |
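For reference, a small sketch of the two normalizations being compared in the comments above. The function names are made up for illustration. The first is the PEP 503 rule; the second is the wheel-style form as described in the current binary distribution format spec (regular normalization followed by replacing `-` with `_`).

```python
import re


def pep503_normalize(name: str) -> str:
    """PEP 503: runs of '-', '_' and '.' become a single '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


def filename_escape(name: str) -> str:
    """Wheel-style form: same normalization, but with '_' as the separator."""
    return pep503_normalize(name).replace("-", "_")


for raw in ("Foo.Bar-baz", "zope.interface", "A__b"):
    print(raw, "->", pep503_normalize(raw), "/", filename_escape(raw))
```

The point of the underscore form is that the name component can no longer contain a dash, so a dash in a filename can only be a field separator.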
Whoops, I also missed 427. Just to clarify:
This only applies to wheels, not sdists, correct? Experimentally PyPI does not currently normalize (That's where the ambiguity currently appears to me. The current PyPA spec says that that normalization does occur for sdists, whereas it doesn't appear to.) |
PEP 427 only applies to wheels, and PEP 625 says “we’ll use the same rules for sdist”. |
That's this language, right?
If so, that would mean a format like this, correct?
If so, that all makes sense to me, and that would be a good improvement 🙂 (This is a significant tangent at this point, but I wonder if we couldn't avoid all of this ad-hoc parsing entirely by updating PEP 503 to include more |
Hopefully all these would go away when we finally standardise the JSON API. Regarding the PEP… happy to hear it all make sense for you, unfortunately not many people are entirely in agreement with the proposal (read the Discourse thread linked in the PEP). |
Would a |
I think it'd be useful for 625, although the current sdist filename format is probably foregone (since PyPI and other hosts don't do that escaping and lots of packages are already hosted with |
For old sdists, yes, but there's nothing saying that naming can't be enforced going forward to slowly update the community through attrition. |
I opened #542 for |
I would have to think about it some, but there's a high chance that we could just force normalize sdist names on upload in Warehouse. There's a lower chance, but still reasonable, that we could rename existing sdists to be normalized. You could open a Warehouse ticket about it to figure it out. |
Yep, I can open that ticket in a bit. |
FWIW, I was reading the write up about this, and I do in fact think that this is a bug with the current Unfortunately The sdist page on packaging.python.org suggests that canonicalizing the name and version is a de facto standard, which does not appear to actually be true? Looking through a number of projects, I see hardly any canonicalizing actually happening. While I agree that we should be, at a minimum, escaping Unfortunately I don't have a great answer here, Having
Which doesn't actually exist for sdist filenames. |
I think you're right, so we should probably update the docs to say that it implements PEP 625 even though it's not accepted.
I think when we accepted the function we didn't realize how many nasty edge cases that are all technically legal we had missed, or that they are in any way widespread. Maybe this is more motivation to get PEP 625 accepted. 😁 |
Yea, it happens. packaging is full of them :P (though increasingly fewer of them!) I think documenting that it implements PEP 625, possibly with a warning that PEP 625 isn't accepted or enforced anywhere yet (so there are edge cases where it will not correctly parse otherwise currently valid names), is probably a reasonable option. It's also probably the best option for what to do, since it's somewhat silly to just egregiously break compatibility for an edge case, particularly one that I think is somewhat hard to trigger in practice using modern tooling (IIRC setuptools normalizes the version by default anyways, and the cffi version that actually broke this isn't a valid package for other reasons). |
This comment was marked as outdated.
FYI https://peps.python.org/pep-0625/ has been accepted! |
Do we want this function to serve as a validating parser for PEP 625 (i.e. checking that the distribution is canonically named, checking that the version is normalised) or just a plain "split and call Version"? |
I'm personally fine with following PEP 625 since sdists are otherwise just a "good luck to you" situation. |
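To make the "validating parser" option concrete, here is a rough sketch of what strictly following PEP 625 could look like. This is an illustration under assumptions, not the `packaging` implementation: `parse_strict_sdist` is a hypothetical name, only `.tar.gz` is handled, and the version is returned as a string without checking that it is a normalized PEP 440 version.

```python
import re


def parse_strict_sdist(filename: str) -> tuple[str, str]:
    """Hypothetical strict parser for '{name}-{version}.tar.gz' filenames.

    Under PEP 625 the name is normalized with '_' as separator, so the stem
    should contain exactly one dash: the one between name and version.
    """
    suffix = ".tar.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .tar.gz sdist: {filename!r}")
    stem = filename[: -len(suffix)]

    if stem.count("-") != 1:
        raise ValueError(f"expected exactly one '-' in {stem!r}")
    name, version = stem.split("-")
    if not name or not version:
        raise ValueError(f"empty name or version in {stem!r}")

    # The name must already be in its normalized, underscore-separated form.
    if name != re.sub(r"[-_.]+", "_", name).lower():
        raise ValueError(f"name {name!r} is not normalized")
    return name, version


print(parse_strict_sdist("flit_core-3.9.0.tar.gz"))
```

With these rules the `cffi-1.0.2-2.tar.gz` filename from the original report is rejected outright instead of being parsed into the wrong name.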
Hi there! First of all, thanks for continuing to maintain this package -- it's extremely useful 🙂

I'm one of the maintainers of `pip-audit`, and we had a user report some strange dependency resolution behavior: pypa/pip-audit#248

We were able to root-cause the bug down to a release of `cffi` (`1.0.2-2`) that uses the implicit post releases syntax for specifying the post-release number, rather than the canonicalized `postN` format. This release of `cffi` is published on PyPI here, without canonicalization, so it's likely that it was uploaded before PyPI began normalizing versions.

Because `1.0.2-2` contains a dash, the following body of `packaging.utils.parse_sdist_filename` contains an incorrect assumption and parses the source distribution name incorrectly:

yielding:

whereas we expected:

TL;DR: `parse_sdist_filename` shouldn't rely on the last dash as a separator between the distribution name and the version, since PEP 440 allows dashes in non-normalized versions. Parsing this correctly poses a bit of a challenge, since distribution names can also contain dashes and numbers and might even contain them in pathological ways, such as:
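A minimal, stdlib-only sketch of the ambiguity described in this report (not the code from `packaging`, and not the examples originally attached to the issue). `NAME_RE` is the PEP 508 name pattern; `VERSION_RE` is a deliberately crude stand-in for PEP 440 that only accepts dotted numbers with an optional implicit post release such as `-2`. Both helper functions are hypothetical.

```python
import re

# PEP 508 distribution-name pattern (matched case-insensitively).
NAME_RE = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)

# Crude stand-in for PEP 440 (assumption): "1.0.2" or "1.0.2-2" style only.
VERSION_RE = re.compile(r"^\d+(\.\d+)*(-\d+)?$")


def _stem(filename: str) -> str:
    suffix = ".tar.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .tar.gz sdist: {filename!r}")
    return filename[: -len(suffix)]


def split_on_last_dash(filename: str) -> tuple[str, str]:
    """The assumption under discussion: the last dash separates name and version."""
    name, _, version = _stem(filename).rpartition("-")
    return name, version


def candidate_splits(filename: str) -> list[tuple[str, str]]:
    """Every dash position where both halves look like a valid name and version."""
    stem = _stem(filename)
    return [
        (stem[:i], stem[i + 1:])
        for i, ch in enumerate(stem)
        if ch == "-" and NAME_RE.match(stem[:i]) and VERSION_RE.match(stem[i + 1:])
    ]


print(split_on_last_dash("cffi-1.0.2-2.tar.gz"))
print(candidate_splits("cffi-1.0.2-2.tar.gz"))
```

For the `cffi` filename, more than one split passes both checks, so the filename alone does not determine the project name.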