fix(datasource/pypi): handle non-normalized package names for pypi simple lookup #30716

Merged
merged 6 commits into from
Aug 14, 2024

Shegox (Contributor) commented Aug 12, 2024

Changes

This PR revises how extractVersionFromLinkText extracts the version. The difficulty is that the text of the HTML link tag contains both the non-normalized package name and the version, so it is not obvious where the name ends and the version begins. The problem surfaced after PR #27733/v38, since which only the normalized name is available during lookup. We therefore have to verify that the name detected in the tag text matches the normalized name.

The approach is to normalize the tag text and confirm that it matches the packageName. Once verified, the package name is stripped from the non-normalized tag text, using the length of the normalized name, to isolate the version. Normalization can change the length of the name (a run of separators such as --- collapses to a single -), and the extraction accounts for this difference.

Additionally, the normalizeName function has been removed, and we now rely on the centralized implementation of normalizePythonDepName, which adheres to the standard normalization procedure.
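
For readers unfamiliar with the standard normalization, here is a minimal sketch of the rule. The function name normalizePyName is a hypothetical stand-in used only for illustration; it is not the source of normalizePythonDepName, although it follows the same standard rule (lowercase, with runs of -, _ and . collapsed to a single -).

```typescript
// Sketch of the standard Python package name normalization.
// `normalizePyName` is a hypothetical name, not Renovate's implementation.
function normalizePyName(name: string): string {
  // Collapse every run of '-', '_' or '.' into a single '-', then lowercase.
  return name.replace(/[-_.]+/g, '-').toLowerCase();
}

console.log(normalizePyName('AL---Application-Launcher')); // al-application-launcher
console.log(normalizePyName('snowflake._legacy')); // snowflake-legacy
```

Note that the first result is two characters shorter than its input, which is why the raw and normalized names cannot be assumed to have the same length.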

We support two formats:

  1. Source distribution: {name}-{version}.tar.gz (specification)
  2. Binary distribution: {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl (specification)

Both {name} and {distribution} correspond to the packageName with - replaced by _. A newer version of the spec also replaces . with _. Because we convert the file name back to the normalized form, mapping all three characters (_, - and .) to -, the comparison is unaffected either way.
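
One way to picture the extraction for both formats is the sketch below. It is not the code merged in this PR: toNormalizedName and extractVersionSketch are hypothetical names, and the strategy (trying each - as the boundary between name and version, then comparing the normalized prefix with packageName) is an assumption chosen because it sidesteps the length difference between raw and normalized names. The file names in the usage lines are illustrative.

```typescript
// Hypothetical helper: standard name normalization (not Renovate's code).
function toNormalizedName(name: string): string {
  return name.replace(/[-_.]+/g, '-').toLowerCase();
}

// Hypothetical sketch: returns the version encoded in a file name, or null
// when the file does not belong to `packageName` (expected to be normalized).
function extractVersionSketch(
  linkText: string,
  packageName: string,
): string | null {
  // Try each '-' as the boundary between the raw name and the rest.
  for (
    let i = linkText.indexOf('-');
    i !== -1;
    i = linkText.indexOf('-', i + 1)
  ) {
    if (toNormalizedName(linkText.slice(0, i)) !== packageName) {
      continue;
    }
    const rest = linkText.slice(i + 1);
    if (rest.endsWith('.tar.gz')) {
      // Source distribution: {name}-{version}.tar.gz
      return rest.slice(0, -'.tar.gz'.length);
    }
    if (rest.endsWith('.whl')) {
      // Wheel: the version is the first '-' separated field after the name.
      return rest.split('-')[0] ?? null;
    }
    return null; // name matched, but the file format is not supported
  }
  return null; // no prefix of the link text normalizes to packageName
}

console.log(extractVersionSketch('zope.interface-5.0.0.tar.gz', 'zope-interface'));
console.log(extractVersionSketch('my_pkg-2.3.1-py3-none-any.whl', 'my-pkg'));
console.log(extractVersionSketch('other-1.0.tar.gz', 'my-pkg'));
```

The sketch ignores edge cases such as raw names that end in a separator, so treat it as an illustration of the comparison rather than a complete parser.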

It’s also worth noting that package names in which a run of consecutive separators (., - or _) collapses to a single - are rare. Across all of PyPI there are only 29 such packages, most of them empty:

$ curl https://pypi.org/simple/ -sS | grep -P '[\.\-_]{2,3}'
<a href="/simple/0-0/">0-._.-._.-._.-._.-._.-._.-0</a>
<a href="/simple/ahmed-m-gamaleldin/">Ahmed-M.-Gamaleldin</a>
<a href="/simple/al-application-launcher/">AL---Application-Launcher</a>
<a href="/simple/cmc-csci046-data-structures/">cmc-csci046-.data-structures</a>
<a href="/simple/cmc-csci046-yilinli-trees/">cmc-csci046-.yilinli-trees</a>
<a href="/simple/empy-electromagnetic-python/">EMpy----ElectroMagnetic-Python</a>
<a href="/simple/e-s-p-hadouken/">E.S.P.-Hadouken</a>
<a href="/simple/example-pkg-testing-megankuoo/">example-pkg.....testing-megankuoo</a>
<a href="/simple/funniest-test2016/">funniest__test2016</a>
<a href="/simple/h-ello-worl-d/">h__ello__worl__d</a>
<a href="/simple/hydrotools-restclient/">hydrotools.-restclient</a>
<a href="/simple/iaf-interaction-framework/">IAF--Interaction-Framework</a>
<a href="/simple/jungle-py-compiler/">Jungle-.Py-Compiler</a>
<a href="/simple/just-a-try-i/">just-a-try--i</a>
<a href="/simple/jy-2019/">jy-.-2019</a>
<a href="/simple/lh-nester/">lh__nester</a>
<a href="/simple/liuhao-nester/">liuhao__nester</a>
<a href="/simple/micropython-ctypes/">micropython-_ctypes</a>
<a href="/simple/micropython-markupbase/">micropython-_markupbase</a>
<a href="/simple/nester-101/">nester_-101</a>
<a href="/simple/pycopy-ctypes/">pycopy-_ctypes</a>
<a href="/simple/pyhed-python-desktop-framework/">pyHed---Python-desktop-framework</a>
<a href="/simple/quick-torrent-downloader/">Quick-.Torrent-Downloader</a>
<a href="/simple/snowflake-legacy/">snowflake._legacy</a>
<a href="/simple/spyce-python-server-pages/">SPYCE---Python-Server-Pages</a>
<a href="/simple/tango-project-algencan/">TANGO-Project---ALGENCAN</a>
<a href="/simple/tzara-a-personal-assistant/">Tzara---A-Personal-Assistant</a>
<a href="/simple/util-q/">util--q</a>
<a href="/simple/vins-server-messenger/">Vins__server_messenger</a>
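
The same check can be expressed in TypeScript. This is a small sketch rather than part of the PR: hasSeparatorRun is a hypothetical helper that mirrors the grep pattern, and the loop prints how many characters normalization removes from a few sample names.

```typescript
// Hypothetical helper mirroring the grep pattern above: does the raw name
// contain a run of two or more separator characters?
function hasSeparatorRun(rawName: string): boolean {
  return /[._-]{2,}/.test(rawName);
}

const sampleNames = ['AL---Application-Launcher', 'funniest__test2016', 'requests'];
for (const raw of sampleNames) {
  // Standard normalization: collapse separator runs to '-', then lowercase.
  const normalized = raw.replace(/[-_.]+/g, '-').toLowerCase();
  // The length difference is non-zero exactly when a run was collapsed.
  console.log(raw, hasSeparatorRun(raw), raw.length - normalized.length);
}
```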

Context

Fixes #30712, which caused Renovate to fail to look up PyPI packages with special characters (.) in their name.

Documentation (please check one with an [x])

  • I have updated the documentation, or
  • No documentation update is required

How I've tested my work (please select one)

I have verified these changes via:

@Shegox Shegox marked this pull request as draft August 12, 2024 11:26
@Shegox Shegox marked this pull request as ready for review August 12, 2024 13:09
Shegox (Contributor Author) commented Aug 12, 2024

//cc @not7cd if you have some time I would highly appreciate your feedback/review of the pypi lookup logic.

@Shegox Shegox requested review from viceice and not7cd August 12, 2024 18:24
not7cd (Contributor) left a comment

LGTM

lib/modules/datasource/pypi/index.spec.ts (review thread: outdated, resolved)
@viceice viceice added this pull request to the merge queue Aug 14, 2024
Merged via the queue into renovatebot:main with commit 5ff0778 Aug 14, 2024
38 checks passed
@renovate-release
Collaborator

🎉 This issue has been resolved in version 38.29.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

zharinov pushed a commit to zharinov/renovate that referenced this pull request Aug 15, 2024
kosmoz pushed a commit to kosmoz/renovate that referenced this pull request Aug 16, 2024
@Shegox Shegox deleted the pypi-30712 branch August 16, 2024 10:18
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 17, 2024
Development

Successfully merging this pull request may close these issues.

datasource(pypi): simple pypi lookup is not working for packages with normalized names
4 participants