
Reliable way of retrieving license files #441

Closed
abravalheri opened this issue Mar 22, 2023 · 3 comments

Comments

@abravalheri
Contributor

When I need to retrieve the license files from an installed project, I usually go for something like:

licenses = [f for f in importlib_metadata.files("<package>") if f.stem == "<pre-defined file name for 'package'>"]

However, I recently found out that some operating systems remove the RECORD file after installation, which means that files() returns None.
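Because files() can return None when no file listing is available, the comprehension above raises a TypeError in that situation. Below is a minimal defensive sketch. It uses the standard-library importlib.metadata (the importlib_metadata backport exposes the same Distribution.files API), and the helper name find_license_paths is hypothetical.

```python
from importlib import metadata
from typing import List


def find_license_paths(dist: metadata.Distribution, license_stem: str) -> List[metadata.PackagePath]:
    """Return listed paths whose stem equals ``license_stem``.

    ``dist.files`` is None when no file listing (RECORD, installed-files.txt
    or SOURCES.txt) is available, so that case is treated as "nothing found"
    instead of failing with a TypeError.
    """
    files = dist.files
    if files is None:
        # No listing to search, e.g. RECORD was deleted after installation.
        return []
    return [f for f in files if f.stem == license_stem]
```

For example, `find_license_paths(metadata.distribution("setuptools"), "LICENSE")`. Note that an empty result cannot distinguish "no matching file" from "no listing at all"; a caller that cares can check `dist.files is None` first.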

With that in mind, I wonder if:

  • would it be possible for files() to list at least the contents of the .dist-info folder even if the RECORD file is deleted?
  • could we have a reliable retrieval mechanism for license files?
    Maybe it does not have to rely on files(); it could use the value of License-File in METADATA.
@jaraco
Member

jaraco commented Apr 13, 2023

It's my understanding that the Python Packaging Authority is working on a project to specify license details in a structured form, such as an SPDX entry in the metadata spec. That would be my preferred means of soliciting and advertising the license of a package.
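As an illustration of that structured approach, the sketch below prefers an SPDX expression and falls back to the legacy free-form field. It assumes the License-Expression field introduced by Core Metadata 2.4 (PEP 639, accepted after this discussion); declared_license is a hypothetical helper name, not part of importlib_metadata.

```python
from importlib import metadata
from typing import Optional


def declared_license(dist: metadata.Distribution) -> Optional[str]:
    """Return the declared license, preferring a structured SPDX expression.

    Assumption: ``License-Expression`` (Core Metadata 2.4, PEP 639) holds an
    SPDX expression; older metadata may only carry the free-form ``License``.
    """
    meta = dist.metadata
    # Structured SPDX expression such as "MIT OR Apache-2.0", if present.
    expression = meta.get("License-Expression")
    if expression:
        return expression
    # Legacy free-form field; may be absent (None) or contain full license text.
    return meta.get("License")
```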

The goal of importlib_metadata is to reflect the best model of what metadata is available for an installed package and to do that in a way that's true to the specifications, lenient to practical concerns, and flexible enough not to constrain non-standard environments (to support arbitrary loaders and finders similar to how Python does for imported modules). I do aim to avoid importlib_metadata creating de facto standards.

  • would it be possible for files() to list at least the contents of the .dist-info folder even if the RECORD file is deleted?

Yes, maybe. The implementation is already getting a little out of hand. The implementation currently returns the result of RECORD, installed-files.txt, or SOURCES.txt. I guess it could additionally fall back to attempting to enumerate files from the dist-info directory, but now there would be another hidden variant of the behavior (sometimes users would get the full file list and other times invisibly only get the metadata files). That all seems undesirable on the whole.
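The fallback order described above can be approximated by a small diagnostic that reports which listing source a distribution actually provides, which helps explain why files() returns None. This is a simplified sketch, not the library's logic: the names LISTING_SOURCES and listing_source are hypothetical, the real implementation applies extra checks (for example, it only uses installed-files.txt for path-based distributions), and the sketch relies on Distribution.read_text() returning None for a missing file.

```python
from importlib import metadata
from typing import Optional

# Listing sources in the order described above: RECORD (.dist-info),
# then installed-files.txt and SOURCES.txt (.egg-info).
LISTING_SOURCES = ("RECORD", "installed-files.txt", "SOURCES.txt")


def listing_source(dist: metadata.Distribution) -> Optional[str]:
    """Return the name of the first file-listing source present, if any."""
    for name in LISTING_SOURCES:
        # read_text() returns None when the named metadata file is missing.
        if dist.read_text(name) is not None:
            return name
    # None here corresponds to files() returning None.
    return None
```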

  • could we have a reliable retrieval mechanism for license files?
    Maybe it does not have to rely on files(); it could use the value of License-File in METADATA.

This approach sounds closer to viable. Oh! If License-File is defined in METADATA and the packaging spec indicates that the License-File can be found in the metadata directory, it should be possible to just read it/them:

 ~ $ pip-run setuptools -- -q
>>> import importlib.metadata as md
>>> dist = md.distribution('setuptools')
>>> dist.metadata.get_all('License-File')
['LICENSE']
>>> dist.read_text('LICENSE')[:10]
'Copyright '

Does that provide everything you need?
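Generalizing the REPL session above, a helper can read every file declared by License-File without consulting RECORD at all. This is a sketch under assumptions: read_license_files is a hypothetical name, and trying a licenses/ subdirectory first reflects my understanding of PEP 639 (Core Metadata 2.4), under which license files are stored in .dist-info/licenses/; older backends, like the setuptools release in the example, place them at the .dist-info root.

```python
from importlib import metadata
from typing import Dict, Optional


def read_license_files(dist: metadata.Distribution) -> Dict[str, Optional[str]]:
    """Map each License-File entry to its text, or None if it cannot be read.

    Assumption: PEP 639 metadata stores license files under ``licenses/``
    inside .dist-info, while older metadata stores them at the root, so the
    subdirectory is tried first and the root second.
    """
    texts: Dict[str, Optional[str]] = {}
    # get_all() returns None (not an empty list) when the field is absent.
    for name in dist.metadata.get_all("License-File") or []:
        text = dist.read_text(f"licenses/{name}")
        if text is None:
            # Pre-PEP 639 layout: file sits directly in .dist-info.
            text = dist.read_text(name)
        texts[name] = text
    return texts
```

Keeping missing files as None (rather than dropping them) makes it visible when METADATA declares a license file that was not actually installed.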

@jaraco
Member

jaraco commented Jun 19, 2023

@abravalheri Does that snippet not illustrate a way to satisfy the need of the reported issue?

@abravalheri
Contributor Author

Yes, thank you very much @jaraco. Sorry for the delay in replying.

This solution will probably work regardless of the build backend once the new PEP is approved.


2 participants