[Draft] Package URL specifications for CPAN Packages #8

Open
giterlizzi opened this issue Nov 9, 2023 · 20 comments
Labels
documentation Improvements or additions to documentation help wanted Extra attention is needed

Comments

@giterlizzi
Owner

giterlizzi commented Nov 9, 2023

Package URL

A Package URL (aka "purl") is a URL string used to identify and locate a software
package in a mostly universal and uniform way across programming languages,
package managers, packaging conventions, tools, APIs and databases.

https://github.com/package-url/purl-spec

A purl is a URL composed of seven components:

scheme:type/namespace/name@version?qualifiers#subpath

Components are separated by a specific character for unambiguous parsing.

The definition of each component is:

  • scheme: this is the URL scheme with the constant value of "pkg".
    One of the primary reasons for this single scheme is to facilitate the future
    official registration of the "pkg" scheme for package URLs. Required.
  • type: the package "type" or package "protocol" such as maven, npm,
    nuget, gem, pypi, etc. Required.
  • namespace: some name prefix such as a Maven groupid, a Docker image
    owner, a GitHub user or organization. Optional and type-specific.
  • name: the name of the package. Required.
  • version: the version of the package. Optional.
  • qualifiers: extra qualifying data for a package such as an OS,
    architecture, a distro, etc. Optional and type-specific.
  • subpath: extra subpath within a package, relative to the package root.
    Optional.
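
To make the seven components concrete, here is a minimal Python sketch of how a purl string could be split apart. It is a simplification written for illustration, not the URI::PackageURL implementation and not a complete implementation of the purl-spec parsing rules; the names `Purl` and `parse_purl` are made up for this example.

```python
from typing import Dict, NamedTuple, Optional
from urllib.parse import unquote


class Purl(NamedTuple):
    """Hypothetical container for the seven purl components."""
    scheme: str
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]
    qualifiers: Dict[str, str]
    subpath: Optional[str]


def parse_purl(purl: str) -> Purl:
    """Split a purl into its components (simplified, not spec-complete)."""
    remainder, subpath = purl, None

    # The subpath follows the last '#'.
    if "#" in remainder:
        remainder, subpath = remainder.rsplit("#", 1)
        subpath = unquote(subpath.strip("/")) or None

    # The qualifiers follow the last '?' as key=value pairs joined by '&'.
    qualifiers: Dict[str, str] = {}
    if "?" in remainder:
        remainder, query = remainder.rsplit("?", 1)
        for pair in query.split("&"):
            key, _, value = pair.partition("=")
            if key and value:
                qualifiers[key.lower()] = unquote(value)

    # The scheme is everything before the first ':' and must be "pkg".
    scheme, sep, remainder = remainder.partition(":")
    if not sep or scheme.lower() != "pkg":
        raise ValueError("scheme must be 'pkg'")

    # The type is the first path segment.
    type_, sep, remainder = remainder.strip("/").partition("/")
    if not type_ or not remainder:
        raise ValueError("type and name are required")

    # The version follows the last '@'.
    version = None
    if "@" in remainder:
        remainder, version = remainder.rsplit("@", 1)
        version = unquote(version) or None

    # The last segment is the name; anything before it is the namespace.
    segments = [unquote(seg) for seg in remainder.split("/") if seg]
    if not segments:
        raise ValueError("name is required")
    namespace = "/".join(segments[:-1]) or None
    return Purl("pkg", type_.lower(), namespace, segments[-1],
                version, qualifiers, subpath)
```

For example, `parse_purl("pkg:cpan/SRI/Mojolicious@9.35")` yields type `cpan`, namespace `SRI`, name `Mojolicious` and version `9.35`, with no qualifiers or subpath.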

Package URL for CPAN Packages

Components

Minimal components:

  • The type for CPAN Perl packages and distributions is cpan
  • The name is the module or distribution name and is case sensitive

Optional (but advised) components:

  • The namespace is the author name. It must be uppercase
  • The version is the package or distribution version

Qualifiers

Optional qualifiers may include:

  • repository_url, CPAN/MetaCPAN/BackPAN/DarkPAN repository base URL (default is https://www.cpan.org)
  • download_url, URL of package or distribution
  • vcs_url, extra URL for a package version control system
  • ext, file extension (default is tar.gz)

Extras

  • The default repository is https://www.cpan.org
  • To search CPAN it is recommended to use https://metacpan.org

Examples

Minimal "purl" string:

pkg:cpan/libwww-perl
pkg:cpan/[email protected]
pkg:cpan/[email protected]

"purl" string with namespace (author) component:

pkg:cpan/GDT/[email protected]
pkg:cpan/SRI/[email protected]

"purl" string with repository_url qualifier:

pkg:cpan/SRI/[email protected]?repository_url=backpan.perl.org

"purl" string with vcs_url qualifier:

pkg:cpan/GDT/[email protected]?vcs_url=git://github.com/giterlizzi/perl-packageurl.git
@giterlizzi giterlizzi added documentation Improvements or additions to documentation help wanted Extra attention is needed labels Nov 9, 2023
@sjn

sjn commented Dec 12, 2023

Hei!

I've been mulling over this ticket for a while now, and here are a couple of thoughts for your consideration. Please note that some of this is produced from memory, so it's possible that I may be mistaken on some points – please correct me if you find something wrong! (thank you 😁)

PURL usage scenarios

  1. PURLs used to specify any dependency requirements (as opposed to dependency resolutions), including alternative PURLs for the same dependency made available in different packaging ecosystems. (The implementation of this isn't URI::PackageURL-specific, though)
    1. E.g. the following should eventually be possible: cpanm pkg:cpan/SRI/Mojolicious
    2. This means that a PURL should by default be resolvable to the common case, and that common-case URLs should be possible to be converted into correctly corresponding PURLs.
  2. PURLs used to specify dependency resolutions (as opposed to requirements), but limited to what was actually deployed, pinned or packaged.
    1. This means that a PURL should be able to contain all the information necessary to correctly resolve to the package that was actually downloaded.
  3. The PURLs may refer to internal/private package indexes or repositories, including company CPAN mirrors, internal APT or RPM repositories, or other off-limit download locations, that are supported by the relevant tooling.
    1. Correspondingly, it should be possible to create a correct PURL from an internal download location like these.
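
Scenario 3.1 above (creating a PURL from an internal download location) could look roughly like the Python sketch below. Everything here is an assumption made for illustration: the function name `purl_from_download_url`, the reliance on the conventional `authors/id/A/AU/AUTHOR/` mirror layout, the short list of archive extensions, the naive "split on the last hyphen" rule for separating distribution name and version, and the host `cpan.example.internal`. Qualifier values are also left unencoded for readability.

```python
import re
from urllib.parse import urlsplit

DEFAULT_REPOSITORY_URL = "https://www.cpan.org"
KNOWN_EXTS = ("tar.gz", "tgz", "tar.bz2", "zip")  # assumed, not exhaustive

# Assumption: mirrors keep the authors/id/A/AU/AUTHOR/file layout.
_DIST_PATH = re.compile(
    r"/authors/id/[^/]/[^/]{2}/(?P<author>[^/]+)/(?P<file>[^/]+)$"
)


def purl_from_download_url(url):
    """Derive a cpan purl from a tarball URL (heuristic sketch)."""
    parts = urlsplit(url)
    match = _DIST_PATH.search(parts.path)
    if match is None:
        raise ValueError("not a recognisable CPAN author path")

    filename = match["file"]
    ext = next((e for e in KNOWN_EXTS if filename.endswith("." + e)), None)
    if ext is None:
        raise ValueError("unknown archive extension")
    stem = filename[: -(len(ext) + 1)]

    # Heuristic: the version is whatever follows the last hyphen.
    name, sep, version = stem.rpartition("-")
    if not sep or not name or not version:
        raise ValueError("cannot split distribution name and version")

    # Everything in front of /authors/id/ is treated as the repository base.
    repo = f"{parts.scheme}://{parts.netloc}{parts.path[:match.start()]}"
    qualifiers = {}
    if repo != DEFAULT_REPOSITORY_URL:
        qualifiers["repository_url"] = repo
    if ext != "tar.gz":
        qualifiers["ext"] = ext

    purl = f"pkg:cpan/{match['author'].upper()}/{name}@{version}"
    if qualifiers:
        # Qualifiers are emitted sorted by key.
        purl += "?" + "&".join(f"{k}={v}" for k, v in sorted(qualifiers.items()))
    return purl
```

A real tool would probably want something closer to CPAN::DistnameInfo for the name/version split, since distribution file names are not always this regular.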

Terminology

  1. Within the CPAN space, the following is a distribution name – SRI/Mojolicious-9.35.tar.gz
    1. A distribution name must contain the author's CPAN id, since it's possible for different people to make releases for the same distribution (!).
  2. The following is a module name – Mojo::Base
    1. A distribution contains one or more modules, but not necessarily in the same namespace as indicated by the distribution name.
    2. Proposal: When referring to a module, the PackageURL must use the keyword module in the namespace part of the PURL. This is to avoid namespace collision between CPAN id's and module names: pkg:cpan/module/Foo::Bar
  3. The naming resolution from module to distribution, is indexed in the 02packages.details.txt files on your mirror.
    1. This resolution is expected to be managed by the tooling used for downloading, unpacking, preprocessing, building, testing, and installing. E.g. cpan, cpanm, cpm and cpanp. Some tooling uses these indirectly, e.g. carton, carmel and dh-make-perl. Or even from CPAN mirror software like Pinto or CPAN::Mini or App::opan.
  4. When we refer to a 'package' we mean the module namespace specifically - even if it is defined in a file which doesn't match the module name.

SBOM Use

  1. After module name resolution:
    1. The module files that are installed from a distribution, are "stored" (lol) in .packlist and perllocal.pod files throughout the designated installation tree. These are less than ideal for figuring out the pedigree of an installed module.
  2. If a distribution is "installed" into a directory destined for inclusion into another packaging ecosystem (e.g. a dir that becomes part of a .deb package used by APT), it's common to just delete these files.
  3. With the new demands for SBOM files, we should expect that one SBOM file per distribution will be made, and stored somewhere. (At the time of this writing, this is unclear).

Sources

(Updated 2024-01-19)

@sjn

sjn commented Dec 19, 2023

Related, NIST has published a Software Identification Ecosystem Option Analysis where they talk a little about the contexts where PackageURLs may be used. Very useful reflections, and recommended reading.

They specifically look for something they call "Grouping", which they for some reason claim is a "missing feature" in purls. (I may have misunderstood something here).

Not sure of its relevance for this module either, but the idea is out there, so it may be necessary to consider.

@sjn

sjn commented Jan 19, 2024

Having thought a little more about this, I'm currently considering the following proposals....

  1. PackageURLs have at least two distinct "purposes" that would benefit from having separate API methods:
    1. A method for producing a "fully resolved" PURL, that is to be used to uniquely identify a specific package that has been used. This should include as much information as possible, including hostname/repository URL used when downloading a package, its resolved version, and if possible, a sha256 checksum of the package. This same PURL should be possible to resolve to a valid download URL that the user can use to confirm that the package downloaded is (still) the same as the one published.
      1. The "fully resolved" PURL must be in the form of pkg:cpan/AUTHOR/[email protected]?repo_url=…etc.
    2. A method for producing a "minimal" PURL, that is to be used for referring to CPAN package dependencies before they are resolved during a build stage.
      1. The "minimal" PURL may refer to a CPAN Distribution name pkg:cpan/AUTHOR/Foo-Bar OR a CPAN Module name pkg:cpan/module/Foo::Bar, at the package author's discretion.
      2. When referring to a module name, the PURL must have the word "module" in the namespace field, in order to distinguish between modules that are all uppercase (e.g. CGI) and CPAN author ids that are identical to published module names.
  2. When package URLs are resolved, we should expect the client software to allow for any number of Package URLs to the same component to be listed as a dependency, and filter and pick the right one as needed.
    1. e.g. If a CPAN Distro depends on Foo::Bar, it may list the following dependencies, and the build tool may hand off the task of installation to any viable alternative, depending on preference or policy (e.g. a --prefer-cpan parameter could make the tool prioritize downloading dependencies from CPAN instead of shelling out to apt install libfoo-bar-perl on a Debian system):
      • pkg:cpan/module/Foo::Bar
      • pkg:apt/debian/libfoo-bar-perl
      • pkg:rpm/opensuse/foo-bar-perl
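
The "filter and pick the right one" step could be sketched in Python as follows. This is only an illustration of the idea: the function names `purl_type` and `pick_dependency` and the notion of an ordered preference list are assumptions, and no existing CPAN client is known to work this way.

```python
def purl_type(purl):
    """Return the type component of a purl, e.g. 'cpan' or 'apt'."""
    return purl.split(":", 1)[1].split("/", 1)[0].lower()


def pick_dependency(alternatives, preferred_types):
    """Pick the first alternative whose type ranks highest in the policy.

    alternatives:    purls that all describe the same dependency
    preferred_types: purl types in order of preference (hypothetical policy)
    Returns None if no alternative matches any preferred type.
    """
    for wanted in preferred_types:
        for purl in alternatives:
            if purl_type(purl) == wanted:
                return purl
    return None


# The alternatives listed in the comment above.
ALTERNATIVES = [
    "pkg:cpan/module/Foo::Bar",
    "pkg:apt/debian/libfoo-bar-perl",
    "pkg:rpm/opensuse/foo-bar-perl",
]
```

A tool run with something like --prefer-cpan would call `pick_dependency(ALTERNATIVES, ["cpan", "apt", "rpm"])`, while a Debian packaging tool might pass `["apt", "cpan"]` instead.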

I guess I'm pretty much echoing what you've already proposed, with the difference of explicitly adding "module" (in lowercase) to the PURL, to make it easily distinguishable from distribution PURLs, whose author id has to be in uppercase; and making a point of having separate API methods that produce each of these explicitly.

So, with this I've been trying to think about it from an "independent" starting point, and basically ended up where you and @mrdvt92 in #2 have arrived.

So for whatever it's worth, I'm happy to stand behind what's here, plus the perspectives in #2. 😺

@sjn

sjn commented Jan 31, 2024

@giterlizzi, I just learned that the PackageURL spec author is working on getting it registered as an ECMA standard. Maybe it's time to get the CPAN bits included?

source: https://youtu.be/B2bVaaeqpAk?si=c7cdfDZCEJkucOic&t=623

@sjn

sjn commented Jan 31, 2024

By the way!

When it comes to specifying (pre-resolution) dependencies, there's a version-range spec for purl. Should we adopt this at the same time, while we're at it?

https://github.com/package-url/purl-spec/tree/version-range-spec

@giterlizzi
Owner Author

Maybe it's time to get the CPAN bits included?

Yes, I think we can start validating the specification described in the first comment (Components and Qualifiers) and open a PR to include it in https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst

@sjn

sjn commented Feb 1, 2024

Apparently, there's a pull request open already at package-url/purl-spec#155 - maybe worth updating?

Also, I expect to meet the purl author, Philippe Ombredanne, in Brussels tomorrow. If you want, I can ask him what's needed to get this PR merged?

@giterlizzi
Owner Author

Apparently, there's a pull request open already at package-url/purl-spec#155 - maybe worth updating?

If you agree I would modify it like this:

cpan

cpan for CPAN Perl packages:

  • The default repository is https://www.cpan.org/.

  • To search CPAN it is recommended to use https://metacpan.org.

  • The namespace is optional; it may be used to specify the author name and it must be uppercased.

  • The name is the module or distribution name and is case sensitive.

  • The version is the module or distribution version.

  • Optional qualifiers may include:

    • repository_url: CPAN/MetaCPAN/BackPAN/DarkPAN repository base URL (default is https://www.cpan.org)
    • download_url: URL of package or distribution
    • vcs_url: extra URL for a package version control system
    • ext: file extension (default is tar.gz)
  • Examples::

    pkg:cpan/[email protected]
    pkg:cpan/[email protected]
    pkg:cpan/GDT/[email protected]
    pkg:cpan/LWP::[email protected]
    pkg:cpan/OALDERS/[email protected]
    

Also, I expect to meet the purl author, Philippe Ombredanne, in Brussels tomorrow. If you want, I can ask him what's needed to get this PR merged?

It would be great. Thank you!

@mrdvt92

mrdvt92 commented Feb 1, 2024

The name is the module or distribution name and is case sensitive.

The more I think about it, I believe only CPAN distributions should be supported and not modules or packages.

  1. A module (a .pm file) does not have a 1:1 relationship to a package. A module is a single file with zero or more packages inside it.
  2. A single module can be provided by multiple distributions.
  3. A package version does not have to be updated for each distribution.
  4. Modules do not technically have versions. A package can have a version but doesn't have to have a version.

I propose to only use /dist/ to match the meta URL e.g., https://metacpan.org/dist/Perl-Version

pkg:cpan/dist/[email protected]

If we really must use modules, does each module in a distribution need to be specified?
Since modules don't really have versions, are checksum=sha:XXXXXX signature mandatory?

@sjn

sjn commented Feb 1, 2024

  • The namespace is optional; it may be used to specify the author name and it must be uppercased.

Aaah, no, let's NOT word it like this. Instead, I propose this -

  • To refer to a CPAN distribution name, the namespace MUST be present. In this case, the namespace is the CPAN id of the author/publisher. It MUST be written uppercase, followed by '/' and then followed by the distribution name. A distribution name may NEVER contain the string '::'.
  • To refer to a CPAN module, the namespace MUST be absent. The module name MAY contain zero or more '::' strings, and the Module name MUST NOT contain a '-'

Correct examples:

pkg:cpan/Perl::[email protected]
pkg:cpan/DROLSKY/[email protected] (distribution name)
pkg:cpan/[email protected] (module name)
pkg:cpan/GDT/URI-PackageURL
pkg:cpan/LWP::UserAgent
pkg:cpan/OALDERS/[email protected]
pkg:cpan/URI (module name)

Incorrect syntax examples:

pkg:cpan/[email protected]
pkg:cpan/[email protected]
pkg:cpan/GDT/URI::PackageURL
pkg:cpan/LWP-UserAgent
pkg:cpan/OALDERS/
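
The two rules proposed in this comment can be checked mechanically. Below is a small Python sketch of such a check; it is not the code that ended up in purl-tool or URI::PackageURL, and the function name `check_cpan_purl` and the error strings are invented for illustration.

```python
def check_cpan_purl(purl):
    """Return None if the purl follows the proposed rules, else a message."""
    prefix = "pkg:cpan/"
    if not purl.startswith(prefix):
        return "not a cpan purl"

    # Drop subpath, qualifiers and version; only namespace/name matter here.
    path = purl[len(prefix):].split("#", 1)[0].split("?", 1)[0]
    path = path.rsplit("@", 1)[0]

    namespace, sep, name = path.rpartition("/")
    if not name:
        return "name is required"

    if sep:
        # Distribution: the namespace is the author's CPAN id, in uppercase.
        if (not namespace or "/" in namespace or "::" in namespace
                or namespace != namespace.upper()):
            return "namespace must be an uppercase CPAN author id"
        # A distribution name may never contain '::'.
        if "::" in name:
            return "distribution name must not contain '::'"
    else:
        # Module: no namespace, and the module name must not contain '-'.
        if "-" in name:
            return "module name must not contain '-'"
    return None
```

With this sketch, `pkg:cpan/GDT/URI-PackageURL`, `pkg:cpan/LWP::UserAgent` and `pkg:cpan/URI` pass, while `pkg:cpan/GDT/URI::PackageURL`, `pkg:cpan/LWP-UserAgent` and `pkg:cpan/OALDERS/` are rejected.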

@sjn

sjn commented Feb 1, 2024

pkg:cpan/dist/[email protected]

If we really must use modules, does each module in a distribution need to be specified?

Modules do have versions (see https://www.cpan.org/modules/02packages.details.txt for documentation)
When using a PackageURL to refer to a module, the intention is for an ecosystem-specific tool to resolve which distribution a specific module belongs to. This is already what happens when running cpanm Foo::Bar – the tool downloads 02packages.details.txt and does a lookup there to figure out which distribution to download. This lookup works with packages (defined as namespaces, of which you may have one or more inside a .pm file), with modules (defined as a .pm file with a single package namespace matching the file name), and with distributions (a tarball containing one or more modules or packages).

Note also that a distribution name MUST contain the author's CPAN id to be valid! That's why I'm insisting that a PackageURL referring to a dist also must live up to this. (The reason why this is so, is that it's possible for several authors to make releases for the same distribution, and allow users later to refer to which of them they want)
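
The module-to-distribution lookup described above can be sketched in Python like this. The sample index text is made up for illustration (it only imitates the 02packages.details.txt layout of a header block, a blank line, then one `package version path` line per package), and the names `parse_packages_index` and `author_of` are hypothetical; real clients such as cpanm do considerably more.

```python
# Illustrative excerpt only, not real index content.
SAMPLE_INDEX = """\
File:         02packages.details.txt
Description:  Package names found in directory $CPAN/authors/id/
Line-Count:   2

Mojo::Base                         undef  S/SR/SRI/Mojolicious-9.35.tar.gz
Mojolicious                         9.35  S/SR/SRI/Mojolicious-9.35.tar.gz
"""


def parse_packages_index(text):
    """Map package name -> (version or None, distribution path)."""
    index = {}
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # The header block ends at the first blank line.
            if not line.strip():
                in_body = True
            continue
        fields = line.split()
        if len(fields) == 3:
            package, version, path = fields
            index[package] = (None if version == "undef" else version, path)
    return index


def author_of(dist_path):
    """Extract the CPAN id from a path such as S/SR/SRI/Dist-1.0.tar.gz."""
    return dist_path.split("/")[2]
```

Looking up `Mojo::Base` in the parsed sample returns the path `S/SR/SRI/Mojolicious-9.35.tar.gz`, which carries the author's CPAN id (`SRI`) that a distribution PURL would need in its namespace.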

@giterlizzi
Owner Author

  • The namespace is optional; it may be used to specify the author name and it must be uppercased.

Aaah, no, let's NOT word it like this. Instead, I propose this -

* To refer to a CPAN distribution name, the namespace MUST be present. In this case, the namespace is the CPAN id of the author/publisher. It MUST be written uppercase, followed by '/' and then followed by the distribution name. A distribution name may NEVER contain the string '::'.

* To refer to a CPAN module, the namespace MUST be absent. The module name MAY contain zero or more '::' strings, and the Module name MUST NOT contain a '-'

Correct examples:

pkg:cpan/Perl::[email protected]
pkg:cpan/DROLSKY/[email protected] (distribution name)
pkg:cpan/[email protected] (module name)
pkg:cpan/GDT/URI-PackageURL
pkg:cpan/LWP::UserAgent
pkg:cpan/OALDERS/[email protected]
pkg:cpan/URI (module name)

Incorrect syntax examples:

pkg:cpan/[email protected]
pkg:cpan/[email protected]
pkg:cpan/GDT/URI::PackageURL
pkg:cpan/LWP-UserAgent
pkg:cpan/OALDERS/

I agree !

giterlizzi added a commit that referenced this issue Feb 2, 2024
@giterlizzi
Owner Author

@sjn I have added an initial check for the "cpan" purl type

purl-tool pkg:cpan/GDT/URI::PackageURL
ERROR: Invalid Package URL: CPAN 'name' must have the distribution name

purl-tool pkg:cpan/URI-PackageURL
ERROR: Invalid Package URL: CPAN 'name' must have the module name

purl-tool pkg:cpan/G::DT/URI::PackageURL
ERROR: Invalid Package URL: CPAN 'namespace' must have the distribution author

@sjn

sjn commented Feb 2, 2024

If we can get a purl-spec PR for this made, we can have it merged lunchtime today! 🤩

@giterlizzi
Owner Author

If we can get a purl-spec PR for this made, we can have it merged lunchtime today! 🤩

😃

Changed the specification.


cpan

cpan for CPAN Perl packages:

  • The default repository is https://www.cpan.org/.

  • To search CPAN it is recommended to use https://metacpan.org.

  • The namespace:

    • To refer to a CPAN distribution name, the namespace MUST be present. In this case, the namespace is the CPAN id of the author/publisher. It MUST be written uppercase, followed by the distribution name in the name component. A distribution name may NEVER contain the string ::.
    • To refer to a CPAN module, the namespace MUST be absent. The module name MAY contain zero or more :: strings, and the module name MUST NOT contain a -
  • The name is the module or distribution name and is case sensitive.

  • The version is the module or distribution version.

  • Optional qualifiers may include:

    • repository_url: CPAN/MetaCPAN/BackPAN/DarkPAN repository base URL (default is https://www.cpan.org)
    • download_url: URL of package or distribution
    • vcs_url: extra URL for a package version control system
    • ext: file extension (default is tar.gz)
  • Examples::

    pkg:cpan/Perl::[email protected]
    pkg:cpan/DROLSKY/[email protected]
    pkg:cpan/[email protected]
    pkg:cpan/GDT/URI-PackageURL
    pkg:cpan/LWP::UserAgent
    pkg:cpan/OALDERS/[email protected]
    pkg:cpan/URI
    

@sjn

sjn commented Feb 2, 2024

Great! Do you have a PR link I can refer to?

@giterlizzi
Owner Author

This is the new PR package-url/purl-spec#288

@sjn

sjn commented Feb 2, 2024

One question;

Is it really necessary to mention MetaCPAN at all?

@giterlizzi
Owner Author

One question;

Is it really necessary to mention MetaCPAN at all?

You mean this ?

To search CPAN it is recommended to use https://metacpan.org.

@sjn

sjn commented Feb 5, 2024

Congratulations on getting this merged into the spec! :-D

Now the work starts with getting purls supported in other parts of the Perl/CPAN toolchain!

(btw, I've tried to reach out to you on twitter/x; are there better channels for reaching you?)

3 participants