Feature: Add support for requirements in the form of PackageURLs (purls) #40

Open
sjn opened this issue May 14, 2024 · 23 comments

@sjn

sjn commented May 14, 2024

Hei!

I'd like to propose adding support for specifying requirements in the form of PackageURLs (purls), in addition to the existing ways (using dist/module names).

With this, I'm hoping that we can get a step closer to supporting requirements that work across ecosystem boundaries.

e.g. the following...

prereqs => {
  runtime => {
    requires => {
      'CPAN::Meta::Requirements' => '0.102',
      'Library::Foo' => '>= 1.208, <= 2.206',
      'Module::Bar'  => '>= v1.2.3, != v1.2.8',
      'Xyzzy'        => '== 6.01',
      'Module::Foo'  => '1.0',
    },
  },
}

...could be written as...

prereqs => {
  runtime => {
    requires => {
      'pkg:cpan/CPAN::Meta::Requirements' => 'vers:cpan/0.102', # resolves to same as above
      'pkg:cpan/Library::Foo' => 'vers:cpan/>=1.208|<=2.206',   # resolves to same as above
      'pkg:cpan/Module::Bar'  => 'vers:cpan/>=v1.2.3|!=v1.2.8', # resolves to same as above
      'pkg:cpan/Xyzzy'        => 'vers:cpan/==6.01',            # resolves to same as above
      'Module::Foo'           => '1.0',                         # old way continues to work
    },
  },
}

...and while this is fine, it also opens the door to a bunch of really cool new things!

prereqs => {
  develop => {
    requires => {
      'Dist::Zilla' => 0,
      'pkg:github/twbs/bootstrap' => 'vers:github/>5.0', # we embed bootstrap.js in this dist, so let's specify that it's a dep
    },
  },
  configure => {
    requires => {
      'pkg:deb/ubuntu/xz-utils' => 'vers:deb/>=4.0|!=5.6.1|!=5.6.2', # depend on xz-utils, but don't want vulnerable releases
    },
  },
  build => {
    requires => {
      'pkg:deb/ubuntu/libmysqlclient-dev' => 'vers:deb/>7.0',  # we use mysql's header files for an FFI
      'pkg:deb/debian/mysqlclient-dev' => 'vers:deb/>7.0',   # pretend that Debian's mysql header files are in a different package
    },
  },
}

I'm also hoping this can be a foundation for allowing non-CPAN software to state any requirements they have for components published on CPAN, and maybe even one day give packagers (the folks that re-package CPAN dists into .deb or .rpm or other package archives) an easier time figuring out how to translate and resolve dependencies across ecosystem boundaries. 😁

But for CPAN's case, I'm thinking support for purls starts with CPAN::Meta::Requirements?

I'm not entirely sure what's the best way to go about this, but since @giterlizzi recently added support for the 'vers' schema in URI::PackageURL, I'm thinking that's a place to start looking.

Should that module be made smaller/leaner? Are there other requirements (eg. around governance) that need to be fulfilled?

What needs be in place for a feature like this to be added to CPAN::Meta::Requirements?

(edit: added some more examples and clarifications)

@Tux

Tux commented May 14, 2024

pkg:deb quickly excludes all distributions that use rpm. How would you control the different names distributors release their packages under? e.g. openSUSE releases xz as xz + xz-devel and they don't ship MySQL anymore, but only MariaDB as mariadb-client.
I don't think that Debian-like OS users care at all about the other camp and vice versa, but claiming dependencies like that is a very easy way to stir up the flame wars.

FWIW many CPAN dists that use MariaDB or MySQL do not really care which of the two is their backend, as long as their tool is supported and the DBD works.

@sjn
Author

sjn commented May 14, 2024

For details about the spec, see the Package URL specification and the overview of PURL Types.

There you'll see that cpan, deb and rpm aren't the only types, and that each type has an optional namespace. For deb, the namespaces may include any of the OSes that use debian packages as their package system (e.g. Debian, Ubuntu, PureOS, or Mint). A package in Mint would be referred to as pkg:deb/mint/libfoo-dev for example, and likewise the "same" package might be referred to as pkg:deb/ubuntu/libfoo2-dev in Ubuntu.

As for having multiple alternative PURLs referring to the same (or equivalent) dependencies, that shouldn't be a problem? Just list all of them? If one resolves a package URL on a Debian system { "pkg:deb/debian/libfoo-dev" => "vers:deb/2.0" } then PURLs referring to other OS/namespaces or package types can be safely ignored.
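
The "just list all of them and ignore the rest" idea could look roughly like the sketch below. This is only an illustration under stated assumptions: `requirements_for_platform` is a hypothetical helper (nothing like it exists in CPAN::Meta::Requirements), the rpm package name is made up, and the rule that cpan purls and plain module names are always kept is my own assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the requirements a given platform can act on.
# Plain module names (no "pkg:" prefix) and cpan purls are always kept.
sub requirements_for_platform {
    my ($requires, $type, $namespace) = @_;
    my %kept;
    for my $key (keys %$requires) {
        if ($key =~ m{\Apkg:([^/]+)/(.+)\z}) {
            my ($purl_type, $rest) = ($1, $2);
            # Skip purls of a foreign type or a foreign namespace
            next unless ($purl_type eq 'cpan')
                || ($purl_type eq $type && $rest =~ m{\A\Q$namespace\E/});
        }
        $kept{$key} = $requires->{$key};
    }
    return \%kept;
}

my $requires = {
    'Module::Foo'                   => '1.0',
    'pkg:deb/debian/libfoo-dev'     => 'vers:deb/2.0',
    'pkg:deb/ubuntu/libfoo2-dev'    => 'vers:deb/2.0',
    'pkg:rpm/opensuse/libfoo-devel' => 'vers:rpm/2.0',   # made-up package name
};

# On a Debian system, only the Debian purl and the plain module name remain
my $debian = requirements_for_platform($requires, 'deb', 'debian');
print "$_ => $debian->{$_}\n" for sort keys %$debian;
```

Note that this only filters; it does nothing about choosing between equivalent alternatives such as mysql and mariadb.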

With that said, I guess there may be a need for the option to specify multiple equivalent requirements, so that in the case of having a system where both mysql and mariadb are available, only one of them is actually installed?

Unsure of how that should be specified, and unsure if this is in-scope for CPAN::Meta::Requirements...

@Leont
Member

Leont commented May 14, 2024

I don't think completely changing the meaning of those fields at this stage is a workable solution. There are too many things that have interpreted them as package names for too long.

As for having multiple alternative PURLs referring to the same (or equivalent) dependencies, that shouldn't be a problem? Just list all of them? If one resolves a package URL on a Debian system { "pkg:deb/debian/libfoo-dev" => "vers:deb/2.0" } then PURLs referring to other OS/namespaces or package types can be safely ignored.

This is not what PURLs are designed to do, and the problem inherently requires something more fuzzy. It needs to resolve 'libfoo-dev' or even 'libfoo' to whatever package is appropriate on that platform.

Unsure of how that should be specified, and unsure if this is in-scope for CPAN::Meta::Requirements...

IMO it falls well outside the scope of C::M::R

@sjn
Author

sjn commented May 15, 2024

I don't think completely changing the meaning of those fields at this stage is a workable solution. There are too many things that have interpreted them as package names for too long.

Ah, apologies for not being clear about my intentions. I'm absolutely not proposing to "completely change the meaning" of those fields. I'm proposing an addition. The old ways should of course continue to work as always (anything else would be reckless). 😅

This is not what PURLs are designed to do, and the problem inherently requires something more fuzzy. It needs to resolve 'libfoo-dev' or even 'libfoo' to whatever package is appropriate on that platform.

I'm unsure what you mean here. Could you expand with your reasoning behind what you're saying?

My understanding of PURLs is that they are designed for the purpose of identifying specific packages within a package ecosystem+namespace. If two ecosystems refer to the same code using different names, then PURLs exist to make this possible in a standardized way that is ecosystem-independent. Is your understanding different somehow?

(edit: I've added a few examples to the OP)

@Leont
Member

Leont commented May 15, 2024

I'm absolutely not proposing to "completely change the meaning" of those fields. I'm proposing an addition. The old ways should of course continue to work as always (anything else would be reckless). 😅

And everything that handles this field now having two meanings. That's going to cause a lot of breakage.

My understanding of PURLs is that they are designed for the purpose of identifying specific packages within a package ecosystem+namespace.

There are too many ecosystems, listing packages this explicitly isn't workable. What we actually need to do is map things repology style, probably even by using repology.

@sjn
Author

sjn commented May 16, 2024

Appreciate you're spending some calories on this issue! 😁

I'd still love to hear your reasoning behind your "This is not what PURLs are designed [...]" statement, though!

I'm absolutely not proposing to "completely change the meaning" of those fields. I'm proposing an addition. The old ways should of course continue to work as always (anything else would be reckless). 😅

And everything that handles this field now having two meanings. That's going to cause a lot of breakage.

Ok, how? Can you come up with an example of breakage?

I can imagine that some kind of feature guard could let downstream tooling (that hasn't come around to supporting PackageURLs) continue working unaffected, even if CPAN::Meta::Requirements is upgraded behind the scenes. And if the tooling eventually does something with this new feature in C::M::R, it just needs to state its minimum required version accordingly, as always. Isn't that enough to allow an upgrade path that doesn't break any tooling downstream?

My understanding of PURLs is that they are designed for the purpose of identifying specific packages within a package ecosystem+namespace.

There are too many ecosystems, listing packages this explicitly isn't workable. What we actually need to do is map things repology style, probably even by using repology.

Well, sure. There are many ecosystems out there, and it's not polite to ask a developer to list all combinations of ecosystems and package names in their Makefile.PL.

Luckily, we don't have to solve this problem right here and right now. I guess it's feasible to optionally make use of some repology-based PURL translation service (or library, if the matrix isn't too large) to map between package names in different ecosystems. Not sure how this should be done, but I can't imagine this is too difficult. It would certainly be a welcome convenience.

In the meantime, just stating the most common type+namespace+packagename combinations, would make a big positive difference! And if some ecosystem users feel left out, there's always the option to offer a PR to add the missing one. 🙂

Also, I think it's worth noting that the issue you are pointing out really isn't an argument against the introduction of PackageURLs. There's still a need to be able to specify out-of-ecosystem dependencies – which we for any practical purposes DO NOT support at all right now – certainly not in an ecosystem-agnostic standardized manner!

With that said, PackageURLs are not the only available options for uniquely identifying dependencies across ecosystems. The problem is that the other options right now are really bad. We could do something with SWID tags, but they are horrible and require a centralized index. We could also use CPEs (Common Platform Enumeration), which are also horrible, or we could use OmniBOR, which is some proprietary horror not worth touching.

If you want some reading material on this topic, check out the highly relevant Software Identifying Ecosystem Option Analysis by CISA, published October 2023. They cover the problem domain of unique software identifiers quite well.

The only promising option IMO is PackageURLs, and while they by themselves don't solve the translation problem that you point out (and that repology attempts to solve), they are still the best option when one wants to specify requirements across ecosystem boundaries. The act of resolving these requirements is, I think, a solvable problem, and an interesting discussion in itself, but probably suited for another forum?

I guess there may be something to learn from repology when it comes to identifying what packages are called in the different ecosystems, but for the discussion we're having here, this isn't relevant. My proposal is for introducing PackageURL support in CPAN::Meta::Requirements, so that downstream consumers of this module can eventually implement support for the goodies this enables – with or without the help of repology. If this feature isn't added in this module, we can be certain nothing happens downstream.

I'm sure that if someone puts together a concept that allows us to do the same, but only using repology, then that's definitely worth consideration!

But WRT this ticket, I don't think this should be a blocker. 😸

@neilb

neilb commented May 16, 2024

I think you're suggesting adding PURL support to CPAN::Meta::Spec, so that META.{json,yml} support something like the format you suggested.

The key problem, which Leon mentioned, is the devolved / distributed nature of the CPAN ecosystem. While we have lots of standard modules for processing metadata, there are lots of things out there where people have created systems which process metadata, and we've no way of knowing where breakage would happen. I've got a bunch of tools I've written over the years, and I know some of them would break. But that's just me. More worrying is that key parts of the ecosystem might break.

So any support for PURLs would have to be alongside, rather than changing the existing core mechanisms, I think.

Perhaps the place to start would be outlining the concrete benefits that CPAN authors and users, and the ecosystem maintainers, would get from PURL support, and making the case for it being worth the upheaval, and then a path to making that happen.

I don't mean on this ticket, I mean elsewhere ;-)

@Leont
Member

Leont commented May 16, 2024

So any support for PURLs would have to be alongside, rather than changing the existing core mechanisms, I think.

Yeah, this. I'm not arguing against the goal at all, but I do think it needs to be a separate field.

@sjn
Author

sjn commented May 16, 2024

I think you're suggesting adding PURL support to CPAN::Meta::Spec, so that META.{json,yml} support something like the format you suggested.

Are you thinking of my comment to Perl-Toolchain-Gang/CPAN-Meta#79? That comment is starting to show its age! 😅

If you think that's a better place to have this conversation, I'm happy to move it there. But wouldn't the implementation happen in CPAN::Meta::Requirements in any case? (not sure, so I'm happy to be corrected)

The key problem, which Leon mentioned, is the devolved / distributed nature of the CPAN ecosystem. While we have lots of standard modules for processing metadata, there are lots of things out there where people have created systems which process metadata, and we've no way of knowing where breakage would happen. I've got a bunch of tools I've written over the years, and I know some of them would break. But that's just me. More worrying is that key parts of the ecosystem might break.

Could you share an example where breakage will happen? (I asked Leon the same).

I guess that if someone decides to make use of this feature, they might add a requirement in the form of { 'pkg:cpan/Library::Foo' => 'vers:cpan/>=1.208|<=2.206' }, and the first time some tooling encounters this, it would create some trouble in code that mucks around in this data structure directly...

I'm not claiming that things won't break - and certainly not for the situation where someone rolls with their own parser instead of using some Toolchain-Gang supplied module for doing this.

I guess under "normal" circumstances we could just rest on an implementation that follows the Liskov Substitution Principle, but with that being unlikely, wouldn't it still be feasible to reduce the size of a fix from

do_something( $module_name, $version_range );

to...

do_something( normalize_name($module_name), normalize_version_range($version_range) );

...?
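
To make the two helpers concrete, here is a minimal sketch of what they might look like. Both `normalize_name` and `normalize_version_range` are hypothetical (they are not part of URI::PackageURL or CPAN::Meta::Requirements), and the sketch only handles the `pkg:cpan/Module::Name` and `vers:cpan/...` forms used in the examples in this thread; everything else is passed through untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: map 'pkg:cpan/Module::Name' back to 'Module::Name'.
# Anything that is not a cpan module purl is returned unchanged.
sub normalize_name {
    my ($name) = @_;
    return $name =~ m{\Apkg:cpan/([A-Za-z_]\w*(?:::\w+)*)\z} ? $1 : $name;
}

# Hypothetical helper: map 'vers:cpan/>=1.208|<=2.206' to '>= 1.208, <= 2.206'.
# Strings without the 'vers:cpan/' prefix are returned unchanged.
sub normalize_version_range {
    my ($range) = @_;
    return $range unless $range =~ s{\Avers:cpan/}{};
    my @parts;
    for my $constraint (split /\|/, $range) {
        if ($constraint =~ /\A(>=|<=|==|!=|>|<)\s*(.+)\z/) {
            push @parts, "$1 $2";
        }
        else {
            # Bare version: passed through as-is, like '0.102' in the first example
            push @parts, $constraint;
        }
    }
    return join ', ', @parts;
}

print normalize_name('pkg:cpan/Library::Foo'), ' => ',
    normalize_version_range('vers:cpan/>=1.208|<=2.206'), "\n";
```

The normalized pair is in the same shape as the "old way" examples at the top of the thread, so it could in principle be handed to whatever already consumes module names and version range strings.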

I'm thinking that since URI::PackageURL is nearing a usable state for this, we now have a deterministic way to translate between purls and module names or dist names (depending on how the purl is written), and back.

And since purls are just another way of writing module or dist names, I think it's meaningful to prioritize preserving the semantics (meaning, preserve the structure & meaning of how to specify requirements) instead of creating a separate parallel way of specifying dependencies. My intuition is that the amount of code to handle the first is less than to handle the second way.

I'm struggling to see any "upheaval" here, so I'd still love to see examples of what you speak of...

Perhaps the place to start would be outlining the concrete benefits that CPAN authors and users, and the ecosystem maintainers, would get from PURL support, and making the case for it being worth the upheaval, and then a path to making that happen.

I think I can put together something, but I'm wary of having this ticket depend on some community consensus of sorts. I believe the examples I gave in the OT should illustrate the main benefits well enough for anyone who cares about this topic and module, and if they are not, please tell me what is missing!

I've purposefully not mentioned downstream benefits like "Easier generation of SBOMs that can be reused downstream without custom modification of component names" or "The possibility for non-CPAN software to specify dependencies in an ecosystem-agnostic way to software found on CPAN" or "A standardized ecosystem-agnostic way to refer to CPAN components that have known vulnerabilities" (e.g. PURL use in OpenVEX), or... Well, you get the point. Even a quick read of the CISA Analysis I linked to earlier, would help paint a picture of what's at stake, and how PURL support in ecosystems play into this as a solution.

PURLs are still a "new" thing, and there's definitely a need for sharing info about its uses and benefits, though if there are downstream users that may be affected, do you really think they'll even read any blog post about this, let alone share any thoughts on the matter?

They'll definitely learn of PURLs if they have some code that breaks, though – and if the fix is trivial (e.g. like above), then that's a more reliable way to both get the word out, and to make things happen...

@neilb

neilb commented May 17, 2024

If you think [the issues list for CPAN::Meta::Spec is] a better place to have this conversation, I'm happy to move it there

The issues list for any distribution isn't the right place for this. You're proposing a major change to the underpinning of the CPAN toolchain, but then jumping down into the details. I think you need to back up and whether it's a blog post or a document somewhere, start with:

  • Problem statement: what's the problem with the current CPAN toolchain / ecosystem that you want to fix? And who is it a problem for: authors, users, toolchain developers?
  • Why you think PURLs are the right way to address the problem, and how the PURL-based solution will help the stakeholders above.
  • A system-level view of the CPAN ecosystem, and where PURLs fit into this
  • A list of concrete and specific changes to add this support, without breaking anything currently in place
  • How this could be achieved in incremental changes, rather than a big bang.

You're not getting people leaping to help you on this, because you haven't sold people a vision. That's where you need to start.

It may be that you shared some of this at the PTS recently ...

I'm struggling to see any "upheaval" here

If you can't see why people are nervous about mucking with the metadata that underpins so much of the CPAN ecosystem, then that just reinforces the need for the above piece of work first. It may be that the final set of changes would be relatively small, but it's going to take a lot of time, effort, and people's buy-in, to get to there.

@Leont
Member

Leont commented May 18, 2024

But wouldn't the implementation happen in CPAN::Meta::Requirements in any case? (not sure, so I'm happy to be corrected)

Why would it? CMR is a mapping of modules to version requirements, nothing more nothing less. Actually handling version requirements is like 90% of it really.

Could you share an example where breakage will happen? (I asked Leon the same).

Lots of tools assume the keys are module names, if that's suddenly no longer true they'll get mightily confused. It's a variety of things like cpan clients, testing, authoring, packaging, etc.

It seems like a far better idea to make this a separate field.
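
For illustration, a separate field could look something like the sketch below. The key name `x_purl_prereqs` is purely an assumption (nobody in this thread proposed a name); it relies only on the CPAN::Meta::Spec rule that custom keys start with `x_` or `X_`. The `module_names` function stands in for an existing tool that walks `prereqs` and expects module names.

```perl
use strict;
use warnings;

my $meta = {
    # Existing field: keys stay plain module names
    prereqs => {
        runtime => {
            requires => {
                'CPAN::Meta::Requirements' => '0.102',
                'Module::Foo'              => '1.0',
            },
        },
    },
    # Hypothetical custom key; the name 'x_purl_prereqs' is an assumption
    x_purl_prereqs => {
        configure => {
            requires => {
                'pkg:deb/ubuntu/xz-utils' => 'vers:deb/>=4.0|!=5.6.1|!=5.6.2',
            },
        },
    },
};

# Stand-in for an existing tool that only knows about 'prereqs'
sub module_names {
    my ($meta) = @_;
    my %seen;
    for my $phase (values %{ $meta->{prereqs} }) {
        for my $relationship (values %$phase) {
            $seen{$_} = 1 for keys %$relationship;
        }
    }
    return sort keys %seen;
}

# Only module names come out; the purl entry is never seen by this code
print "$_\n" for module_names($meta);
```

The trade-off is that tools ignoring the new key also silently ignore the requirement it states.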

@sjn
Author

sjn commented May 18, 2024

The issues list for any distribution isn't the right place for this. You're proposing a major change to the underpinning of the CPAN toolchain, but then jumping down into the details. I think you need to back up and whether it's a blog post or a document somewhere, start with:

  • Problem statement: what's the problem with the current CPAN toolchain / ecosystem that you want to fix? And who is it a problem for: authors, users, toolchain developers?
  • Why you think PURLs are the right way to address the problem, and how the PURL-based solution will help the stakeholders above.
  • A system-level view of the CPAN ecosystem, and where PURLs fit into this
  • A list of concrete and specific changes to add this support, without breaking anything currently in place
  • How this could be achieved in incremental changes, rather than a big bang.

Sure. Though I'm wary about making this into some public discussion, I guess I can put something together. I won't be able to do all of these, because some of the points you ask for do belong in an issue tracker, and I think it would be a waste of my time to figure out the details when others who know the implementation could do the same with 1/10 of the effort.

I'll see what I can do.

You're not getting people leaping to help you on this, because you haven't sold people a vision. That's where you need to start.

Aah, well. The topic of PackageURLs has actually been a recurring theme both at PTS and in the CPANSec channel, though I guess there are many here who haven't been following those places. :-|

It may be that you shared some of this at the PTS recently ...

Yes.

I'm struggling to see any "upheaval" here

If you can't see why people are nervous about mucking with the metadata that underpins so much of the CPAN ecosystem, then that just reinforces the need for the above piece of work first. It may be that the final set of changes would be relatively small, but it's going to take a lot of time, effort, and people's buy-in, to get to there.

That's why I'm also asking for examples. To put together a relevant case, I need to know the needs and concerns of the target audience (you), and that is why I (repeatedly!) ask for examples.

Please show me examples. Don't tell me that you know stuff. Show me examples. (And of course, I'll be happy to offer my thanks when you do! But offering thanks before something is done, is putting the cart before the horse, I'm told. 😉 )

@sjn
Author

sjn commented May 18, 2024

For those of you who are unfamiliar with PackageURLs and want to get a quick introduction while you wait for me to write something CPAN-specific, check out https://archive.fosdem.org/2022/schedule/event/package_url_and_version_range_spec/ 🙂

@Leont
Member

Leont commented May 19, 2024

That's why I'm also asking for examples. To put together a relevant case, I need to know the needs and concerns of the target audience (you), and that is why I (repeatedly!) ask for examples.

Please show me examples. Don't tell me that you know stuff. Show me examples. (And of course, I'll be happy to offer my thanks when you do! But offering thanks before something is done, is putting the cart before the horse, I'm told. 😉 )

The primary problem is that any currently existing CPAN client would fail to resolve module pkg:cpan/CPAN::Meta::Requirements, and die. What you're proposing is a "breaks the world" kind of change for cpan clients, that is entirely incompatible with the backwards compatibility guarantees that we give.
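
A rough illustration of the failure mode, as a sketch only: the two functions below are simplified stand-ins written for this example, not code taken from any actual CPAN client. They show what happens when a prereq key is treated as a module name, which is what existing tools assume.

```perl
use strict;
use warnings;

# Simplified stand-in: treat the key as a module name and turn it into
# the relative path one would look for in @INC.
sub module_to_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Loose check for a syntactically valid Perl module name.
sub looks_like_module_name {
    my ($name) = @_;
    return $name =~ /\A[A-Za-z_]\w*(?:::\w+)*\z/ ? 1 : 0;
}

for my $key ('CPAN::Meta::Requirements', 'pkg:cpan/CPAN::Meta::Requirements') {
    printf "%s\n  path: %s\n  valid module name: %s\n",
        $key, module_to_path($key), looks_like_module_name($key) ? 'yes' : 'no';
}
```

The purl key ends up as the path `pkg:cpan/CPAN/Meta/Requirements.pm` and fails the name check, so a tool written along these lines has nothing it can resolve.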

@sjn
Author

sjn commented May 19, 2024

Please show me examples. Don't tell me that you know stuff. Show me examples.

The primary problem is that any currently existing CPAN client would fail to resolve module pkg:cpan/CPAN::Meta::Requirements, and die. What you're proposing is a "breaks the world" kind of change for cpan clients, that is entirely incompatible with the backwards compatibility guarantees that we give.

Ok, so what are you asserting here? Is it that the specific requirement to the requirements-parsing module (when using the new syntax) needs special-casing during bootstrap? (an example to support your assertion would be useful).

Or do you mean that any PackageURL would fail to resolve, no matter what is implemented in CPAN::Meta::Requirements?

In the bootstrapping case, I guess that's something that needs to be taken into account in CPAN.pm and other build tooling, and I see after a cursory glance that there already exists code for doing something similar at least in CPAN.pm... So yes, I see that point, though I guess it's possible to add features to this dist without losing feature compatibility with older releases?

Another option could be to immediately introduce some appropriate signal (warning, error, whatever) to CPAN::Meta::Requirements, that communicates something useful when an unknown module name is encountered. I see there's already some code to this effect for version ranges, so doing something similar for module names wouldn't seem like a huge step, I think...

In the second case, wouldn't it be enough for a (non-bootstrapping) CPAN dist that decides to use purls to specify minimum version requirements for the tooling that is used during the distribution's configure or build phase? e.g. add configure => { requires => { CPAN::Meta::Requirements => XXX }} # purl support needed to its prereqs hash...

@haarg
Member

haarg commented May 19, 2024

The current way package URLs are specified for CPAN is IMO broken, and I wouldn't want to integrate it into any part of our toolchain.

The Package URL spec is rather incomplete at the moment. It doesn't include anything about what the semantics of its URLs are. But it does imply some semantics. It states:

A purl or package URL is an attempt to standardize existing approaches to reliably identify and locate software packages.

This implies that a Package URL is meant to be an identifier for a software package, which would mean some redistributable software package. In terms of CPAN, this would have to be a release tarball.

A namespace is defined as:

some name prefix such as a Maven groupid, a Docker image owner, a GitHub user or organization.

So the namespace is something the name exists within. This mostly maps to the CPAN author.

Package URLs for CPAN use pkg:cpan/author/distribution. This is basically fine, although it doesn't match how dependencies are specified inside CPAN dists. This form makes version ranges mostly useless. They could be useful if the author was removed and only the distribution was used, but that isn't possible because:

CPAN purls also support pkg:cpan/module. This is basically mapping an entirely different dependency type into the same "type", by using the namespace for something it was not meant for. In this form, the package URL no longer represents an identifier of a software package. Instead it's just a way to find a package. This means it doesn't make much sense for something like SBOM.

So now there are two different types of purl sharing the same type, useful for mostly distinct purposes.

As mentioned by others, specifying external dependencies is also problematic. It's not useful to have to include dependencies on every variant of a library from every packaging system.

Package URLs as specified seem more designed for something like SBOM. I do not think they are fit for the purpose of specifying dependencies.

@Leont
Member

Leont commented May 19, 2024

Ok, so what are you asserting here? Is it that the specific requirement to the requirements-parsing module (when using the new syntax) needs special-casing during bootstrap? (an example to support your assertion would be useful).

It's not a parsing issue, it's a semantics issue, current CPAN clients do not know how to interpret such values. I have no idea what misconception you have that makes you think this could work.

In the second case, wouldn't it be enough for a (non-bootstrapping) CPAN dist that decides to use purls to specify minimum version requirements for the tooling that is used during the distribution's configure or build phase? e.g. add configure => { requires => { CPAN::Meta::Requirements => XXX }} # purl support needed to its prereqs hash...

No it wouldn't. That is too late, and at least in case of cpanm ineffective anyway (it uses a bundled CMR).

@sjn
Author

sjn commented May 19, 2024

Package URLs for CPAN use pkg:cpan/author/distribution. This is basically fine, although it doesn't match how dependencies are specified inside CPAN dists. This form makes version ranges mostly useless. They could be useful if the author was removed and only the distribution was used, but that isn't possible because:

I think we already covered this topic in a discussion on CPANSec IRC, but I guess it's also worth repeating here for posterity. :-)

PackageURLs by themselves are only suited for referring to resolved dependencies. This means, they are useful for lockfiles, installation reports, SBOMs (as you say) or other situations where you either want to know exactly what was installed and where it came from, or when you want to reproduce a build.

In these cases, the pkg:cpan/author/distribution@version form is what is expected to be used.

If the tooling understands full distnames (e.g. HAARG/Moo-2.001000.tar.gz) as a prerequisite, then one could of course use the equivalent packageurl as an alternative way of stating it.

This means that HAARG/Moo-2.001000.tar.gz is functionally equivalent to pkg:cpan/HAARG/Moo@2.001000, and a roundtrip conversion between these syntaxes should be trivial. (Note, there are exceptions, of course, e.g. when someone publishes a .zip file instead of a tarball, or when the dist refers to a package not hosted on www.cpan.org, but we can skip this discussion for now.)
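
As a sketch of that roundtrip, assuming only the simple `AUTHOR/Dist-Version.tar.gz` shape from the Moo example: `distfile_to_purl` and `purl_to_distfile` are hypothetical helper names, and the exceptions mentioned above (.zip files, dists hosted elsewhere) are deliberately not handled.

```perl
use strict;
use warnings;

# Hypothetical helper:
# 'AUTHOR/Dist-Name-1.23.tar.gz' -> 'pkg:cpan/AUTHOR/Dist-Name@1.23'
sub distfile_to_purl {
    my ($distfile) = @_;
    my ($author, $dist, $version) =
        $distfile =~ m{\A([A-Z][A-Z0-9-]*)/(.+)-(v?[0-9][0-9._]*)\.tar\.gz\z}
        or return;    # not a shape this sketch understands
    return "pkg:cpan/$author/$dist\@$version";
}

# Hypothetical helper:
# 'pkg:cpan/AUTHOR/Dist-Name@1.23' -> 'AUTHOR/Dist-Name-1.23.tar.gz'
sub purl_to_distfile {
    my ($purl) = @_;
    my ($author, $dist, $version) =
        $purl =~ m{\Apkg:cpan/([^/]+)/([^/\@]+)\@([^?#]+)\z}
        or return;    # module-form purls or missing versions are rejected
    return "$author/$dist-$version.tar.gz";
}

my $purl = distfile_to_purl('HAARG/Moo-2.001000.tar.gz');
print "$purl\n";
print purl_to_distfile($purl), "\n";
```

Both functions return nothing for input they do not recognize, so a caller can tell "not convertible" apart from a converted value.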

In the case above, where the tooling understands prereqs in the form of HAARG/Moo-2.001000.tar.gz, then I guess these are expected to be resolved to exactly the same dist package. This is basically pinning a dependency to a specific dist release.

But the common case is that prerequirements include some version constraints, and to manage this with PackageURLs, they must be accompanied by a "vers" version range url.

CPAN purls also support pkg:cpan/module. This is basically mapping an entirely different dependency type into the same "type", by using the namespace for something it was not meant for. In this form, the package URL no longer represents an identifier of a software package. Instead it's just a way to find a package. This means it doesn't make much sense for something like SBOM.

The "different dependency type" you speak of, is that the first one (pkg:cpan/author/distribution) is meant for resolved dependencies, and the second one ("pkg:cpan/module" => "vers:cpan/1.0") is meant for unresolved dependencies, and accompanied with its version constraints.

The second form can be useful in an SBOM in the sense that we can make it possible to refer to new types of dependencies in a standard manner. I gave a few of these as examples above, but with a little imagination I think we all can come up with examples that can improve dependency resolution across not just CPAN but in general.

The current way package URLs are specified for CPAN is IMO broken, [...]

Now, whether or not the purl spec is actually "broken" in its current form, that's a really good conversation to explore either in an issue in URI::PackageURL, or in the purl-spec repo. Could you formulate a test case where the current syntax breaks down?

The purl-spec is currently undergoing "cleanup" as part of a standardization process in ECMA's Technical Committee 54 (agendas & minutes mentioning purl), so any concerns with substance that you have, are very timely to raise right now.

@haarg
Member

haarg commented May 19, 2024

PackageURLs by themselves are only suited for referring to resolved dependencies. This means, they are useful for lockfiles, installation reports, SBOMs (as you say) or other situations where you either want to know exactly what was installed and where it came from, or when you want to reproduce a build.

Yes, this is all fine with the pkg:cpan/author/distribution form.

But the common case is that prerequisites include some version constraints, and to manage this with PackageURLs, they must be accompanied by a "vers" version range url.

Version constraints don't work with the pkg:cpan/author/distribution form because the author/namespace can change per release. And leaving out the author and only including the distribution doesn't work, because that space is taken by the module form.
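
To illustrate the first point with a sketch: if the purl minus its version is taken as the package identity, a distribution that changes hands splits into two unrelated identities, so a single version range cannot span them. The authors `ALICE` and `BOB` and the distribution `Some-Dist` below are made up for the example:

```perl
use strict;
use warnings;

# Hypothetical release history: one distribution uploaded by two authors.
my @releases = (
    'pkg:cpan/ALICE/Some-Dist@1.000',
    'pkg:cpan/ALICE/Some-Dist@1.001',
    'pkg:cpan/BOB/Some-Dist@2.000',
);

# Group versions by the purl with its version stripped (namespace/name).
my %versions_by_package;
for my $purl (@releases) {
    my ($package, $version) = split /\@/, $purl, 2;
    push @{ $versions_by_package{$package} }, $version;
}

# Prints two separate packages, although all three are the same distribution:
#   pkg:cpan/ALICE/Some-Dist => 1.000 1.001
#   pkg:cpan/BOB/Some-Dist => 2.000
for my $package (sort keys %versions_by_package) {
    print "$package => @{ $versions_by_package{$package} }\n";
}
```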

The "different dependency type" you speak of, is that the first one (pkg:cpan/author/distribution) is meant for resolved dependencies, and the second one ("pkg:cpan/module" => "vers:cpan/1.0") is meant for unresolved dependencies, and accompanied by its version constraints.

So you agree that you are stuffing two distinct types into one.

Could you formulate a test case where the current syntax breaks down?

The problem is the concept, not the syntax.

@sjn
Author

sjn commented May 19, 2024

It's not a parsing issue, it's a semantics issue, current CPAN clients do not know how to interpret such values. I have no idea what misconception you have that makes you think this could work.

Oh, I don't have a conception that PURLs will work immediately, out of the box, just like that. No worries about that!

What I do have a conception of, is that if this tooling is ever going to implement support for PackageURLs, then it's important that any underlying modules do the right thing before it's time to implement it in the tooling. So I'm looking for modules with "separate concerns" like this one, to see if it's possible to make something happen here.

In the second case, wouldn't it be enough for a (non-bootstrapping) CPAN dist that decides to use purls to specify minimum version requirements for the tooling that is used during the distribution's configure or build phase? e.g. add configure => { requires => { CPAN::Meta::Requirements => XXX }} # purl support needed to its prereqs hash...

No it wouldn't. That is too late, and at least in case of cpanm ineffective anyway (it uses a bundled CMR).

Too late under which circumstances? How?

(as for cpanm, let's just limit ourselves a little and declare that it is out-of-scope for this discussion for now.)

@sjn
Author

sjn commented May 20, 2024

Version constraints don't work with the pkg:cpan/author/distribution form because the author/namespace can change per release. And leaving out the author and only including the distribution doesn't work, because that space is taken by the module form.

Ok, sure, though not entirely correct (as things are now, leaving out the namespace means you're writing a module name, with the corresponding naming limitations). If there is a use case here that is important, then there's still time to update the spec. The current form came out of the discussion in this ticket, and while the proposed changes were merged into purl-spec in February, I'm optimistic that if there are some real concerns, they can be addressed.

Would you mind adding your thoughts, accompanied with an illustrative example to that ticket? 😉

The "different dependency type" you speak of, is that the first one (pkg:cpan/author/distribution) is meant for resolved dependencies, and the second one ("pkg:cpan/module" => "vers:cpan/1.0") is meant for unresolved dependencies, and accompanied by its version constraints.

So you agree that you are stuffing two distinct types into one.

Hehe. "stuffing". Love the seriousness. 😁

Yes, there are already two distinct use cases + corresponding syntaxes that need to be covered when referring to packages on CPAN – 1) module+version prereqs, and 2) their resolved distribution names – and since there are two distinct ways to represent these, and they are already in use throughout CPAN, then what's your problem with using two "variants" of a new syntax to represent the same in PURLs?

It almost seems like you're just arguing for the lols here, by asking for a fix to a fundamental design misfeature that was created more than two decades ago...

Could you formulate a test case where the current syntax breaks down?

The problem is the concept, not the syntax.

The concept mirrors reality as it is on CPAN right now. If you can think of a way to represent the necessary nuances with a "cleaner" concept, then please share! I've spent some time thinking of alternatives (some of which you can read in the issue I linked above), and I'd love to see an improvement to this.

@Leont
Member

Leont commented May 20, 2024

So I'm looking for modules with "separate concerns" like this one, and see if it's possible to make something happen here.

I have no idea what you mean by this, and I suspect it may be at the core of our lack of communication.

(as for cpanm, let's just limit ourselves a little and declare that it is out-of-scope for this discussion for now.)

This confuses the hell out of me. How can it be out of scope?

@sjn
Author

sjn commented Dec 6, 2024

I've taken what I've learned in this discussion, and applied it to a ticket in the purl-spec repo: package-url/purl-spec#362

Please feel free to offer any corrections or additions there. 🙂
