README: Debian: extend to specify source packages #57
Please don't merge yet - @Silvanoc reminded me about architecture-independent packages, so falling back to source if "arch" is missing is likely a bad idea. I'll update the MR soon. |
New version pushed, now suggesting a "type" qualifier. From my side ready for merging. :-) |
@sschuberth Could you review this please? |
I'm not a Debian guy, but the proposed changes look sensible to me. |
@lamby would you mind taking a look at this PR as a Debian expert? |
On a related note, here's a nice post from the Stack Overflow newsletter just about that. |
Some quick notes:
Regarding arch dep / arch-indep / source, did you consider having a (Personally, I'm not a fan of a URL returning 200 and then modifying the querystring can make it 404, eg. by changing the |
What I meant by saying that "names differ" is that you can't get from a binary package to a source package by just replacing "arch=amd64" with "arch=source", to use your arch-key suggestion. So you can have "pkg:deb/debian/[email protected]?arch=amd64" while there can't be "pkg:deb/debian/[email protected]?arch=source" - this would be "pkg:deb/debian/[email protected]?arch=source" instead.
Regarding version deviations between source and binary packages, this certainly happens. One prominent example is the "+bN" suffix, which is appended when a new binary package is rebuilt from unmodified sources, see https://packages.debian.org/buster/libselinux1: binary version 2.8-1+b1 is built from source version 2.8-1. I also included a "+b1" example in the examples section in my commit. In some other cases, epoch prefixes ("1:" & friends) differ between source and binary packages. And then there are even weird things like the lvm2 package, where the binary version "2:1.02.155-3" is built from source version "2.03.02-3", see https://packages.debian.org/buster/libdevmapper1.02.1.
Regarding "arch=source" versus "type=source", I'm more or less undecided. In fact, I even planned to propose "arch=source" at some point, but refrained because it simply sounded wrong, as "source" is not an "architecture" by definition. :) |
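To illustrate why a binary version cannot be mechanically turned into a source version, here is a minimal Python sketch. It assumes the only deviation is a trailing "+bN" rebuild suffix; the function name strip_binnmu_suffix is made up for this illustration and is not part of any purl or Debian tooling.

```python
import re

# Matches a trailing binary-rebuild suffix such as "+b1" or "+b12".
_REBUILD_SUFFIX = re.compile(r"\+b\d+$")


def strip_binnmu_suffix(binary_version: str) -> str:
    """Guess the source version by dropping a trailing "+bN" suffix.

    This is only a heuristic (an assumption of this sketch): it covers the
    rebuild case, but not differing epochs or unrelated version schemes.
    """
    return _REBUILD_SUFFIX.sub("", binary_version)


# Rebuild case from the comment above: the heuristic works.
print(strip_binnmu_suffix("2.8-1+b1"))
# lvm2 case from the comment above: the real source version is "2.03.02-3",
# so the heuristic leaves the version unchanged and gives the wrong answer.
print(strip_binnmu_suffix("2:1.02.155-3"))
```

The second call shows the limit of any such guessing: the source version of that package is unrelated to the binary version, which is the reason for wanting an explicit way to name source packages.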
TIL :) |
@iamwillbar @lamby @sschuberth So do you have an opinion about my PR, anything I can improve so it can be merged? Shall I reword it to switch from "type=source" qualifier to (mis)using the "arch=" qualifier as suggested by @lamby? (I personally would prefer something other than "arch=", but I'm not too opinionated here... ;) ). |
Please don't await or otherwise block on my feedback as I'm afraid I won't be able to commit spending a lot of time on this PR :) |
I'd lean towards misusing arch because it's mutually exclusive and this avoids people doing weird things like "?arch=amd64&type=source". Thanks for the feedback @lamby, very helpful. |
In certain places of APT world, |
@bureado that's a really good point: are the semantic differences of a source package vs. a binary package significant enough to warrant a different schema? I think quite possibly. @pombredanne would you be opposed to differentiating source and binary packages this way? |
We already have precedent for specifying source packages using ecosystem-specific terms. For example
The above would resolve the source jar rather than the binary. Introducing ecosystem types specifically for different types of artifacts would greatly complicate things and introduce incompatibilities with existing systems. |
Same here. |
On a somewhat related note, ClearlyDefined does explicitly distinguish the
I'm also against introducing incompatibilities here, but I wonder whether the Java / JAR case is special: The structure of JARs is no different whether it contains bytecode or sourcecode, in both cases the JARs are basically just ZIPs. But I don't know if that's also true for |
They are not the same file format. Binary packages are the usual Some thoughts with this caveat: I just recently learned about purl from @iamwillbar, and as exciting as all of this looks, I'm still catching up to it. So I'm in the process of taking a quick look at the SDKs and downstream clients to give a more educated response here, so I would hate to block anything. In the meantime... Generally, usage context hints to whether source or binary package are expected. Source packages can't be installed, so if someone is But I guess the problem at hand is that purl is context-unaware and has many more use cases. For example, if you're using purl to fetch a file/files from a repo, then the context can't be implied and you need a solution such as On the latter, perhaps purl could allow for
I guess there could also be things like resolving to homonymous packages of different types, picking the binary by default with some 301 logic, etc., all of which sounds a bit nightmarish (but is in fact the type of situations we face in real life, you asked for "nginx", did you mean "src:nginx"? Anyway, here's the nginx deb.) Just my 2e-2. |
For me, that's enough of a justification for a dedicated However, I would not specify a dedicated general The rule I'd propose is:
|
Thanks @bureado for your input here! I agree with @sschuberth codification of how we should apply this going forward (it would be great to add those rules into the specification to provide guidance to future contributors). Unless there's objection I'd suggest @gernot-h follow that scheme for this PR. |
There was literally an issue about Docker/OCI Image formats in which the opposite point was argued. See #68. The spec does not describe the package format, only how packages are located. Per the first sentence of the spec:
Considering that the source packages and all the variations of binary packages (e.g. arch) are all located the same way, it doesn't make any sense to provide two PURL types that resolve to the same location. Doing so will likely lead to a lot of changes to the current spec, and potentially to some of the PURL types that we still haven't fully investigated yet. |
I think in #68 the argument was because they were located differently they should have different schemes, I think we effectively deferred the discussion as to whether the format plays a specific role. I think in all of the previously defined types you can infer the format of the package from the scheme because there was effectively a 1:1 mapping (or more accurately *:1 mapping) of schemes to package format. In this case we now have ambiguity where the package is located the same way but the format once you locate it is different, so we should make an explicit decision (and update the spec accordingly) about how we should handle that. If the spec only cared about identity then I would pretty comfortably say that we should not differentiate scheme based on format because our primary concern is uniqueness. However, if you're trying to locate something then there's a strong possibility that you're wanting to do something with it, and in that case knowledge about what format to expect once you locate it is probably pretty important. For example, the host and path aspects of |
If you want to differentiate formats and have different PURL types for each, I'll give you a reason why this would greatly complicate things. A typical Maven repo for an artifact will consist of:
Here we have a repo containing binaries (which could also be war, apk, etc), pgp signatures, text files, and xml files for a single version of an artifact. It is expected that the consumer of the PURL know how to handle the various file formats. In the case of jar, war, apk, ear, it is also expected that the consumer know how the layout of the zips vary by format. PURL doesn't provide any of this guidance nor should it. The identification and location is currently what PURL is scoped with. Another potential impact of having different PURL types for different formats is if you're analyzing PURL using OSS Index or a similar service, you may have to make multiple requests in order to identify it. For example, if I'm using a package, but I compiled it from the source package, I would have to make a special request just for this case instead of making a single request to handle all format types. The potential for false negatives and general confusion will be elevated. |
The key question is whether the As a reasonably long-time Debian user, there is an unambiguous relationship between these AFAIK. One Source can yield one or more Binary packages, and every Debian control file that provides details about a Binary also lists its Source. And the inverse is true: every Package also lists its Source. (This relationship could be made explicit in a database of sorts for use cases that want to keep these relations explicit when using multiple Package URLs to track Debian packages.) And knowing the name/version of a Source or Binary is enough to identify and locate everything about a package (including going from Source to Binary(ies)). By everything I mean locating all sources and binaries, the control and copyright files, and any other system URL that deals with packages. Based on all this, I am much in favor of @gernot-h's proposed changes here and @stevespringett's argument that Debian sources are not something that deserves a special The things to resolve would then be IMHO:
Either B or C seem a tad more explicit than the "type" word. And with C. there could be a possible confusion between an arch=noarch (as used in RPMs) and an arch=source. |
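The Source/Binary relationship described above can be sketched in Python. This is only an illustration under assumptions: the stanza below is assembled from the lvm2 values quoted earlier in this thread rather than copied from the archive, the helper name source_of is hypothetical, and the sketch relies on the binary control convention that the Source field reads "name (version)" when the source version differs and is absent when source and binary share name and version.

```python
import re
from typing import Tuple


def source_of(stanza: str) -> Tuple[str, str]:
    """Return (source name, source version) for one binary control stanza."""
    fields = {}
    for line in stanza.splitlines():
        # Skip blank lines and continuation lines (they start with whitespace).
        if not line or line[0] in " \t":
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    package, version = fields["Package"], fields["Version"]
    source = fields.get("Source")
    if source is None:
        # No Source field: source name and version equal the binary ones.
        return package, version

    match = re.fullmatch(r"(\S+)(?:\s+\(([^)]+)\))?", source)
    if match is None:
        raise ValueError(f"unexpected Source field: {source!r}")
    # A missing "(version)" part means the source version equals the binary one.
    return match.group(1), match.group(2) or version


# Illustrative stanza built from the values mentioned in this thread.
STANZA = """\
Package: libdevmapper1.02.1
Source: lvm2 (2.03.02-3)
Version: 2:1.02.155-3
"""

print(source_of(STANZA))
```

A consumer that has such control data can therefore go from a binary package to its source without the purl itself having to encode that mapping.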
If I only think about Debian, I would be tempted to agree with @bureado's initial suggestion to introduce a separate type (given that source and binary handling are quite separate in Debian). However, @pombredanne reminded me to keep the other distributions in mind - and thinking of all the numerous low- and high-level packaging formats used in the Linux world (plus PyPI, Ruby Gems, ...), introducing additional types will load the purl spec with unnecessary details over time. I agree with @pombredanne that in the end, source and binary packages are always closely related - and an application which handles artifacts of a distribution (i.e. purl type) will know how to handle its format.
Regarding your final questions:
Re 1.: I think the default without a qualifier should be the binary package for Debian & other binary distributions. This is what a "normal user" means when talking about package X. And this is probably what the purl-spec defines today, so changing it would probably break semantics.
Re 2.: I would love to see a "standard" qualifier to reference source packages for all the binary package formats out there. I'm not really decided regarding arch=source, type=source or classifier=source(s). "arch" sounds wrong because "source" is not an "architecture", but guaranteeing mutual exclusiveness between source and binary packages sounds like a good thing to save the world a lot of confusion. At least until the first distribution arrives introducing different source packages for different binary architectures...
That all said, the initial comment of @stevespringett reminded me of a totally different use case, when he wrote that "classifier=sources" "resolves" to the sources for a package. If we have a source package "foo" which creates a binary package "bar", do we want a purl of "pkg:deb/debian/bar...?classifier=source" to "resolve" to the source package foo? I don't think a purl should do this, should it? |
@henning-schild, as user of a source distribution and knowing dirty details of a lot of distros, can you have a look at this discussion (see known purl types as background) - and check if we overlooked an important aspect or use case? TIA! |
+1, and @pombredanne's B or C lgtm, no strong opinion but watching the discussion. You both brought up something interesting:
and
I also noticed a That's just an observation which probably has no bearing in this PR but it relates a bit to the question of deriving and locating a source package based on binary package attributes, particularly for One last observation, a key operational difference between a |
This allows referring to Debian source packages. Often there is no trivial mapping between source and binary packages (several binaries built from one source, names as well as versions can differ between binary and source packages!), so package-urls shall allow explicitly specifying source packages. Using the arch qualifier for this was suggested by (former DPL) Chris Lamb and William Bartholomew in package-url#57 and "avoids people doing weird things like ?arch=amd64&type=source". To stay consistent with former versions of the spec, a deb package without qualifier shall refer to a binary package. Signed-off-by: Gernot Hillier <[email protected]>
10c84f3 to 801cd75
Sorry for the long delay, but as we were not really decided whether to use @pombredanne's B or C, I hoped for some more opinions here. But hey, as we have a suggestion by a former DPL here, let's use that one. :) And I really like the mutual exclusion between arch=amd64 and arch=source as pointed out by @iamwillbar. So I reworked my commit to use qualifier "arch=source" and tried to incorporate some of our reasoning into the commit message. Looking forward to your review! |
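As a rough illustration of the rule in the reworked commit (a deb purl with "arch=source" names a source package, anything else names a binary package), here is a minimal Python sketch. The function name deb_purl_kind and the example purls are assumptions made up for this illustration; they are not taken from the spec or from any purl library.

```python
from urllib.parse import parse_qs


def deb_purl_kind(purl: str) -> str:
    """Classify a deb purl as "source" or "binary" from its arch qualifier."""
    if not purl.startswith("pkg:deb/"):
        raise ValueError(f"not a deb purl: {purl!r}")

    # Drop an optional "#subpath", then isolate the qualifier string after "?".
    without_subpath = purl.split("#", 1)[0]
    _, _, query = without_subpath.partition("?")
    qualifiers = parse_qs(query)

    # Only one arch value can be given, so source and binary exclude each other.
    arch = qualifiers.get("arch", [None])[0]
    # No qualifier at all keeps the previous meaning: a binary package.
    return "source" if arch == "source" else "binary"


# Hypothetical purls, used only to exercise the rule.
print(deb_purl_kind("pkg:deb/debian/libselinux@2.8-1?arch=source"))
print(deb_purl_kind("pkg:deb/debian/libselinux1@2.8-1?arch=amd64"))
print(deb_purl_kind("pkg:deb/debian/libselinux1@2.8-1"))
```

Because the source marker lives in the same qualifier as the architecture, a contradictory combination like "?arch=amd64&type=source" cannot be expressed.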
FWIW, APT 2.0 seems to use |
Friendly ping. Is it looking like FWIW, |
Same here. I'm happy to update the PR to any decision, be it |
This PR lgtm, I'm in favor of merging to unblock more Debian + purl scenarios. The proposal is reasonable and it's not incompatible with future iterations if we learn more from how people are using it. |
This allows referring to Debian source packages. Often there is no trivial mapping between source and binary packages (several binaries built from one source, names as well as versions can differ between binary and source packages!), so package-urls shall allow explicitly specifying source packages. Using the arch qualifier for this was suggested by (former DPL) Chris Lamb and William Bartholomew in package-url#57 and "avoids people doing weird things like ?arch=amd64&type=source". To stay consistent with former versions of the spec, a deb package without qualifier shall refer to a binary package. Signed-off-by: Gernot Hillier <[email protected]>
801cd75 to bdb25b4
Thanks, @bureado! I just rebased accordingly. |
Fix dpkg command syntax and refine definition Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
b68c958 to 37fd5f5
Thank you, @pombredanne, for reviewing and improving the wording! My PR is now updated with your fixes. |
LGTM! Thanks
This is similar to what we agreed on for Debian in package-url#57.
In technical as well as compliance contexts, we often need to refer to Debian sources for a package. Often there is no trivial mapping between source and binary packages (several binaries built from one source, names as well as versions can differ between binary and source packages!), so to avoid confusion, package-urls shall allow explicitly specifying source packages. After some internal discussion, we think that an extra qualifier "type" is needed - which shall take precedence over the "arch" qualifier.