How are golang sub-modules supposed to be expressed by purl? #63
Comments
A follow-up note: even without golang sub-modules, I am not sure the spec reflects how golang works. https://github.com/package-url/purl-spec#known-purl-types gives the example
This implies that the namespace is
But maybe I am just being argumentative here. |
I don't believe use of subpath here is appropriate, as IIUC subpath is used to point to something inside of a package. It's certainly a bit wrinkly with golang modules' definition of repository and module though, and maybe subpath should be expanded for that use case? Though I think that, similar to HTML anchors and URLs using fragments to point to something inside a page, the same would apply here: a purl subpath points to something inside a specific package.
Regarding the github org/user and repository bits, IIUC golang's module stuff doesn't require a module be a github url or a git repository (though it may most commonly be such). Does not appear that the coordinates used for golang's modules really care about? I didn't (after a very brief scan of the docs) see that the value for For git submodule package looks like the only wrinkle is if you wanted to find the tag, that you need to know the root repository location so you could then figure out what the path to the sub-module was? It may also depend on what one would do with a golang purl; it seems like no matter how you spin it some translation would have to be done, but I think that's probably fine. For example a maven purl with dot notation in groupId would have to get translated to slash notation for resolving a file on disk or remote repository location. So my guess is that avoiding any front-loaded assumptions on the golang package url is probably simplest, and that your first example:
... is probably reasonable. Just my 0.02 though... I'm not a golang module expert by far ;-)
from https://github.com/golang/go/wiki/Modules:
If the "leading v is required" then maybe the purl form is:
... though it's not really clear if that's a hard requirement or not.
I think this is an actual problem. Here are some real-life examples:
1st one makes sense to me. To ensure consistency we should document how to handle submodules. Some low effort options I can think of:
The original intent has been to use @andrewstein with your examples:
@bradcupit with your examples:
My personal preference would be to avoid overloading the namespace and name and continue to use the

The rationale is that in practice a good number, if not a majority, of public Go modules do end up fitting this approach: there is some repo or web site (GitHub, GitLab, Bitbucket) that has mostly a two-level structure, "org or owner or user"/"name of project". That level is typically what has a common set of attributes (ownership, team, release process, licensing, etc.), and there are "subpaths" that extend inside it, which are the things effectively imported in Go. To the best of my knowledge this ("org or owner or user"/"name of project") is also what the Go toolchain would fetch in a workspace: the whole namespace/name would be fetched and specific subpaths would be selectively imported (I may be wrong there as I did not dive deep inside

Side note: IMHO there would not be many Package URL use cases to reference a specific deeply nested piece of Go code (e.g. using a subpath as suggested here) as opposed to the whole ns/name at once. What would be yours?
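For readers who want to see the two-level idea concretely, here is a minimal sketch, not anything taken from the spec or from an existing purl library. It assumes the import path always starts with host/owner/project, which is exactly the assumption questioned later in this thread. The function names and the `github.com/example-org/example-repo/...` path are hypothetical, and percent-encoding is left out for brevity.

```python
from typing import Tuple


def split_two_level(import_path: str) -> Tuple[str, str, str]:
    """Split an import path assuming a host/owner/project layout.

    Assumption (hypothetical heuristic): the first two segments form the
    namespace, the third is the name, anything after that is the subpath.
    """
    segments = import_path.split("/")
    if len(segments) < 3:
        raise ValueError("expected at least host/owner/project")
    namespace = "/".join(segments[:2])
    name = segments[2]
    subpath = "/".join(segments[3:])
    return namespace, name, subpath


def golang_purl_two_level(import_path: str, version: str) -> str:
    """Format a purl string from the heuristic split (no percent-encoding)."""
    namespace, name, subpath = split_two_level(import_path)
    purl = f"pkg:golang/{namespace}/{name}@{version}"
    if subpath:
        # The subpath goes after '#', like a URL fragment.
        purl += f"#{subpath}"
    return purl


print(golang_purl_two_level("github.com/example-org/example-repo/pkg/util", "v1.2.3"))
```

The heuristic is only as good as its layout assumption: a module hosted under a custom domain, or a submodule with its own go.mod, would be split in the wrong place.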
@pombredanne thank you so much for responding! tl;dr: Though the existing purl spec works, I think we've accidentally made something impossible for our users.
That proposal works with all the existing code and examples. Users can take Having said that, users (including the team I'm on) will write code that converts a Go module name and version to a purl string. This is easy for well known repos like github and bitbucket, but difficult for custom module names. Here's a real-world example:
Where does the parent module end and the submodule begin? What's the namespace and what's the name? We can't tell the answer to either without analyzing the Go module's git repo. If users write their own code to do this should they set namespace =

Ultimately we can only guide our users and the onus is on them to split things up correctly.

Idea

tl;dr: just README changes, no code changes, but we percent-encode a lot more

Perhaps we should consider changing the README examples so they don't use namespace and instead only use name? And since we can't always tell where the parent module ends and the submodule begins we could also treat submodules the same as module names, instead of like subpaths. These two suggestions make it much easier for users to set the right values for namespace (which would always be blank now) and name, and then get consistent purl strings as the output. The downside: names are percent encoded, so the README purl strings would change. Examples:
I can't think of any other way to make these two problems easier on users. |
@bradcupit I agree with your proposal: for Go, there is no “namespace/name” concept. And if one is to drag submodules into the mix, there is no way to know, just looking at the import path, where the module ends and the submodule begins. Treating the whole thing as a single name is the only way as far as I can see.
In @bradcupit's proposal, would using For instance, if a purl should point to In other words, should the approach be valid for both subpackages and submodules?
Please consider also readability and auditability of the PURL. From the usability perspective is |
@athos-ribeiro said
Sorry for the late reply! The only reason we're percent encoding the @gotthardp said:
Yeah, I personally hated what's in my suggestion. I very much prefer the version that's easier to read, meaning, the one without percent encoding, but I don't think the pretty version is realistic.
That makes sense, and from the perspective of writing the purl-spec it makes sense too, but I think we have to consider how people are going to use the purl-spec. People will have the 'coordinates' of a package and want to convert that into a purl string. For maven the coordinates are the groupId, artifactId, and version, which is enough to compute a purl string. Or, we just do the simple thing: require only the repo URL and stuff it all in the name field and percent encode it. |
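The "simple thing" described above can be sketched in a few lines. This is only an illustration of the proposal, not the spec's current behavior. The function name is hypothetical, and the module path and version are the blackfriday example that appears elsewhere in this thread.

```python
from urllib.parse import quote


def golang_purl_name_only(module_path: str, version: str) -> str:
    """Build a purl with an empty namespace and the whole module path as the name."""
    # safe="" makes quote() encode "/" as %2F, so the path stays a single name.
    name = quote(module_path, safe="")
    return f"pkg:golang/{name}@{quote(version, safe='')}"


print(golang_purl_name_only("github.com/russross/blackfriday/v2", "v2.1.0"))
# prints: pkg:golang/github.com%2Frussross%2Fblackfriday%2Fv2@v2.1.0
```

The caller needs nothing beyond the module path and version, so no repository analysis or network access is involved. The cost is the readability concern raised earlier: every slash becomes `%2F`.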
@jdillon told me how the namespace encoding works (namespaces can contain slashes and we only encode what's between the slashes) -- plus I was totally wrong, we're proposing ditching the namespace for Go, so please ignore the previous comment. |
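As a rough illustration of the namespace rule just described (slashes separate segments, and only the text between them is encoded), here is a small sketch. The helper name and the `example.com/some org` input are hypothetical, and this is not code from any purl library.

```python
from urllib.parse import quote


def encode_namespace(namespace: str) -> str:
    """Percent-encode each namespace segment, keeping '/' as a literal separator."""
    # Empty segments (leading, trailing or doubled slashes) are dropped.
    segments = [seg for seg in namespace.split("/") if seg]
    return "/".join(quote(seg, safe="") for seg in segments)


print(encode_namespace("github.com/russross"))   # segments need no escaping here
print(encode_namespace("example.com/some org"))  # only the space is escaped
```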
He also mentioned it wasn't clear what this issue is proposing, so here's the shorter version of what @andrewstein proposed (and what I echo): Problem Proposal
Example
Another consideration could be to remove the notion of namespace entirely and merge ns and name into a single name component with as many segments as you like. It could be made such that this is backward compatible for every package type.
yes @pombredanne ! |
Just as a snapshot of how tools handle that today (for the example https://pkg.go.dev/github.com/russross/blackfriday/v2 in version v2.1.0):
I think it is not clear from the text in the README and the examples whether a `/` in the namespace needs to be escaped. This adds an example from `/test-suite-data.json` to the list of examples to make clear that escaping should not be done.

A follow-up question, which is not in scope of this PR: given the PURL `pkg:swift/github.com%2FAlamofire/[email protected]`, is `pkg:swift/github.com/Alamofire/[email protected]` its canonical form? If yes, it should be added to the test cases.

Or, for more fun, looking at package-url#63: what is the canonical form of `pkg:golang/github.com%2Frussross%2Fblackfriday%[email protected]`?

Signed-off-by: Maximilian Huber <[email protected]>
Just wanted to mention I've written a bunch of special-casing code for golang this week to try to parse the namespace. The difficulty lies in guessing the number of slashes in a namespace, e.g.
And there are even examples of Go modules that don't have a namespace, e.g. Knowing the number of slashes is important so you can split on them, and guess which part is the namespace, name, or subpath, but it's nearly impossible to do for Go. For instance, it's not clear which of these cases is
So splitting on slashes or even having prior knowledge of a VCS host is not really enough to make out the namespace vs the name. Given that, I agree with @bradcupit to squash the idea of a namespace for golang. |
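To make the ambiguity concrete, the sketch below lists every way a slash-separated path could be divided into namespace, name, and subpath. It illustrates the problem only and is not a proposed parser. The function name is hypothetical, and the path is the blackfriday example from elsewhere in this thread.

```python
from typing import List, Tuple


def candidate_splits(module_path: str) -> List[Tuple[str, str, str]]:
    """Return every (namespace, name, subpath) split of a slash-separated path."""
    segments = module_path.split("/")
    splits = []
    for i in range(len(segments)):
        namespace = "/".join(segments[:i])    # everything before the name
        name = segments[i]                    # the segment chosen as the name
        subpath = "/".join(segments[i + 1:])  # everything after the name
        splits.append((namespace, name, subpath))
    return splits


for ns, name, sub in candidate_splits("github.com/russross/blackfriday/v2"):
    print(f"namespace={ns!r} name={name!r} subpath={sub!r}")
```

A path with n segments yields n syntactically valid splits, and nothing in the string itself says which one matches the real module boundary.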
I think the only possible answer is 1: 2: 3: 4: 5: On the topic of |
You are correct, but the situation is not ideal. This may be more of a problem of Go than purl, but many users would be surprised to find a purl library making external network calls while creating a purl string. It also wouldn't work in an air-gapped environment, and may cause performance/scale issues when processing hundreds of thousands of requests. |
@matt-phylum thanks for outlining the different options available. I see that you have raised a case for option Using the same module example here are what the different options
I agree that with option 5 it's nice that the PURL name and the Go module name are the same, but I'm not sure it's worth the escaping to make that happen, and it would be the only package type to commonly contain

NPM has an optional namespace (scope), which is a critical part of the package name if present. In

GitHub has a required namespace (owner), which is a critical part of the package name. In

Maven has a required namespace (group), which is a critical part of the package name, but what is written in PURL as

I think Swift has the exact same issue as Go here. The PURL spec gives examples like

It'd be nice if PURL didn't differentiate namespace from name since it seems like every package type will either have no namespace or it will have a unique definition for what the namespace means and how it must be used (often they are prefixed to the name, but some seem redundant with the
I have also encountered the Maven issue mentioned, and agree that a workaround was needed to meaningfully parse the purl. I think it's a symptom of:
Which leads to https://github.com/package-url/purl-spec#problem It seems like there are two options (referenced as
ORT isn't making it "unnecessarily difficult". As mentioned here already:
Esp. if you squash the namespace into the name (which is what ORT actually already does; we use an empty namespace in ORT's own data model for Go), you'll most likely end up with |
I agree. |
I don't have a problem with making breaking changes in ORT if needed to fix PURL correctness. Internally, ORT is using its own package ids anyway. However, as it stands I still believe ORT is doing it right (WRT the PURL specs), and Syft and SCTK are doing it wrong. |
I am confused reading the spec for purl in relation to golang sub-modules.
For example, looking at the submodule expressed in this go.mod file: https://github.com/go-modules-by-example/submodules/blob/master/a/go.mod, released by the a/v1.0.0 tag: https://github.com/go-modules-by-example/submodules/releases
Is the purl:
It basically comes down to what is the namespace (if any), what is the name and what is the sub-path (if any) for this submodule.