fix: update java cataloger to include similar child packages, correct PURL, and correct GroupID #1956

Closed
wants to merge 8 commits into from

Conversation

spiffcs
Contributor

@spiffcs spiffcs commented Jul 25, 2023

Summary

This PR aims to improve 3 related aspects of syft's java cataloging.

  1. Improve PURL generation. In some cases syft is appending the artifact ID to the middle section (GroupID) of a PURL
    a. EX: pkg:maven/org.apache.xalan/[email protected] should be pkg:maven/org.apache/[email protected]

  2. Improve GroupID detection. Currently syft does not apply any hierarchy across sources when detecting the GroupID and treats all sources as equal, although it does already prioritize fields within files. We should try to obtain a GroupID from Pom Properties, then from the Pom Project, and finally, only if no GroupID has been found, from the Manifest. The Manifest should not return an answer if one was found in PomProject, and PomProject should not return a GroupID if PomProperties provided one.

  3. Syft eliminates java packages as duplicates if the package names match. With the enhanced GroupID detection we can refine this duplicate elimination so that syft does not eliminate packages with similar names but different GroupIDs.

    • Reviewers of this PR will notice a change in how the virtual path is expressed in the java metadata. This is a consequence of new java packages being added that were previously deduplicated incorrectly. Rather than vPathSuffix += ":" + pomProperties.ArtifactID, which uses only the ArtifactID, SBOM consumers will see the full path of the package's properties file used for package creation, e.g. META-INF/maven/org.glassfish.jaxb/jaxb-core/pom.properties. This allows better identification of child packages that look identical but are actually forks or similar clones:
      • Previously: /casb.war:WEB-INF/lib/jaxb-core-2.2.11.jar:jaxb-core
      • Now: /casb.war:WEB-INF/lib/jaxb-core-2.2.11.jar:META-INF/maven/org.glassfish.jaxb/jaxb-core
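
The GroupID priority order and the GroupID-aware duplicate elimination described above can be sketched roughly as follows. This is a minimal illustration, not syft's actual implementation: the type and function names (groupIDSources, selectGroupID, pkgKey, dedupe) are hypothetical, and the sample GroupIDs are illustrative data only.

```go
package main

import "fmt"

// groupIDSources is a hypothetical container for the GroupID candidates
// read from each source; an empty string means "not found in this source".
type groupIDSources struct {
	PomProperties string
	PomProject    string
	Manifest      string
}

// selectGroupID returns the first non-empty candidate in priority order:
// Pom Properties, then Pom Project, then Manifest. A lower-priority source
// is consulted only when every higher-priority source came back empty.
func selectGroupID(s groupIDSources) string {
	for _, candidate := range []string{s.PomProperties, s.PomProject, s.Manifest} {
		if candidate != "" {
			return candidate
		}
	}
	return ""
}

// pkgKey is a hypothetical dedup key. Keying on the name alone would merge
// forks that share an artifact name; adding the GroupID keeps them apart.
type pkgKey struct {
	GroupID string
	Name    string
}

// dedupe drops exact (GroupID, Name) repeats and preserves first-seen order.
func dedupe(pkgs []pkgKey) []pkgKey {
	seen := make(map[pkgKey]bool)
	var out []pkgKey
	for _, p := range pkgs {
		if seen[p] {
			continue
		}
		seen[p] = true
		out = append(out, p)
	}
	return out
}

func main() {
	// Pom Properties is empty here, so the Pom Project value wins over the Manifest.
	fmt.Println(selectGroupID(groupIDSources{
		PomProject: "org.glassfish.jaxb",
		Manifest:   "com.sun.xml.bind",
	}))

	// Two packages share a name but differ in GroupID, so both survive;
	// only the exact repeat is dropped.
	pkgs := dedupe([]pkgKey{
		{GroupID: "org.glassfish.jaxb", Name: "jaxb-core"},
		{GroupID: "com.sun.xml.bind", Name: "jaxb-core"},
		{GroupID: "org.glassfish.jaxb", Name: "jaxb-core"},
	})
	fmt.Println(len(pkgs))
}
```

Whether the real key should also include the version or the virtual path is left open here; the sketch only shows why the GroupID has to be part of it.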

@github-actions

github-actions bot commented Jul 25, 2023

Benchmark Test Results

Benchmark results from the latest changes vs base branch
goos: linux
goarch: amd64
pkg: github.com/anchore/syft/test/integration
cpu: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
                                                              │ ./.tmp/benchmark-64a1f93.txt │
                                                              │            sec/op            │
ImagePackageCatalogers/alpmdb-cataloger-2                                       12.12m ±  1%
ImagePackageCatalogers/apkdb-cataloger-2                                        671.5µ ±  5%
ImagePackageCatalogers/binary-cataloger-2                                       203.2µ ±  1%
ImagePackageCatalogers/dpkgdb-cataloger-2                                       558.5µ ±  1%
ImagePackageCatalogers/dotnet-portable-executable-cataloger-2                   20.82µ ±  1%
ImagePackageCatalogers/go-module-binary-cataloger-2                             91.54µ ±  0%
ImagePackageCatalogers/java-cataloger-2                                         13.29m ±  0%
ImagePackageCatalogers/graalvm-native-image-cataloger-2                         91.54µ ± 15%
ImagePackageCatalogers/javascript-package-cataloger-2                           345.2µ ±  1%
ImagePackageCatalogers/nix-store-cataloger-2                                    252.3µ ±  1%
ImagePackageCatalogers/php-composer-installed-cataloger-2                       741.0µ ±  1%
ImagePackageCatalogers/portage-cataloger-2                                      416.9µ ±  1%
ImagePackageCatalogers/python-package-cataloger-2                               3.160m ±  1%
ImagePackageCatalogers/r-package-cataloger-2                                    177.4µ ±  2%
ImagePackageCatalogers/rpm-db-cataloger-2                                       473.9µ ±  2%
ImagePackageCatalogers/ruby-gemspec-cataloger-2                                 839.9µ ±  0%
ImagePackageCatalogers/sbom-cataloger-2                                         116.1µ ±  0%
geomean                                                                         454.2µ

                                                              │ ./.tmp/benchmark-64a1f93.txt │
                                                              │             B/op             │
ImagePackageCatalogers/alpmdb-cataloger-2                                       5.142Mi ± 0%
ImagePackageCatalogers/apkdb-cataloger-2                                        205.2Ki ± 0%
ImagePackageCatalogers/binary-cataloger-2                                       30.55Ki ± 0%
ImagePackageCatalogers/dpkgdb-cataloger-2                                       172.8Ki ± 0%
ImagePackageCatalogers/dotnet-portable-executable-cataloger-2                   3.697Ki ± 0%
ImagePackageCatalogers/go-module-binary-cataloger-2                             9.906Ki ± 0%
ImagePackageCatalogers/java-cataloger-2                                         2.821Mi ± 0%
ImagePackageCatalogers/graalvm-native-image-cataloger-2                         8.594Ki ± 0%
ImagePackageCatalogers/javascript-package-cataloger-2                           94.36Ki ± 0%
ImagePackageCatalogers/nix-store-cataloger-2                                    49.33Ki ± 0%
ImagePackageCatalogers/php-composer-installed-cataloger-2                       186.4Ki ± 0%
ImagePackageCatalogers/portage-cataloger-2                                      120.2Ki ± 0%
ImagePackageCatalogers/python-package-cataloger-2                               1.004Mi ± 0%
ImagePackageCatalogers/r-package-cataloger-2                                    53.29Ki ± 0%
ImagePackageCatalogers/rpm-db-cataloger-2                                       181.5Ki ± 0%
ImagePackageCatalogers/ruby-gemspec-cataloger-2                                 144.1Ki ± 0%
ImagePackageCatalogers/sbom-cataloger-2                                         14.20Ki ± 0%
geomean                                                                         100.6Ki

                                                              │ ./.tmp/benchmark-64a1f93.txt │
                                                              │          allocs/op           │
ImagePackageCatalogers/alpmdb-cataloger-2                                        88.14k ± 0%
ImagePackageCatalogers/apkdb-cataloger-2                                         4.190k ± 0%
ImagePackageCatalogers/binary-cataloger-2                                         848.0 ± 0%
ImagePackageCatalogers/dpkgdb-cataloger-2                                        3.145k ± 0%
ImagePackageCatalogers/dotnet-portable-executable-cataloger-2                     132.0 ± 0%
ImagePackageCatalogers/go-module-binary-cataloger-2                               281.0 ± 0%
ImagePackageCatalogers/java-cataloger-2                                          40.59k ± 0%
ImagePackageCatalogers/graalvm-native-image-cataloger-2                           228.0 ± 0%
ImagePackageCatalogers/javascript-package-cataloger-2                            1.342k ± 0%
ImagePackageCatalogers/nix-store-cataloger-2                                      898.0 ± 0%
ImagePackageCatalogers/php-composer-installed-cataloger-2                        4.079k ± 0%
ImagePackageCatalogers/portage-cataloger-2                                       2.272k ± 0%
ImagePackageCatalogers/python-package-cataloger-2                                16.45k ± 0%
ImagePackageCatalogers/r-package-cataloger-2                                      929.0 ± 0%
ImagePackageCatalogers/rpm-db-cataloger-2                                        3.992k ± 0%
ImagePackageCatalogers/ruby-gemspec-cataloger-2                                  2.447k ± 0%
ImagePackageCatalogers/sbom-cataloger-2                                           394.0 ± 0%
geomean                                                                          2.063k

@spiffcs spiffcs self-assigned this Jul 27, 2023
@spiffcs spiffcs linked an issue Aug 1, 2023 that may be closed by this pull request
@spiffcs spiffcs changed the title fix: update groupID to stable sort for selection fix: update java archive cataloger to include child packages when different metadata Aug 7, 2023
@spiffcs spiffcs changed the title fix: update java archive cataloger to include child packages when different metadata fix: update java archive cataloger to include similar child packages Aug 7, 2023
@spiffcs spiffcs force-pushed the 1944-inconsistent-purl-generation branch from a10a433 to 8a9d91b Compare August 8, 2023 13:17
Signed-off-by: Christopher Phillips <[email protected]>
@spiffcs spiffcs changed the title fix: update java archive cataloger to include similar child packages fix: update java cataloger to include similar child packages, correct PURL, and correct GroupID Aug 14, 2023
@spiffcs spiffcs force-pushed the 1944-inconsistent-purl-generation branch from b777224 to 1faff35 Compare August 16, 2023 13:45
@spiffcs spiffcs closed this Aug 17, 2023
@spiffcs
Contributor Author

spiffcs commented Aug 17, 2023

Splitting this into two PRs and merging in upstream fixes from #2032

@spiffcs spiffcs deleted the 1944-inconsistent-purl-generation branch November 17, 2024 19:19
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

SBOMs are not the same on multiple runs of syft
1 participant