Is it possible to get the source package name (and source version) in the report? #1083

Closed
sameer1046 opened this issue Nov 15, 2021 · 20 comments · Fixed by #1092
Labels
feature new feature

Comments

@sameer1046

Currently tern reports the binary package info of Debian packages. It would be great if tern could also provide the source package information in the report, so that the report can be fed directly into license scanning tools.

@nishakm nishakm added the feature new feature label Nov 17, 2021
@nishakm
Contributor

nishakm commented Nov 17, 2021

@rnjudge what do you think?

@rnjudge
Contributor

rnjudge commented Nov 17, 2021

I think this is technically possible for some package managers but not all. For example, I don't think python packages have the concept of a "source" package vs. a binary package like rpm or deb packages do. I do worry about the clutter in the default table with another column for the source package. I wonder if this is something we could put behind a command line flag? @nishakm thoughts?

@nishakm
Contributor

nishakm commented Nov 17, 2021

@rnjudge a long time ago, tern included some ability to retrieve corresponding sources, particularly for debian packages as it was the easiest one to implement. This is different from "source package name" though.

I agree with the cluttering of the default table. We may want to direct folks looking for more information to the JSON or HTML report format.

@sameer1046 can you provide an example of "source package name"?

@rnjudge
Contributor

rnjudge commented Nov 17, 2021

I think what he means by source package name is in reference to rpm or deb packages where there's one source package that contains the source code which builds/produces the associated binary packages (the source package is not typically installed). For example, the systemd source package produces systemd, udev, libudev1 binaries and more (https://packages.debian.org/source/sid/systemd). These binary packages are what Tern reports. The source is important because CVEs are reported by source package name, which I think is what @sameer1046 is getting at. Is this an accurate summary, Sameer?
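To illustrate the one-source-to-many-binaries relationship described above, here is a minimal sketch that groups binary packages by the source package they were built from. The `installed` list is a hypothetical inventory invented for this example (only the systemd entries mirror the comment above), and `group_by_source` is an illustrative helper, not part of tern.

```python
from collections import defaultdict


def group_by_source(packages):
    """Map each source package name to the sorted binary packages built from it."""
    grouped = defaultdict(list)
    for binary_name, source_name in packages:
        grouped[source_name].append(binary_name)
    return {src: sorted(bins) for src, bins in grouped.items()}


# Hypothetical inventory of (binary package, source package) pairs.
installed = [
    ("systemd", "systemd"),
    ("udev", "systemd"),
    ("libudev1", "systemd"),
    ("bash", "bash"),
]

by_source = group_by_source(installed)
for source, binaries in sorted(by_source.items()):
    print(f"{source}: {', '.join(binaries)}")
```

Looking up an advisory filed against the source name `systemd` would then cover all three binaries at once.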

@sameer1046
Author

@rnjudge Exactly. On Linux, Debian and RPM packages have one source package, and different binary packages are built from that source. It would be great if this were available in CycloneDX format.

@nishakm
Contributor

nishakm commented Nov 18, 2021

Thanks for the explanation and clarification! I suppose this is possible for deb and rpm. I am more familiar with deb than rpm so I'll work through what is needed to implement that:

  1. Read the /etc/apt/sources.list file and files existing in /etc/apt/sources.list.d
  2. Write back to the files the URLs except modify deb to deb-src
  3. Run apt-get update
  4. Run apt-cache showsrc <package name> and parse the output to get the package name

I think it's possible to add this for the distros that support it. It requires adding a new property in Package called source_package_name and a script that does all the above steps.
@rnjudge WDYT?
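A rough sketch of the text-processing parts of steps 2 and 4 might look like the following. Both function names are made up for illustration, the code only handles the classic one-line `sources.list` format (not deb822-style `.sources` files), and it assumes `apt-cache showsrc` prints `Package:` and `Version:` fields in its first stanza. Actually writing the files and running `apt-get update` is left out.

```python
def to_deb_src(sources_list_text):
    """Return a deb-src line for every active 'deb' line in an APT sources file."""
    src_lines = []
    for line in sources_list_text.splitlines():
        stripped = line.strip()
        # Comments, blank lines, and existing deb-src entries do not match.
        if stripped.startswith("deb ") or stripped.startswith("deb\t"):
            src_lines.append("deb-src" + stripped[3:])
    return src_lines


def parse_showsrc(output):
    """Extract (source name, version) from the first stanza of showsrc output."""
    fields = {}
    for line in output.splitlines():
        if not line.strip():
            if fields:
                break  # a blank line ends the first stanza
            continue
        if line[0].isspace():
            continue  # continuation line of a multi-line field
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key, value.strip())
    return fields.get("Package"), fields.get("Version")
```

Keeping the parsing separate from the commands that produce the text makes it possible to check the logic without a Debian system at hand.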

@gernot-h

As a colleague of @sameer1046 at Siemens, I've been involved in license scanning topics for a couple of years, so here are some additional bits:

@nishakm
Contributor

nishakm commented Nov 29, 2021

I think the question we are grappling with right now is whether "source package name" should be included in our summary report. It's already becoming less summarizing ;). @sameer1046 @gernot-h if you are OK using a combination of tern's JSON format and jq, this is totally doable.

@gernot-h

> I think the question we are grappling with right now is whether "source package name" should be included in our summary report. It's already becoming less summarizing ;). @sameer1046 @gernot-h if you are OK using a combination of tern's JSON format and jq, this is totally doable.

For the Siemens use-case, we just need this information in any machine-readable format, so JSON sounds perfect! :)
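As an alternative to jq, a consumer could pull the source package information out of a JSON report with a few lines of Python. The sketch below walks any parsed JSON structure and collects name/version pairs. The key names `source_name` and `source_version` and the layout of `sample_report` are assumptions made up for this example and would need to be adjusted to whatever the real report uses.

```python
import json


def find_source_packages(node, name_key, version_key):
    """Recursively collect (source name, source version) pairs from parsed JSON."""
    found = set()
    if isinstance(node, dict):
        name = node.get(name_key)
        if isinstance(name, str) and name:
            found.add((name, node.get(version_key)))
        for value in node.values():
            found |= find_source_packages(value, name_key, version_key)
    elif isinstance(node, list):
        for item in node:
            found |= find_source_packages(item, name_key, version_key)
    return found


# Hypothetical report fragment; the real schema and key names may differ.
sample_report = json.loads("""
{"images": [{"image": {"layers": [{"packages": [
  {"name": "udev", "source_name": "systemd", "source_version": "247.3-7"},
  {"name": "libudev1", "source_name": "systemd", "source_version": "247.3-7"}
]}]}}]}
""")

sources = find_source_packages(sample_report, "source_name", "source_version")
```

Because the walker does not depend on a fixed nesting, it keeps working if the packages move to a different level of the report.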

@sameer1046
Author

sameer1046 commented Nov 29, 2021

I would suggest producing a CycloneDX BOM in source package format, enabled by a command-line flag,
which will list all source packages in the BOM using the purl spec.
E.g. pkg:deb/debian/[email protected]?packaging=sources
This will produce a BOM that contains only source packages and not binary packages.
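For illustration, a purl of that shape could be assembled as in the sketch below. The helper name `source_purl` and the sample package values are hypothetical, and the `packaging=sources` qualifier is copied from the suggestion above rather than checked against the purl specification. This is a simplified builder, not a complete purl implementation.

```python
from urllib.parse import quote


def source_purl(distro, name, version, qualifier="packaging=sources"):
    """Build a deb purl for a source package, percent-encoding each component."""
    return (
        f"pkg:deb/{quote(distro, safe='')}/{quote(name, safe='')}"
        f"@{quote(version, safe='')}?{qualifier}"
    )


# Hypothetical name and version, used only to show the resulting shape.
example = source_purl("debian", "systemd", "247.3-7")
```

Percent-encoding matters for Debian versions that carry an epoch or a `+` suffix, which would otherwise collide with purl's own separators.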

@rnjudge
Contributor

rnjudge commented Nov 29, 2021

> I would suggest producing a CycloneDX BOM in source package format, enabled by a command-line flag, which will list all source packages in the BOM using the purl spec. E.g. pkg:deb/debian/[email protected]?packaging=sources This will produce a BOM that contains only source packages and not binary packages.

We might need @coderpatros's help with this one after we add source package info to the data model as he is the CycloneDX format wizard :)

@Ranjit-Kumar-Nayak

@rnjudge @nishakm hey, I am new to this and would also like to contribute. Can you point me to some resources?

@rnjudge
Contributor

rnjudge commented Dec 8, 2021

Hi @Ranjit-Kumar-Nayak -- thanks for your interest! If you know of, or can come up with, a way to list the source packages for what is installed on a system using the dpkg or rpm package managers (or a bash script), that would be a good starting point! Bonus points if you can find the source package given its binary package name.

@rnjudge
Contributor

rnjudge commented Dec 9, 2021

@sameer1046 @gernot-h may I ask which vulnerability scanner you are using that requires sources?

@gernot-h

gernot-h commented Dec 9, 2021

> Thanks for the explanation and clarification! I suppose this is possible for deb and rpm. I am more familiar with deb than rpm so I'll work through what is needed to implement that:
>
> 1. Read the `/etc/apt/sources.list` file and files existing in `/etc/apt/sources.list.d`
> 2. Write back to the files the URLs except modify `deb` to `deb-src`
> 3. Run `apt-get update`
> 4. Run `apt-cache showsrc <package name>` and parse the output to get the package name

Sorry, @nishakm, for coming back to this so late. I'm not exactly sure what the environment is here (I don't know how tern works internally), but in case you are running on the system with the packages in question, you don't need additional apt sources. All you need is already known by dpkg once your package is installed, a simple

dpkg-query -f '${source:Package} ${source:Version}' -W <pkg>

will show you the source information for an installed package. The procedure you described might however make perfect sense if you want to run it in a distinct environment where the package you analyze is not installed.

By the way, there's an important thing to note: the source version might also differ from the binary version in rare cases. So you should also query for it and add it to the BOM.
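A script could collect this for every installed package in one call. The sketch below is one possible way to do it from Python: it asks `dpkg-query` for tab-separated binary and source fields and parses the result. The function names and the dictionary layout are illustrative assumptions, not tern's actual collection script.

```python
import subprocess

# Tab-separated fields, one installed package per line.
DPKG_FORMAT = "${binary:Package}\t${Version}\t${source:Package}\t${source:Version}\n"


def parse_dpkg_query(output):
    """Turn dpkg-query output in DPKG_FORMAT into a list of dicts."""
    rows = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 4:
            continue  # skip anything that does not match the expected layout
        binary, version, source, source_version = parts
        rows.append({
            "binary": binary,
            "version": version,
            "source": source,
            "source_version": source_version,
        })
    return rows


def installed_sources():
    """Query dpkg for all installed packages (requires a dpkg-based system)."""
    result = subprocess.run(
        ["dpkg-query", "-W", "-f", DPKG_FORMAT],
        check=True, capture_output=True, text=True,
    )
    return parse_dpkg_query(result.stdout)
```

Reporting the binary and source versions side by side makes the rare cases where they differ easy to spot.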

@gernot-h

gernot-h commented Dec 9, 2021

> @sameer1046 @gernot-h may I ask which vulnerability scanner you are using that requires sources?

Sure. :) This is not about security scanning, but about license clearing (legal compliance task...). We use (and maintain ;) ) https://github.com/fossology/ for this.

@rnjudge
Contributor

rnjudge commented Dec 9, 2021

> dpkg-query -f '${source:Package} ${source:Version}' -W <pkg>

Thank you!! This is super helpful. Do you have the command to do this using RPM handy?

> By the way, there's an important thing to note: the source version might also differ from the binary version in rare cases. So you should also query for it and add it to the BOM.

Noted.

@gernot-h

gernot-h commented Dec 10, 2021

@sameer1046, could you please update the issue title and initial description to also include "source version", so for example "get the source package name (and source version) in the report". I put this in brackets as I'm unsure if this is relevant for other distributions than Debian and Ubuntu, though.

@gernot-h

> Thank you!! This is super helpful. Do you have the command to do this using RPM handy?

This should be rpm -q --qf '%{SOURCERPM}' <pkg>. The SOURCERPM value also contains the source version number.

And, on my OpenSUSE system, source version can also differ from binary version in rare cases, so this seems to be a common concept:

> rpm -q --qf "%{NAME} %{VERSION}-%{RELEASE} %{SOURCERPM}\n" cron
cron 4.2-70.14.4.1 cronie-1.5.1-70.14.4.1.src.rpm
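Since SOURCERPM is a file name rather than separate fields, the source name and version have to be split out of it. A minimal sketch, assuming the usual `name-version-release.src.rpm` layout (the helper name `parse_sourcerpm` is made up for illustration):

```python
def parse_sourcerpm(sourcerpm):
    """Split an RPM SOURCERPM value into (name, version, release), or None."""
    for suffix in (".src.rpm", ".nosrc.rpm"):
        if sourcerpm.endswith(suffix):
            stem = sourcerpm[: -len(suffix)]
            break
    else:
        return None  # e.g. "(none)" for packages without a source RPM
    # Version and release cannot contain '-', so split from the right.
    parts = stem.rsplit("-", 2)
    if len(parts) != 3:
        return None
    return tuple(parts)


name, version, release = parse_sourcerpm("cronie-1.5.1-70.14.4.1.src.rpm")
print(name, f"{version}-{release}")
```

For the `cron` example above, this yields the source name `cronie` with version `1.5.1` and release `70.14.4.1`, which shows both the name and the version differing from the binary package.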

@sameer1046 changed the title from "Is it possible to get the source package name in the report?" to "Is it possible to get the source package name (and source version) in the report?" Dec 10, 2021
@rnjudge
Contributor

rnjudge commented Dec 10, 2021

Thanks again @gernot-h. I think we can get this feature merged and data available in the JSON report before our next release (planned for next week).

rnjudge added a commit to rnjudge/tern that referenced this issue Dec 13, 2021
This commit adds source package name and source package version
information to the Package object data model.

Tern currently reports binary package metadata in its reports. Source
packages exist in operating systems like Debian and RedHat and differ
from binary packages. Source packages provide all of the necessary files
to compile or build a desired piece of software. Binary packages are
what get produced as a result of building a source package and are
what typically gets installed in an environment. Binary packages can
have different names and/or versions as their source package.

Source packages are relevant in the context of security scanning as most
CVEs are reported by source package name and version.

Works towards tern-tools#1083

Signed-off-by: Rose Judge <[email protected]>
rnjudge added a commit to rnjudge/tern that referenced this issue Dec 13, 2021
This commit adds scripts in base.yml to collect source package names and
versions for deb and rpm package managers.

Resolves tern-tools#1083

Signed-off-by: Rose Judge <[email protected]>
rnjudge added a commit to rnjudge/tern that referenced this issue Dec 13, 2021
This commit adds scripts in base.yml to collect source package names and
versions for dpkg and rpm package managers.

Resolves tern-tools#1083

Signed-off-by: Rose Judge <[email protected]>
nishakm pushed a commit that referenced this issue Dec 13, 2021
This PR adds source package name and source package version
information to the Package object data model. It also adds scripts
in base.yml for rpm and dpkg package managers to collect source
package names and versions.

Tern currently reports binary package metadata in its reports. Source
packages exist in operating systems like Debian and RedHat and differ
from binary packages. Source packages provide all of the necessary files
to compile or build a desired piece of software. Binary packages are
what get produced as a result of building a source package and are
what typically gets installed in an environment. Binary packages can
have different names and/or versions as their source package.

Source packages are relevant in the context of security scanning as most
CVEs are reported by source package name and version.

Resolves #1083

Signed-off-by: Rose Judge <[email protected]>
5 participants