Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gradation to license analysis #1369

Closed
david-a-wheeler opened this issue Dec 6, 2021 · 1 comment · Fixed by #2442
Closed

Add gradation to license analysis #1369

david-a-wheeler opened this issue Dec 6, 2021 · 1 comment · Fixed by #2442
Labels
help wanted Community contributions welcome, maintainers supportive of idea but not a high priority kind/enhancement New feature or request

Comments

@david-a-wheeler
Copy link
Contributor

david-a-wheeler commented Dec 6, 2021

Is your feature request related to a problem? Please describe.

Currently scorecard looks for a license, but it simply gives a 10 if there's a license found at all and a 0 if there's no license found at all. That doesn't give any gradation.

Describe the solution you'd like

Add more gradations for licenses. I think scorecard was intended for OSS projects, so I propose giving "10" for projects with a license known to be OSS per OSI or Free Software as defined by the FSF. Otherwise, give it a 5. For projects on GitHub you can just reuse the GitHub analysis, as GitHub has an API that provides license info. You can see an example of this use in the CII Best Practices' github_basic_detective and floss_license_detective.

Further discussion here: #1038

Describe alternatives you've considered

Different people have different views of copyleft, I don't think scorecard should force a particular view of whether copyleft or not-copyleft is better (less risk). You could argue either way, which to me suggests you shouldn't make the argument. Even those who prefer BSD-style licenses will often use the Linux kernel (GPL), because how you use software is very important. As a result, I think "is it approved by OSI or FSF" should be enough, don't force further gradations.

Additional context

Again, further discussion here: #1038

@david-a-wheeler david-a-wheeler added the kind/enhancement New feature or request label Dec 6, 2021
@laurentsimon laurentsimon added the help wanted Community contributions welcome, maintainers supportive of idea but not a high priority label Dec 14, 2021
@shissam
Copy link
Contributor

shissam commented Nov 6, 2022

I'd like to see this implements AND to help on this. I did some experiments this weekend with scorecard and added a githubrepo client called clients/githubrepo/licenses.go that uses go-github/v38 API github.RepositoryLicense to obtain the license information known to github via the RESTAPI

curl -D fred.txt -H "Accept: application/vnd.github+json" -H "Authorization: Bearer <GITHUB_TOKEN>" 'https://api.github.com/repos/<owner>/<repo>/license'

My proposal

  • introduce clients/licenses.go which embodies this github api (similar to clients/languages.go) - it would be plural as it may come down to the point where gitlab returns multiple licenses for a repo (I have a real-world example).
  • This github client would populate a structure like the following:
type License struct {
        Key      string // RepositoryLicense.GetLicense().GetKey()
        Name     string // RepositoryLicense.GetLicense().GetName()
        Path     string // RepositoryLicense.GetName()
        Size     int    // RepositoryLicense.GetSize()
        SPDXId   string // RepositoryLicense.GetLicense().GetSPDXID()
        Type     string // RepositoryLicense.GetType()
}

where (example from ossf/scorecard:

{
 Key:apache-2.0
 Name:Apache License 2.0
 Path:LICENSE
 Size:11356
 SPDXId:Apache-2.0 
Type:file
}
  • modify checks/raw/license.go to follow one of two possible paths:
    -- pathOne try the repoClient (as in c.RepoClient.ListLicenses()), if a license is reported by github USE that and return
    -- pathTwo if the repoClient returns nil, continue with the legacy logic and return what is always returned
  • there are a number of advantages to this approach:
    -- speedup: for github, my test of c.RepoClient.ListLicenses() returns in less than 5 seconds, when compared to other repos with many files (like torvalds/linux this check takes over a minute
    -- extensibility: this test with github is promising, and gitlab does appear to have a api for getting this information (https://docs.gitlab.com/ee/api/templates/licenses.html)
    -- confidence: using the github api for 'sensing' which specific license is in use I assume is higher (more review on their algorithm which might use NLP)--I tried a license on my site which changed the work Apache to Apacje in the complete Apache License 2.0 file and github still sensed it was APL-2.0 (very cool).
    -- standardization: I am encouraged by the use of the SPDXId in the github API, if it is true that gitlab using the same standard identifier for licenses - this would be great for the community
    -- usability: using scorecard to retrieve the specific license detected would a) make this information more readily available to the end-user; and b) would support develop pipelines that need to sense the license (by name) as a risk management activity without having to move to other means to acquire such license information
  • with this approach, propose changing checker.LicenseData{}:
type License struct {
        Key      string // from clients.License.Key
        Name     string // from clients.License.Name
        Size     int    // clients.License.Size (Size maybe not here but in type File struct
        SPDXId   string // clients.License.SPDXId
        Attribution     AttributionType // license sourced from LicenseRepoAPI or ScorecardEngine
}
type LicenseData struct {
        Files []File
        License []License
}

Design Decisions (not all are needed to get this moving)

  • scoring: which would occur in checks/evaluation/license.go would not change at this time - a file is a file, both pathOne and pathTwo would continue to return this information
    -- possible future approachs suitable for this proposal: when:
    --- Attribution is LicenseRepoAPI (i.e., pathOne) then consult a map to convert the SPDXId to a score
    --- Attribution is ScorecardEngine (i.e., pathTwo) then continue to affix scorecard max, later as pathTwo is improved to regex licenses into SPDXIds then the map can be used in the evaluation.

Some particulary interesting github repos with possible licenses[1]

  • github.com/Fivium/FOXopen
    {Key:gpl-3.0 Name:GNU General Public License v3.0 Path:LICENSE Size:35147 SPDXId:GPL-3.0 Type:file}
    --NB: this site actually confuses github showing two licenses on the website yet the API only returns the first, hence it appears based on this one example, the API will (for now) only return one license file
  • github.com/twardoch/varfonts-ofl
    NB: github API returns a 404 - hence pathTwo continues detecting a presence which github did not
  • github.com/mozilla/openbadges-bakery
    {Key:mpl-2.0 Name:Mozilla Public License 2.0 Path:LICENSE-MPL-2.0 Size:16726 SPDXId:MPL-2.0 Type:file}
  • github.com/IQAndreas/markdown-licenses
    {Key:unlicense Name:The Unlicense Path:unlicense.md Size:1275 SPDXId:Unlicense Type:file}
    NB: notice the Unlicense
  • github.com/johnjago/awesome-uncopyright
    {Key:cc0-1.0 Name:Creative Commons Zero v1.0 Universal Path:LICENSE Size:6515 SPDXId:CC0-1.0 Type:file}
  • github.com/USEPA/open-source-projects
    {Key:mit Name:MIT License Path:license.md Size:1124 SPDXId:MIT Type:file}

[1] see these examples: https://github.com/Fivium/FOXopen/blob/master/LICENSE-THIRD-PARTY.md

shissam added a commit to shissam/scorecard that referenced this issue Nov 9, 2022
laurentsimon pushed a commit that referenced this issue Nov 28, 2022
* ✨ Improved Security Policy Check (#2137)

* Examines and awards points for linked content (URLs / Emails)

* Examines and awards points for hints of disclosure and vulnerability practices

* Examines and awards points for hints of elaboration of timelines

Signed-off-by: Scott Hissam <[email protected]>

* Repaired Security Policy to correctly use linked content length for evaluation

Signed-off-by: Scott Hissam <[email protected]>

* gofmt'ed changes

Signed-off-by: Scott Hissam <[email protected]>

* Repaired the case in the evaluation which was too sensitive to content length over the length of the linked content for urls and emails

Signed-off-by: Scott Hissam <[email protected]>

* added unit test cases for the new content-based Security Policy checks

Signed-off-by: Scott Hissam <[email protected]>

* reverted the direct (mistaken) change to checks.md and updated the checks.yaml for generate-docs

Signed-off-by: Scott Hissam <[email protected]>

* ✨ Improved Security Policy Check (#2137) (revisted based on comments)

* replaced reason strings with log.Info & log.Warn (as seen in --show-details)

* internal assertion check for nil (*pinfo) and empty pfile

* internal switched to FileTypeText over FileTypeSource

* internal implement type SecurityPolicyInformationType/SecurityPolicyInformation revised SecurityPolicyData to support only one file

* revised expected unit-test results and revised unit-test to reflect the new SecurityPolicyData type

Signed-off-by: Scott Hissam <[email protected]>

* revised the score value based on observation of one *or more* url(s) or one email(s) found; unit tests update accordingly

Signed-off-by: Scott Hissam <[email protected]>

* revised the score value based on observation of one *or more* url(s) or one email(s) found; unit tests update accordingly

Signed-off-by: Scott Hissam <[email protected]>

* revised the score value based on observation of one *or more* url(s) or one email(s) found; e2e tests update accordingly

Signed-off-by: Scott Hissam <[email protected]>

* Addressed PR comments; added telemetry for policy hits in security policy file to track hits by line number

Signed-off-by: Scott Hissam <[email protected]>

* Resolved merge conflict with checks.yaml

Signed-off-by: Scott Hissam <[email protected]>

* updated raw results to emit all the raw information for the new security policy check

Signed-off-by: Scott Hissam <[email protected]>

* Resolved merge conflicts and lint errors with json_raw_results.go

Signed-off-by: Scott Hissam <[email protected]>

* Addressed review comments to reorganize security policy data struct to support the potential for multiple security policy files.

Signed-off-by: Scott Hissam <[email protected]>

* Added logic to the security policy to process multiple security policy files only after future improvements to aggregating scoring across such files are designed. For now the security policy behaves as originally designed to stop once one of the expected policy files are found in the repo

Signed-off-by: Scott Hissam <[email protected]>

* added comments regarding the capacity to support multiple policy files and removed unneeded break statements in the code

Signed-off-by: Scott Hissam <[email protected]>

* Addressed review comments to remove the dependency on the path in the filename from the code and introduced FileSize to checker.File type and removed the SecurityContentLength which was used to hold that information for the new security policy assessment

Signed-off-by: Scott Hissam <[email protected]>

* restored reporting full security policy path and filename for policies found in the org level repos

Signed-off-by: Scott Hissam <[email protected]>

* Resolved conflicts in checks.yaml for documentation

Signed-off-by: Scott Hissam <[email protected]>

* ✨ CLI for scorecard-attestor (#2309)

* Reorganize

Signed-off-by: Raghav Kaul <[email protected]>

* Working commit

Signed-off-by: Raghav Kaul <[email protected]>

* Compile with local scorecard; go mod tidy

Signed-off-by: Raghav Kaul <[email protected]>

* Add signing code

Heavily borrowed from https://github.com/grafeas/kritis/blob/master/cmd/kritis/signer/main.go

Signed-off-by: Raghav Kaul <[email protected]>

* Update deps

* Naming
* Makefile

Signed-off-by: Raghav Kaul <[email protected]>

* Edit license, add lint.yml

Signed-off-by: Raghav Kaul <[email protected]>

* checks: go mod tidy, license

Signed-off-by: Raghav Kaul <[email protected]>

* Address PR comments

* Split into checker/signer files
* Naming convention

Signed-off-by: Raghav Kaul <[email protected]>

* License, remove golangci.yml

Signed-off-by: Raghav Kaul <[email protected]>

* Address PR comments

* Use cobra

Signed-off-by: Raghav Kaul <[email protected]>

* Add tests for root command

Signed-off-by: Raghav Kaul <[email protected]>

* Filter out checks that aren't needed for policy evaluation

Signed-off-by: Raghav Kaul <[email protected]>

* Add `make` targets for attestor; submit coverage stats

Signed-off-by: Raghav Kaul <[email protected]>

* Improvements

* Use sclog instead of glog
* Remove unneeded subcommands
* Formatting

Signed-off-by: Raghav Kaul <[email protected]>

* Flags: Make note-name constant and fix messaging

Signed-off-by: Raghav Kaul <[email protected]>

* Remove SupportedRequestTypes

Signed-off-by: Raghav Kaul <[email protected]>

* go mod tidy

Signed-off-by: Raghav Kaul <[email protected]>

* go mod tidy, makefile

Signed-off-by: Raghav Kaul <[email protected]>

* Fix GH actions run

Signed-off-by: Raghav Kaul <[email protected]>

Signed-off-by: Raghav Kaul <[email protected]>
Signed-off-by: Scott Hissam <[email protected]>

* removed whitespace before stanza for Run attestor e2e

Signed-off-by: Scott Hissam <[email protected]>

* resolved code review and doc review comments

Signed-off-by: Scott Hissam <[email protected]>

* repaired the link for the maintainer's guide for supporting the coordinated vulnerability disclosure guidelines

Signed-off-by: Scott Hissam <[email protected]>

* initial implementation of #1369 (comment) to provide more license details

Signed-off-by: Scott Hissam <[email protected]>

* draft implementation to provide more information on license details

Signed-off-by: Scott Hissam <[email protected]>

* repaired a misspelling

Signed-off-by: Scott Hissam <[email protected]>

* Changed to handle http errors with 404 not found as being a non-error for not being able to find a license

Signed-off-by: Scott Hissam <[email protected]>

* Return an error status similar to other gitlab checks

Signed-off-by: Scott Hissam <[email protected]>

* add new raw licenses data

Signed-off-by: Scott Hissam <[email protected]>

* updated e2e test as new license check generates more info and warn as scores change as license file content is not parsed

Signed-off-by: Scott Hissam <[email protected]>

* added numerous more test filenames and a shouldFail boolean as some filenames will fail that do not meet checks.md rules

Signed-off-by: Scott Hissam <[email protected]>

* license check now, primarily, uses the GH API for checking licenses

Signed-off-by: Scott Hissam <[email protected]>

* updated local checker as new license check generates more info and warn as scores change as license file content is not parsed

Signed-off-by: Scott Hissam <[email protected]>

* added draft license gradation for scoring, add a map to OSI and FSF licenses, added GH API for retrieving repo license, revamp license filename matching when not using a repo API for detecting license files.

Signed-off-by: Scott Hissam <[email protected]>

* repaired race condition for case insensitive map, improved regex matching, moved licenses to raw, raw now mimics GH API return values for key, name, etc., updated unit tests and raw results accordingly

Signed-off-by: Scott Hissam <[email protected]>

* completed disambiguation of SPDX Identifiers and filename extensions, reworked some of the code comments, added map generation to TestLicense, added an additional mutex for the regex group identifier index, removed spurious prints, revised unit test accordingly, updated documentation.

Signed-off-by: Scott Hissam <[email protected]>

* removed repo Key from LicenseInformation as unneeded, changed attribution constants to be more meaningful, update documentation as necessary for changes

Signed-off-by: Scott Hissam <[email protected]>

Signed-off-by: Scott Hissam <[email protected]>
Signed-off-by: Raghav Kaul <[email protected]>
Co-authored-by: raghavkaul <[email protected]>
raghavkaul added a commit to raghavkaul/scorecard that referenced this issue Feb 9, 2023
* ✨ Improved Security Policy Check (ossf#2137)

* Examines and awards points for linked content (URLs / Emails)

* Examines and awards points for hints of disclosure and vulnerability practices

* Examines and awards points for hints of elaboration of timelines

Signed-off-by: Scott Hissam <[email protected]>

* Repaired Security Policy to correctly use linked content length for evaluation

Signed-off-by: Scott Hissam <[email protected]>

* gofmt'ed changes

Signed-off-by: Scott Hissam <[email protected]>

* Repaired the case in the evaluation which was too sensitive to content length over the length of the linked content for urls and emails

Signed-off-by: Scott Hissam <[email protected]>

* added unit test cases for the new content-based Security Policy checks

Signed-off-by: Scott Hissam <[email protected]>

* reverted the direct (mistaken) change to checks.md and updated the checks.yaml for generate-docs

Signed-off-by: Scott Hissam <[email protected]>

* ✨ Improved Security Policy Check (ossf#2137) (revisted based on comments)

* replaced reason strings with log.Info & log.Warn (as seen in --show-details)

* internal assertion check for nil (*pinfo) and empty pfile

* internal switched to FileTypeText over FileTypeSource

* internal implement type SecurityPolicyInformationType/SecurityPolicyInformation revised SecurityPolicyData to support only one file

* revised expected unit-test results and revised unit-test to reflect the new SecurityPolicyData type

Signed-off-by: Scott Hissam <[email protected]>

* revised the score value based on observation of one *or more* url(s) or one email(s) found; unit tests update accordingly

Signed-off-by: Scott Hissam <[email protected]>

* revised the score value based on observation of one *or more* url(s) or one email(s) found; unit tests update accordingly

Signed-off-by: Scott Hissam <[email protected]>

* revised the score value based on observation of one *or more* url(s) or one email(s) found; e2e tests update accordingly

Signed-off-by: Scott Hissam <[email protected]>

* Addressed PR comments; added telemetry for policy hits in security policy file to track hits by line number

Signed-off-by: Scott Hissam <[email protected]>

* Resolved merge conflict with checks.yaml

Signed-off-by: Scott Hissam <[email protected]>

* updated raw results to emit all the raw information for the new security policy check

Signed-off-by: Scott Hissam <[email protected]>

* Resolved merge conflicts and lint errors with json_raw_results.go

Signed-off-by: Scott Hissam <[email protected]>

* Addressed review comments to reorganize security policy data struct to support the potential for multiple security policy files.

Signed-off-by: Scott Hissam <[email protected]>

* Added logic to the security policy to process multiple security policy files only after future improvements to aggregating scoring across such files are designed. For now the security policy behaves as originally designed to stop once one of the expected policy files are found in the repo

Signed-off-by: Scott Hissam <[email protected]>

* added comments regarding the capacity to support multiple policy files and removed unneeded break statements in the code

Signed-off-by: Scott Hissam <[email protected]>

* Addressed review comments to remove the dependency on the path in the filename from the code and introduced FileSize to checker.File type and removed the SecurityContentLength which was used to hold that information for the new security policy assessment

Signed-off-by: Scott Hissam <[email protected]>

* restored reporting full security policy path and filename for policies found in the org level repos

Signed-off-by: Scott Hissam <[email protected]>

* Resolved conflicts in checks.yaml for documentation

Signed-off-by: Scott Hissam <[email protected]>

* ✨ CLI for scorecard-attestor (ossf#2309)

* Reorganize

Signed-off-by: Raghav Kaul <[email protected]>

* Working commit

Signed-off-by: Raghav Kaul <[email protected]>

* Compile with local scorecard; go mod tidy

Signed-off-by: Raghav Kaul <[email protected]>

* Add signing code

Heavily borrowed from https://github.com/grafeas/kritis/blob/master/cmd/kritis/signer/main.go

Signed-off-by: Raghav Kaul <[email protected]>

* Update deps

* Naming
* Makefile

Signed-off-by: Raghav Kaul <[email protected]>

* Edit license, add lint.yml

Signed-off-by: Raghav Kaul <[email protected]>

* checks: go mod tidy, license

Signed-off-by: Raghav Kaul <[email protected]>

* Address PR comments

* Split into checker/signer files
* Naming convention

Signed-off-by: Raghav Kaul <[email protected]>

* License, remove golangci.yml

Signed-off-by: Raghav Kaul <[email protected]>

* Address PR comments

* Use cobra

Signed-off-by: Raghav Kaul <[email protected]>

* Add tests for root command

Signed-off-by: Raghav Kaul <[email protected]>

* Filter out checks that aren't needed for policy evaluation

Signed-off-by: Raghav Kaul <[email protected]>

* Add `make` targets for attestor; submit coverage stats

Signed-off-by: Raghav Kaul <[email protected]>

* Improvements

* Use sclog instead of glog
* Remove unneeded subcommands
* Formatting

Signed-off-by: Raghav Kaul <[email protected]>

* Flags: Make note-name constant and fix messaging

Signed-off-by: Raghav Kaul <[email protected]>

* Remove SupportedRequestTypes

Signed-off-by: Raghav Kaul <[email protected]>

* go mod tidy

Signed-off-by: Raghav Kaul <[email protected]>

* go mod tidy, makefile

Signed-off-by: Raghav Kaul <[email protected]>

* Fix GH actions run

Signed-off-by: Raghav Kaul <[email protected]>

Signed-off-by: Raghav Kaul <[email protected]>
Signed-off-by: Scott Hissam <[email protected]>

* removed whitespace before stanza for Run attestor e2e

Signed-off-by: Scott Hissam <[email protected]>

* resolved code review and doc review comments

Signed-off-by: Scott Hissam <[email protected]>

* repaired the link for the maintainer's guide for supporting the coordinated vulnerability disclosure guidelines

Signed-off-by: Scott Hissam <[email protected]>

* initial implementation of ossf#1369 (comment) to provide more license details

Signed-off-by: Scott Hissam <[email protected]>

* draft implementation to provide more information on license details

Signed-off-by: Scott Hissam <[email protected]>

* repaired a misspelling

Signed-off-by: Scott Hissam <[email protected]>

* Changed to handle http errors with 404 not found as being a non-error for not being able to find a license

Signed-off-by: Scott Hissam <[email protected]>

* Return an error status similar to other gitlab checks

Signed-off-by: Scott Hissam <[email protected]>

* add new raw licenses data

Signed-off-by: Scott Hissam <[email protected]>

* updated e2e test as new license check generates more info and warn as scores change as license file content is not parsed

Signed-off-by: Scott Hissam <[email protected]>

* added numerous more test filenames and a shouldFail boolean as some filenames will fail that do not meet checks.md rules

Signed-off-by: Scott Hissam <[email protected]>

* license check now, primarily, uses the GH API for checking licenses

Signed-off-by: Scott Hissam <[email protected]>

* updated local checker as new license check generates more info and warn as scores change as license file content is not parsed

Signed-off-by: Scott Hissam <[email protected]>

* added draft license gradation for scoring, add a map to OSI and FSF licenses, added GH API for retrieving repo license, revamp license filename matching when not using a repo API for detecting license files.

Signed-off-by: Scott Hissam <[email protected]>

* repaired race condition for case insensitive map, improved regex matching, moved licenses to raw, raw now mimics GH API return values for key, name, etc., updated unit tests and raw results accordingly

Signed-off-by: Scott Hissam <[email protected]>

* completed disambiguation of SPDX Identifiers and filename extensions, reworked some of the code comments, added map generation to TestLicense, added an additional mutex for the regex group identifier index, removed spurious prints, revised unit test accordingly, updated documentation.

Signed-off-by: Scott Hissam <[email protected]>

* removed repo Key from LicenseInformation as unneeded, changed attribution constants to be more meaningful, update documentation as necessary for changes

Signed-off-by: Scott Hissam <[email protected]>

Signed-off-by: Scott Hissam <[email protected]>
Signed-off-by: Raghav Kaul <[email protected]>
Co-authored-by: raghavkaul <[email protected]>
spencerschrock added a commit to spencerschrock/scorecard-action that referenced this issue Jul 19, 2024
When the threshold was introduced, the license check was a boolean
check: 0 points for no license, and 10 points with a license. This
later changed as covered in ossf/scorecard#1369

As the last point relies on SPDX detection, it's often flaky. Lowering
the threshold allows us to still warn if a license isn't detected but
not expect perfection.

Signed-off-by: Spencer Schrock <[email protected]>
spencerschrock added a commit to ossf/scorecard-action that referenced this issue Jul 23, 2024
When the threshold was introduced, the license check was a boolean
check: 0 points for no license, and 10 points with a license. This
later changed as covered in ossf/scorecard#1369

As the last point relies on SPDX detection, it's often flaky. Lowering
the threshold allows us to still warn if a license isn't detected but
not expect perfection.

Signed-off-by: Spencer Schrock <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Community contributions welcome, maintainers supportive of idea but not a high priority kind/enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants