Add baseline metrics for lines of code #459

Merged
merged 1 commit into from
Apr 26, 2021

Contributor

@aeisenberg aeisenberg commented Apr 22, 2021

This commit uses a third party library to estimate the lines of code in
a database that is to be analyzed by codeql.

The estimate uses the same includes and excludes globs for determining
which files should be counted.

The lines of code count is returned by language and injected into the
SARIF as appropriate.

Currently, this PR adds the LoC data to the `metricResults` property of the SARIF in a blob like this:

      {
        metric: `baseline/${language}/lines-of-code`,
        value: lineCounts[language]
      }

We haven't agreed on what this will look like, so injecting the metric may change.

We've decided that the lines of code will be injected into metrics with an id like `${language}/summary/lines-of-code`, and a new `baseline` property will be added to the metric. Languages that we have a count for, but no metric, will be ignored.
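
A minimal sketch of the injection rule described above, under stated assumptions. The `MetricResult`, `SarifRun`, and `SarifLog` shapes, the `ruleId` field name, and the `injectLinesOfCode` helper are hypothetical and simplified for illustration; the code merged in this PR may be structured differently.

```typescript
// Hypothetical, simplified shapes: real SARIF files carry many more fields.
interface MetricResult {
  ruleId: string; // assumed name of the field holding the metric id
  value?: number;
  baseline?: number;
}

interface SarifRun {
  properties?: { metricResults?: MetricResult[] };
}

interface SarifLog {
  runs?: SarifRun[];
}

// Adds `baseline` to every existing `${language}/summary/lines-of-code` metric.
// Returns how many metrics were updated.
function injectLinesOfCode(
  sarif: SarifLog,
  language: string,
  lineCounts: Record<string, number>
): number {
  const count = lineCounts[language];
  if (count === undefined) {
    return 0; // no line count for this language
  }
  const metricId = `${language}/summary/lines-of-code`;
  let updated = 0;
  for (const run of sarif.runs ?? []) {
    for (const metric of run.properties?.metricResults ?? []) {
      if (metric.ruleId === metricId) {
        metric.baseline = count;
        updated++;
      }
    }
  }
  return updated;
}

// Usage: one metric exists for javascript, none for python.
const exampleSarif: SarifLog = {
  runs: [
    {
      properties: {
        metricResults: [
          { ruleId: "javascript/summary/lines-of-code", value: 120 },
        ],
      },
    },
  ],
};
const exampleLineCounts = { javascript: 150, python: 40 };
injectLinesOfCode(exampleSarif, "javascript", exampleLineCounts); // baseline set
injectLinesOfCode(exampleSarif, "python", exampleLineCounts); // ignored: no metric
```

Matching on an existing metric, rather than creating a new one, is what makes a language with a count but no metric a no-op.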

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Confirm the readme has been updated if necessary.

@aeisenberg aeisenberg marked this pull request as draft April 22, 2021 22:51
@aeisenberg aeisenberg force-pushed the aeisenberg/add-github-linguist branch from c5d6cae to d2b4652 Compare April 22, 2021 22:54
@aeisenberg aeisenberg force-pushed the aeisenberg/add-github-linguist branch from d2b4652 to c4a84a9 Compare April 22, 2021 22:59
@@ -145,6 +146,15 @@ export async function runQueries(
): Promise<QueriesStatusReport> {
const statusReport: QueriesStatusReport = {};

// count the number of lines in the background
const locPromise = countLoc(
Contributor

This is clever: the promise is only resolved when it's used, but it is potentially used in multiple places. Worth noting that in practice there will always be some queries to evaluate, so we'll always end up using this promise. It's an error for there not to be any queries to analyse, and that would have errored back in the init step. Up to you whether you want to leave this as it is or simplify it.

Contributor Author

The main reason why I'm doing this is so that the line counting can happen in "parallel". On large projects, counting can take 10-20s and there's lots of disk IO, so it's nice to be able to run this while other things are happening.
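
A minimal sketch of the pattern discussed in this thread: start the count without awaiting it, then await the same promise wherever the result is needed. The `countLinesSlowly`, `runQueriesFor`, and `analyze` functions and their timings are hypothetical stand-ins, not the functions in this PR.

```typescript
// Hypothetical stand-ins for the real work; names and timings are illustrative.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function countLinesSlowly(): Promise<Record<string, number>> {
  await sleep(50); // simulates disk-heavy line counting
  return { javascript: 150 };
}

async function runQueriesFor(language: string): Promise<string> {
  await sleep(50); // simulates query evaluation
  return `${language}.sarif`;
}

async function analyze(languages: string[]): Promise<string[]> {
  // Start counting now, but do not await: it proceeds while queries run.
  const locPromise = countLinesSlowly();
  const log: string[] = [];
  for (const language of languages) {
    const sarifFile = await runQueriesFor(language);
    // Awaiting the same promise several times is fine; it settles only once.
    const counts = await locPromise;
    log.push(`${sarifFile}: ${counts[language] ?? "no count"}`);
  }
  return log;
}
```

Calling `analyze(["javascript", "python"])` in this sketch counts lines once, overlapping with the first query run, and reuses the settled result for every later language.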

@aeisenberg aeisenberg force-pushed the aeisenberg/add-linguist-data branch 2 times, most recently from 74c1aef to 5201fb4 Compare April 23, 2021 17:55
@aeisenberg aeisenberg marked this pull request as ready for review April 23, 2021 17:57
Base automatically changed from aeisenberg/add-github-linguist to main April 23, 2021 17:59
Contributor

@adityasharad adityasharad left a comment

Nice and clear. A couple of recommendations based on CodeQL conventions for labelling languages.

@aeisenberg aeisenberg force-pushed the aeisenberg/add-linguist-data branch 2 times, most recently from 0ce85b6 to 674720b Compare April 23, 2021 21:59
Contributor

@robertbrignull robertbrignull left a comment

LGTM from my point of view. Probably best to let @adityasharad also review the recent changes.

This commit uses a third party library to estimate the lines of code in
a database that is to be analyzed by codeql.

The estimate uses the same includes and excludes globs for determining
which files should be counted.

The lines of code count is returned by language and injected into the
SARIF as `baseline` property in the `${language}/summary/lines-of-code`
metric.
@aeisenberg aeisenberg merged commit 03f029c into main Apr 26, 2021
@aeisenberg aeisenberg deleted the aeisenberg/add-linguist-data branch April 26, 2021 21:23
@github-actions github-actions bot mentioned this pull request Apr 30, 2021
3 participants