C++: Total number of baseline files limit #17743

Open
artem-smotrakov opened this issue Oct 11, 2024 · 9 comments
Labels
question Further information is requested

Comments

@artem-smotrakov
Contributor

Hey friends, I have quite a large C++ database:

codeql database print-baseline -- ${CODEQL_DATABASE_DIR}
Counted a baseline of 27711380 lines of code for cpp.

Before running scans, I normally run some simple diagnostic queries to make sure the database looks fine. The queries look for things like:

  • Files
  • FunctionCalls
  • IfStmts

When I run these queries on this large database, I get this:

codeql database analyze ${CODEQL_DATABASE_DIR} --format=sarif-latest --output=calls.sarif ${CODEQL_QUERIES}/qlpacks/cpp-queries/diagnostics/FunctionCalls.ql
Running queries.
[1/1 comp 7.8s] Compiled [...]/qlpacks/cpp-queries/diagnostics/FunctionCalls.ql.
Files.ql: [1/1 eval 36s] Results written to cpp-queries/diagnostics/FunctionCalls.bqrs.
Shutting down query evaluator.
Interpreting results.
Will not interpret file coverage baseline information, since the total number of baseline files is 153738, which is greater than the limit of 50000.

The exit code is 0 but calls.sarif is empty.
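To pin down what "empty" means here, it can help to distinguish a zero-byte file from a valid SARIF log that simply contains no results. The following is a minimal sketch, assuming the output follows the standard SARIF 2.1.0 layout (a top-level `runs` array whose entries each carry a `results` array); the function name `count_sarif_results` is hypothetical and not part of the CodeQL CLI.

```python
import json


def count_sarif_results(sarif_text: str) -> dict:
    """Summarize a SARIF document: is it parseable, and how many results does it hold?"""
    if not sarif_text.strip():
        # A zero-byte (or whitespace-only) file is not a SARIF log at all.
        return {"valid": False, "runs": 0, "results": 0}

    log = json.loads(sarif_text)
    runs = log.get("runs", [])
    # "results" may be missing or null in a run, so fall back to an empty list.
    total = sum(len(run.get("results") or []) for run in runs)
    return {"valid": True, "runs": len(runs), "results": total}
```

Reading `calls.sarif` into a string and passing it to this function would show whether the file is truly empty or is a well-formed log with zero results, which are two different symptoms.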

When I run queries from the standard C++ pack, I get the same message.

What does this limit mean? Is there any way to increase it? Unfortunately, I didn't find anything in the docs or in this repo, though I may be missing something. Thanks!

@artem-smotrakov artem-smotrakov added the question Further information is requested label Oct 11, 2024
@redsun82
Contributor

👋 @artem-smotrakov, sorry for the late reply!

That limit is meant to keep the SARIF file from growing too large, and hitting the SARIF file size limit, when it is populated with the per-file information that the Tool Status Page uses to show how many files were analyzed. It is currently hard-coded and cannot be configured.
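The behavior the warning describes can be sketched as a simple threshold check. This is only an illustration inferred from the message text, not the CodeQL CLI's actual implementation: the names below are hypothetical, and treating a count exactly equal to the limit as acceptable is an assumption based on the phrase "greater than the limit".

```python
# Value quoted in the warning message; hard-coded according to the reply above.
BASELINE_FILE_LIMIT = 50_000


def should_interpret_baseline(total_baseline_files: int,
                              limit: int = BASELINE_FILE_LIMIT) -> bool:
    """Return True if file coverage baseline information would be interpreted.

    Assumption: the baseline is skipped only when the count is strictly
    greater than the limit, as the warning text suggests.
    """
    return total_baseline_files <= limit
```

With the 153738 baseline files reported in this issue, the check returns `False`, which matches the "Will not interpret file coverage baseline information" message.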

That said, I'm not entirely sure this limit (and the warning) should really cause a custom query like yours to return no results. Could you:

  • share your FunctionCalls.ql, so we can experiment with it a bit?
  • maybe try another output format like csv to see if the issue is specifically related to the SARIF format?

@artem-smotrakov
Contributor Author

artem-smotrakov commented Oct 14, 2024

Hi @redsun82 ! Thanks for your reply!

share your FunctionCalls.ql, so we can experiment with it a bit?

Yeah, sure, it's quite simple

import cpp

from FunctionCall call, Function func
where func = call.getTarget()
select call, "Call " + func + "(" + func.getParameterString() + ")"

maybe try another output format like csv to see if the issue is specifically related to the SARIF format?

I get the same message for CSV if I use --format=csv --output=calls.csv. The calls.csv file is empty.

It is currently hard-coded and cannot be configured.

Would it be possible to make it configurable in one of the next releases? 🤔

@rvermeulen
Contributor

Hi @artem-smotrakov,

The baseline information should not influence the result of the query.
Could you run https://github.com/github/codeql/blob/main/cpp/ql/src/Diagnostics/ExtractionWarnings.ql to determine whether other issues are influencing the results of the query?

@rvermeulen rvermeulen added the awaiting-response The CodeQL team is awaiting further input or clarification from the original reporter of this issue. label Oct 14, 2024
@artem-smotrakov
Contributor Author

Hi @rvermeulen ! Attaching the results of ExtractionWarnings.ql. I see errors in several files, but the codebase has way more C++ files than that. Also, I got the same limit warning when I ran the query.

extractrion_warnings.sarif.txt

@rvermeulen
Contributor

rvermeulen commented Oct 17, 2024

Hi @artem-smotrakov,

Let me forward this to our C/C++ team.
In the meantime, could you share which CodeQL CLI version you are using (codeql version --format=json) and the build-tracer.log that you can find in the database directory under logs? Before sharing, make sure any sensitive information is redacted (such as the unpack location).
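For a large log, one way to approach the redaction is to stream the file line by line and replace known sensitive strings with placeholders. The following is a minimal sketch under stated assumptions: the function names, paths, and placeholders are all hypothetical, it only performs literal substring replacement, and it cannot find sensitive values you have not listed, so the output still needs a manual review.

```python
from typing import Dict, Iterable, Iterator


def redact_line(line: str, secrets: Dict[str, str]) -> str:
    """Replace every occurrence of each sensitive string with its placeholder."""
    for needle, placeholder in secrets.items():
        line = line.replace(needle, placeholder)
    return line


def redact_log(lines: Iterable[str], secrets: Dict[str, str]) -> Iterator[str]:
    """Lazily redact an iterable of log lines, so a huge file never sits in memory."""
    # Replace longer strings first so that a parent directory does not
    # partially consume a more specific path before it can be matched.
    ordered = dict(sorted(secrets.items(), key=lambda kv: len(kv[0]), reverse=True))
    for line in lines:
        yield redact_line(line, ordered)
```

A typical use would open the log with `open(path, encoding="utf-8", errors="replace")`, pass the file object to `redact_log` together with a mapping such as `{"/home/alice/src/game": "<SRC>"}` (a made-up path), and write each yielded line to a new file.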

Contributor

github-actions bot commented Nov 1, 2024

This issue is stale because it has been open 14 days with no activity. Comment or remove the Stale label in order to avoid having this issue closed in 7 days.

@github-actions github-actions bot added the Stale label Nov 1, 2024
@artem-smotrakov
Contributor Author

This issue is stale because it has been open 14 days with no activity. Comment or remove the Stale label in order to avoid having this issue closed in 7 days.

I am working on it, please don't close it.

@jketema jketema removed Stale awaiting-response The CodeQL team is awaiting further input or clarification from the original reporter of this issue. labels Nov 4, 2024
@artem-smotrakov
Contributor Author

share which CodeQL CLI version you are using codeql version --format=json


{
  "productName" : "CodeQL",
  "vendor" : "GitHub",
  "version" : "2.19.0",
  "sha" : "9f0ad8ab1f14c2711b9fc2666e8bcdd09ab39ce8",
  "branches" : [
    "codeql-cli-2.19.0"
  ],
  "copyright" : "Copyright (C) 2019-2024 GitHub, Inc.",
  [...]
  "configFileFound" : false,
  "features" : {
    "analysisSummaryV2Default" : true,
    "buildModeOption" : true,
    "bundleSupportsIncludeDiagnostics" : true,
    "bundleSupportsIncludeLogs" : true,
    "databaseInterpretResultsSupportsSarifRunProperty" : true,
    "featuresInVersionResult" : true,
    "indirectTracingSupportsStaticBinaries" : false,
    "informsAboutUnsupportedPathFilters" : true,
    "supportsPython312" : true,
    "mrvaPackCreate" : true,
    "threatModelOption" : true,
    "traceCommandUseBuildMode" : true,
    "v2ramSizing" : true,
    "mrvaPackCreateMultipleQueries" : true,
    "setsCodeqlRunnerEnvVar" : true,
    "sarifMergeRunsFromEqualCategory" : true,
    "forceOverwrite" : true,
    "generateSummarySymbolMap" : true
  }
}

and the build-tracer.log

I am figuring out if I can actually share it.

@artem-smotrakov
Contributor Author

build-tracer.log that you can find in the database directory under logs. Before sharing make sure possible sensitive information is redacted

Hi @rvermeulen, the build-tracer.log is huge. I am looking through it and trying to redact possibly sensitive info, but I am not very familiar with what it contains, and I don't feel confident that I have removed everything that has to be removed.

Is there anything specific that the C++ team might be looking for in this log?

What would be the best way to share this log with the team? I would rather not attach it here in a publicly visible comment. If it helps, my employer, Riot Games, is a GitHub customer, so I guess there might be a private ticketing service or similar where I could post the file?
