C++: Fix some FPs in `cpp/missing-check-scanf` #16009

MathiasVP · 2024-03-21T10:40:23Z

This PR fixes a simple class of FPs in cpp/missing-check-scanf by improving the guards library. Previously, the guards library understood things such as:

if(a == b) {
  // a == b is inferred here
}

but we failed to infer this equality in:

bool r = a == b;
if(r) {
  // a == b was NOT inferred here
}

because the guards library hardcoded that the comparison should be the actual thing being branched on. This PR fixes this by using value numbers in the internal guard logic predicates. And since r and a == b has the same value number these two cases are now identical 🎉

Annoyingly, some FPs still appear because of the awkward conversions between IRBlocks and BasicBlocks done. This FPs should ideally be fixed by completely getting rid of BasicBlocks, but that's a job for another PR.

For now, I'm happy that a simple improvement to the guards library fixes the simple cases automagically 🎉

Commit-by-commit review recommended.

of the implementation predicates in 'IRGuards' from an 'Instruction' to a value number. This means it doesn't matter whether the expression being branched on is a comparison, or a use of a variable whose value is the result of a comparison. This first commit adds the various subclasses of value number that we need the rest purely mechanical changes that replaces 'Instruction's with 'ValueNumber's in the relevant places.

subclasses of value numbers added in the previous commit.

…ates.

jketema · 2024-03-21T10:50:01Z

cpp/ql/lib/semmle/code/cpp/controlflow/IRGuards.qll

@@ -872,15 +872,15 @@ private predicate simple_comparison_eq(
 }

 private predicate complex_eq(
-  CompareInstruction cmp, Operand left, Operand right, int k, boolean areEqual, AbstractValue value
+  ValueNumber cmp, Operand left, Operand right, int k, boolean areEqual, AbstractValue value


Is there a reason for not using the newly introduced CompareValueNumber here? And also in some other places in the commit that introduces this.

It actually does matter because of the LogicalNotInstruction case:

exists(AbstractValue dual, LogicalNotInstruction logicalNot | value = dual.getDualValue() and logicalNot = test.getAnInstruction() and compares_eq(valueNumber(logicalNot.getUnary()), left, right, k, areEqual, dual) )

here we don't require the value number to be a CompareValueNumber, but instead we require it to be something of the form !x, where the x is then used recursively.

I'll see if I can come up with a testcase where this is important 🤔

MathiasVP · 2024-03-21T11:00:07Z

I see some unexpected library test changes. I'll investigate them!

aschackmull · 2024-03-21T11:24:27Z

We really should get a shared Guards library. Anyway, just wanted to note that the Java Guards library already supports guards derived from assignments to boolean variables, but the approach is somewhat different, and, I think, more extensible. The Java version has the GuardsLogic library, which extends guards by making use of all sorts of implications that can be derived between conditions. Ideally all of this would be shared code...

MathiasVP · 2024-03-21T11:26:10Z

We really should get a shared Guards library. Anyway, just wanted to note that the Java Guards library already supports guards derived from assignments to boolean variables, but the approach is somewhat different, and, I think, more extensible. The Java version has the GuardsLogic library, which extends guards by making use of all sorts of implications that can be derived between conditions. Ideally all of this would be shared code...

Yeah, I agree. The C/C++ library has a very naive and ad-hoc approach to some of this.

MathiasVP · 2024-03-21T12:18:03Z

I see some unexpected library test changes. I'll investigate them!

These are now fixed by d67dae8 and 2252936. I'll rerun DCA.

Even though Schack described how it would be much better to have a shared guards library I still think this PR is worth it since it fixes some actual FPs in queries, and a shared guards library is still a (Glorious) Future item that no one is currently working on right now.

cpp/ql/lib/semmle/code/cpp/controlflow/IRGuards.qll

MathiasVP · 2024-03-21T15:28:08Z

Hm. I want to check Kamailio's 9% analysis speed regression before we move forward with this 🤔

MathiasVP · 2024-03-21T17:37:42Z

Hm, the use of value numbering here is biting us, unfortunately. On main, when running ensuresEq on Kamailio we get:

Evaluated relational algebra for predicate IRGuards::GuardCondition.ensuresEq/5#dispred#42506afe@628ffbju with tuple counts:
  10627332   ~0%    {6} r1 = JOIN `_IRGuards::GuardCondition.controls/2#dispred#b7ca92ee_IRGuards::GuardConditionFromBinaryLogicalOpera__#shared` WITH `IRGuards::GuardCondition.comparesEq/5#dispred#f67bf973_051234#join_rhs` ON FIRST 2 OUTPUT Lhs.0, Rhs.2, Rhs.3, Rhs.4, Lhs.2, Rhs.5
                
  8659784   ~0%    {6} r2 = JOIN `_IRGuards::GuardCondition.controls/2#dispred#b7ca92ee_IRGuards::GuardConditionFromIR#2c83f350#shared` WITH `IRGuards::IRGuardCondition.comparesEq/5#dispred#8751d3be_051234#join_rhs` ON FIRST 2 OUTPUT Rhs.2, Lhs.2, Lhs.3, Rhs.3, Rhs.4, Rhs.5
  8659784   ~2%    {6}    | JOIN WITH `Instruction::Instruction.getAUse/0#dispred#3072f0dc_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5
  8657912   ~0%    {6}    | JOIN WITH `Instruction::Instruction.getUnconvertedResultExpression/0#dispred#d46d1df3` ON FIRST 1 OUTPUT Lhs.3, Lhs.1, Lhs.2, Lhs.4, Lhs.5, Rhs.1
  8657912   ~0%    {6}    | JOIN WITH `Instruction::Instruction.getAUse/0#dispred#3072f0dc_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5
  8656040   ~3%    {6}    | JOIN WITH `Instruction::Instruction.getUnconvertedResultExpression/0#dispred#d46d1df3` ON FIRST 1 OUTPUT Lhs.1, Lhs.5, Rhs.1, Lhs.3, Lhs.2, Lhs.4
                
  19283372   ~5%    {6} r3 = r1 UNION r2
                    return r3

whereas on this PR we get:

[2024-03-21 15:02:50] Evaluated non-recursive predicate IRGuards::GuardCondition.ensuresEq/5#dispred#42506afe@1cfa5anp in 228413ms (size: 264371532).
Evaluated relational algebra for predicate IRGuards::GuardCondition.ensuresEq/5#dispred#42506afe@1cfa5anp with tuple counts:
  129856584   ~4%    {6} r1 = JOIN `_IRGuards::GuardCondition.controls/2#dispred#b7ca92ee_IRGuards::GuardConditionFromBinaryLogicalOpera__#shared` WITH `IRGuards::GuardCondition.comparesEq/5#dispred#f67bf973_051234#join_rhs` ON FIRST 2 OUTPUT Lhs.0, Rhs.2, Rhs.3, Rhs.4, Lhs.2, Rhs.5
                  
  134518866   ~0%    {6} r2 = JOIN `_IRGuards::GuardCondition.controls/2#dispred#b7ca92ee_IRGuards::GuardConditionFromIR#2c83f350#shared` WITH `IRGuards::IRGuardCondition.comparesEq/5#dispred#8751d3be_051234#join_rhs` ON FIRST 2 OUTPUT Rhs.2, Lhs.2, Lhs.3, Rhs.3, Rhs.4, Rhs.5
  134518866   ~0%    {6}    | JOIN WITH `Instruction::Instruction.getAUse/0#dispred#3072f0dc_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5
  134516907   ~0%    {6}    | JOIN WITH `Instruction::Instruction.getUnconvertedResultExpression/0#dispred#d46d1df3` ON FIRST 1 OUTPUT Lhs.3, Lhs.1, Lhs.2, Lhs.4, Lhs.5, Rhs.1
  134516907   ~5%    {6}    | JOIN WITH `Instruction::Instruction.getAUse/0#dispred#3072f0dc_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5
  134514948   ~0%    {6}    | JOIN WITH `Instruction::Instruction.getUnconvertedResultExpression/0#dispred#d46d1df3` ON FIRST 1 OUTPUT Lhs.1, Lhs.5, Rhs.1, Lhs.3, Lhs.2, Lhs.4
                  
  264371532   ~0%    {6} r3 = r1 UNION r2
                      return r3

i.e., the pipeline grew by a factor of ~13 😱

To see where these come from, I looked at the DIL for computing r1 which corresponds roughly to (after removing irrelevant columns):

int histogram(GuardConditionFromBinaryLogicalOperator gc) {
  result =
    count(BasicBlock block, Expr e |
      exists(AbstractValue value |
        gc.valueControls(block, value) and
        gc.comparesEq(e, _, _, value)
      )
    )
}

the largest rows on main is 2598, whereas on this PR there are many rows with >25000 results. In particular, the guard here is now (incorrectly) inferred to provide an equality for any use of ket->ptypes[0] that is guarded by an value-number equivalent guard condition. So to fix this we'd need to put some sort of dominance criteria on the e above, and at this point I think this is getting more complex than I was aiming for 😭 So we should probably just leave this as-is until we start working on a properly shared guards library.

Oh well

bdrodes · 2024-11-04T18:24:40Z

I investigated this a bit. I decided to try a different approach. Rather than using GVN for every step in the conditional parsing, I specifically only get GVN if the 'next' instruction differs. The intuition is this would/should only give me the current instruction or any defining instruction (since the definition should be the only place where the GVN instruction differs). I.e., I basically hacked around GVN to give me the SSA def/use. SSA seems like the preferred choice honestly, so I'm open to suggestions there, but I'll make a separate PR for that discussion.

Using your histogram @MathiasVP I'm seeing 81 rows with counts > 2000, and the highest is 16000. There is only one result > 6400.

This is still higher than main according to @MathiasVP's original findings, but I have a suspicion it will provide much greater performance.

I'll work up a PR for this separately for further testing/comments.

bdrodes · 2024-11-04T19:35:33Z

@MathiasVP , I've created a draft PR here: https://github.com/github/codeql/pull/17907/files

One thing to note, I did not see any difference in the range analysis tests, like you had in this PR. Is that an error? We can take further discussion over to this PR.

MathiasVP · 2024-11-06T09:03:14Z

Yeah, it likely means something isn't working over in #17907. Let's continue over there 🏃‍♂️

MathiasVP added 7 commits March 21, 2024 10:32

C++: Add FP test.

fbb1769

C++: These are the only places that actually depend on the various

52307ea

subclasses of value numbers added in the previous commit.

C++: Also handle the places where an instruction is negated.

18ab308

C++: Replace 'Instruction's with 'ValueNumber's.

0026b35

C++: And pass the value number when calling the implementation predic…

b952ea8

…ates.

C++: Accept test changes.

9f223f8

MathiasVP requested a review from a team as a code owner March 21, 2024 10:40

github-actions bot added the C++ label Mar 21, 2024

MathiasVP added the no-change-note-required This PR does not need a change note label Mar 21, 2024

jketema reviewed Mar 21, 2024

View reviewed changes

MathiasVP added 3 commits March 21, 2024 12:13

C++: Avoid conflating left and right between operations.

d67dae8

C++: Fix copy-paste mistake.

2252936

C++: Accept test changes.

2b5d739

geoffw0 reviewed Mar 21, 2024

View reviewed changes

cpp/ql/lib/semmle/code/cpp/controlflow/IRGuards.qll Show resolved Hide resolved

jketema approved these changes Mar 21, 2024

View reviewed changes

MathiasVP closed this Mar 21, 2024

MathiasVP mentioned this pull request Nov 8, 2024

C++: Fix some FPs in cpp/missing-check-scanf (second attempt) #17938

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C++: Fix some FPs in `cpp/missing-check-scanf` #16009

C++: Fix some FPs in `cpp/missing-check-scanf` #16009

MathiasVP commented Mar 21, 2024

jketema Mar 21, 2024 •

edited

Loading

MathiasVP Mar 21, 2024

MathiasVP commented Mar 21, 2024

aschackmull commented Mar 21, 2024

MathiasVP commented Mar 21, 2024

MathiasVP commented Mar 21, 2024 •

edited

Loading

MathiasVP commented Mar 21, 2024

MathiasVP commented Mar 21, 2024

bdrodes commented Nov 4, 2024

bdrodes commented Nov 4, 2024

MathiasVP commented Nov 6, 2024

C++: Fix some FPs in cpp/missing-check-scanf #16009

C++: Fix some FPs in cpp/missing-check-scanf #16009

Conversation

MathiasVP commented Mar 21, 2024

jketema Mar 21, 2024 • edited Loading

Choose a reason for hiding this comment

MathiasVP Mar 21, 2024

Choose a reason for hiding this comment

MathiasVP commented Mar 21, 2024

aschackmull commented Mar 21, 2024

MathiasVP commented Mar 21, 2024

MathiasVP commented Mar 21, 2024 • edited Loading

MathiasVP commented Mar 21, 2024

MathiasVP commented Mar 21, 2024

bdrodes commented Nov 4, 2024

bdrodes commented Nov 4, 2024

MathiasVP commented Nov 6, 2024

C++: Fix some FPs in `cpp/missing-check-scanf` #16009

C++: Fix some FPs in `cpp/missing-check-scanf` #16009

jketema Mar 21, 2024 •

edited

Loading

MathiasVP commented Mar 21, 2024 •

edited

Loading