-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C++: Fix some FPs in cpp/missing-check-scanf
#16009
Conversation
of the implementation predicates in 'IRGuards' from an 'Instruction' to a value number. This means it doesn't matter whether the expression being branched on is a comparison, or a use of a variable whose value is the result of a comparison. This first commit adds the various subclasses of value number that we need the rest purely mechanical changes that replaces 'Instruction's with 'ValueNumber's in the relevant places.
subclasses of value numbers added in the previous commit.
@@ -872,15 +872,15 @@ private predicate simple_comparison_eq( | |||
} | |||
|
|||
private predicate complex_eq( | |||
CompareInstruction cmp, Operand left, Operand right, int k, boolean areEqual, AbstractValue value | |||
ValueNumber cmp, Operand left, Operand right, int k, boolean areEqual, AbstractValue value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason for not using the newly introduced CompareValueNumber
here? And also in some other places in the commit that introduces this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually does matter because of the LogicalNotInstruction
case:
exists(AbstractValue dual, LogicalNotInstruction logicalNot |
value = dual.getDualValue() and
logicalNot = test.getAnInstruction() and
compares_eq(valueNumber(logicalNot.getUnary()), left, right, k, areEqual, dual)
)
here we don't require the value number to be a CompareValueNumber
, but instead we require it to be something of the form !x
, where the x
is then used recursively.
I'll see if I can come up with a testcase where this is important 🤔
I see some unexpected library test changes. I'll investigate them! |
We really should get a shared Guards library. Anyway, just wanted to note that the Java Guards library already supports guards derived from assignments to boolean variables, but the approach is somewhat different, and, I think, more extensible. The Java version has the GuardsLogic library, which extends guards by making use of all sorts of implications that can be derived between conditions. Ideally all of this would be shared code... |
Yeah, I agree. The C/C++ library has a very naive and ad-hoc approach to some of this. |
These are now fixed by d67dae8 and 2252936. I'll rerun DCA. Even though Schack described how it would be much better to have a shared guards library I still think this PR is worth it since it fixes some actual FPs in queries, and a shared guards library is still a (Glorious) Future item that no one is currently working on right now. |
Hm. I want to check Kamailio's 9% analysis speed regression before we move forward with this 🤔 |
Hm, the use of value numbering here is biting us, unfortunately. On main, when running Evaluated relational algebra for predicate IRGuards::GuardCondition.ensuresEq/5#dispred#42506afe@628ffbju with tuple counts:
10627332 ~0% {6} r1 = JOIN `_IRGuards::GuardCondition.controls/2#dispred#b7ca92ee_IRGuards::GuardConditionFromBinaryLogicalOpera__#shared` WITH `IRGuards::GuardCondition.comparesEq/5#dispred#f67bf973_051234#join_rhs` ON FIRST 2 OUTPUT Lhs.0, Rhs.2, Rhs.3, Rhs.4, Lhs.2, Rhs.5
8659784 ~0% {6} r2 = JOIN `_IRGuards::GuardCondition.controls/2#dispred#b7ca92ee_IRGuards::GuardConditionFromIR#2c83f350#shared` WITH `IRGuards::IRGuardCondition.comparesEq/5#dispred#8751d3be_051234#join_rhs` ON FIRST 2 OUTPUT Rhs.2, Lhs.2, Lhs.3, Rhs.3, Rhs.4, Rhs.5
8659784 ~2% {6} | JOIN WITH `Instruction::Instruction.getAUse/0#dispred#3072f0dc_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5
8657912 ~0% {6} | JOIN WITH `Instruction::Instruction.getUnconvertedResultExpression/0#dispred#d46d1df3` ON FIRST 1 OUTPUT Lhs.3, Lhs.1, Lhs.2, Lhs.4, Lhs.5, Rhs.1
8657912 ~0% {6} | JOIN WITH `Instruction::Instruction.getAUse/0#dispred#3072f0dc_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5
8656040 ~3% {6} | JOIN WITH `Instruction::Instruction.getUnconvertedResultExpression/0#dispred#d46d1df3` ON FIRST 1 OUTPUT Lhs.1, Lhs.5, Rhs.1, Lhs.3, Lhs.2, Lhs.4
19283372 ~5% {6} r3 = r1 UNION r2
return r3 whereas on this PR we get: [2024-03-21 15:02:50] Evaluated non-recursive predicate IRGuards::GuardCondition.ensuresEq/5#dispred#42506afe@1cfa5anp in 228413ms (size: 264371532).
Evaluated relational algebra for predicate IRGuards::GuardCondition.ensuresEq/5#dispred#42506afe@1cfa5anp with tuple counts:
129856584 ~4% {6} r1 = JOIN `_IRGuards::GuardCondition.controls/2#dispred#b7ca92ee_IRGuards::GuardConditionFromBinaryLogicalOpera__#shared` WITH `IRGuards::GuardCondition.comparesEq/5#dispred#f67bf973_051234#join_rhs` ON FIRST 2 OUTPUT Lhs.0, Rhs.2, Rhs.3, Rhs.4, Lhs.2, Rhs.5
134518866 ~0% {6} r2 = JOIN `_IRGuards::GuardCondition.controls/2#dispred#b7ca92ee_IRGuards::GuardConditionFromIR#2c83f350#shared` WITH `IRGuards::IRGuardCondition.comparesEq/5#dispred#8751d3be_051234#join_rhs` ON FIRST 2 OUTPUT Rhs.2, Lhs.2, Lhs.3, Rhs.3, Rhs.4, Rhs.5
134518866 ~0% {6} | JOIN WITH `Instruction::Instruction.getAUse/0#dispred#3072f0dc_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5
134516907 ~0% {6} | JOIN WITH `Instruction::Instruction.getUnconvertedResultExpression/0#dispred#d46d1df3` ON FIRST 1 OUTPUT Lhs.3, Lhs.1, Lhs.2, Lhs.4, Lhs.5, Rhs.1
134516907 ~5% {6} | JOIN WITH `Instruction::Instruction.getAUse/0#dispred#3072f0dc_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4, Lhs.5
134514948 ~0% {6} | JOIN WITH `Instruction::Instruction.getUnconvertedResultExpression/0#dispred#d46d1df3` ON FIRST 1 OUTPUT Lhs.1, Lhs.5, Rhs.1, Lhs.3, Lhs.2, Lhs.4
264371532 ~0% {6} r3 = r1 UNION r2
return r3 i.e., the pipeline grew by a factor of ~13 😱 To see where these come from, I looked at the DIL for computing int histogram(GuardConditionFromBinaryLogicalOperator gc) {
result =
count(BasicBlock block, Expr e |
exists(AbstractValue value |
gc.valueControls(block, value) and
gc.comparesEq(e, _, _, value)
)
)
} the largest rows on Oh well |
I investigated this a bit. I decided to try a different approach. Rather than using GVN for every step in the conditional parsing, I specifically only get GVN if the 'next' instruction differs. The intuition is this would/should only give me the current instruction or any defining instruction (since the definition should be the only place where the GVN instruction differs). I.e., I basically hacked around GVN to give me the SSA def/use. SSA seems like the preferred choice honestly, so I'm open to suggestions there, but I'll make a separate PR for that discussion. Using your histogram @MathiasVP I'm seeing 81 rows with counts > 2000, and the highest is 16000. There is only one result > 6400. This is still higher than main according to @MathiasVP's original findings, but I have a suspicion it will provide much greater performance. I'll work up a PR for this separately for further testing/comments. |
@MathiasVP , I've created a draft PR here: https://github.com/github/codeql/pull/17907/files One thing to note, I did not see any difference in the range analysis tests, like you had in this PR. Is that an error? We can take further discussion over to this PR. |
Yeah, it likely means something isn't working over in #17907. Let's continue over there 🏃♂️ |
This PR fixes a simple class of FPs in
cpp/missing-check-scanf
by improving the guards library. Previously, the guards library understood things such as:but we failed to infer this equality in:
because the guards library hardcoded that the comparison should be the actual thing being branched on. This PR fixes this by using value numbers in the internal guard logic predicates. And since
r
anda == b
has the same value number these two cases are now identical 🎉Annoyingly, some FPs still appear because of the awkward conversions between
IRBlock
s andBasicBlock
s done. This FPs should ideally be fixed by completely getting rid ofBasicBlock
s, but that's a job for another PR.For now, I'm happy that a simple improvement to the guards library fixes the simple cases automagically 🎉
Commit-by-commit review recommended.