-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rust: Add (auto-generated) CFG node wrapper classes #17918
Conversation
ebe8882
to
3c4add9
Compare
3c4add9
to
bde5c16
Compare
bde5c16
to
88eed59
Compare
88eed59
to
da5d066
Compare
da5d066
to
780af65
Compare
|
||
mod if_expression { | ||
|
||
fn test_and_if_let(a: bool, b: Option<bool>, c: bool) -> bool { |
Check notice
Code scanning / CodeQL
Unused variable Note test
mod if_expression { | ||
|
||
fn test_and_if_let(a: bool, b: Option<bool>, c: bool) -> bool { | ||
if a && let Some(d) = b { |
Check notice
Code scanning / CodeQL
Unused variable Note test
} | ||
|
||
fn test_and_if_let2(a: bool, b: i64, c: bool) -> bool { | ||
if a && let d = b |
Check notice
Code scanning / CodeQL
Unused variable Note test
435a4df
to
96d0426
Compare
CfgNodes.qll
@@ -119,7 +119,7 @@ class _: | |||
""" | |||
|
|||
|
|||
@annotate(IfExpr) | |||
@annotate(IfExpr, cfg = True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if subtypes would "inherit" the cfg
property from their base class. That way you don't have to repeat it for every class (with a risk of forgetting some).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as one could override the annotation, that could work. For example, we don't want ParensExpr
in the CFG.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty amazing to me. I've left a few comments and questions.
Do we write "control flow" with or without a "-"? I was under the impression that we write it without the "-" as that what I've seen the most. Some quick grepping also seem to indicate that the variant without the "-" occurs more often in the repo.
// In for example `if (a && true) ...` there is no RHS CFG node going into the | ||
// `[false] a && true` operation | ||
strictcount(ConditionalSuccessor cs | exists(last.getACfgNode().getASuccessor(cs)) | cs) = 1 | ||
or | ||
// In for example `x && return`, there is no RHS CFG node going into the `&&` operation | ||
not c instanceof CfgImpl::NormalCompletion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tweaks the comments a bit. The "no RHS CFG node" makes it sound as if its a node and not an edge that isn't there.
// In for example `if (a && true) ...` there is no RHS CFG node going into the | |
// `[false] a && true` operation | |
strictcount(ConditionalSuccessor cs | exists(last.getACfgNode().getASuccessor(cs)) | cs) = 1 | |
or | |
// In for example `x && return`, there is no RHS CFG node going into the `&&` operation | |
not c instanceof CfgImpl::NormalCompletion | |
// In for example `if (a && true) ...`, there is no edge from the CFG node | |
// for `true` into the `[false] a && true` node. | |
strictcount(ConditionalSuccessor cs | exists(last.getACfgNode().getASuccessor(cs)) | cs) = 1 | |
or | |
// In for example `x && return`, there is no edge from the node for | |
// `return` into the `&&` node. | |
not c instanceof CfgImpl::NormalCompletion |
|
||
query predicate missingCfgChild(CfgNode parent, string pred, int i, AstNode child) { | ||
CfgNodes::missingCfgChild(parent, pred, i, child) and | ||
successfullyExtractedFile(child.getLocation().getFile()) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why wee need this predicate for this consistency check specifically? For instance, I've seen dead ends in syntactically incorrect extracted Rust files. Should we instead do it for all consistency checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to do this for all (CFG) consistency checks.
/** | ||
* Gets the argument list of this call expression base, if it exists. | ||
*/ | ||
ArgList getArgList() { result = node.getArgList() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If one has a CallExprBaseCfgNode
and do c.getArgList().getArg(n)
one gets a plain Expr
. That could be a footgun.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could, but I don't think there is an easy/nice way to get rid of this auto-generated predicate. The best solution, I think, is to get rid of the intermediate ArgList
node altogether in the extractor.
96d0426
to
be7aca9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
I just stumbled upon this: |
Both predicates are auto-generated, so that would not be easy. One can always do |
When writing queries that use data flow or control flow, it is tempting to map entities down to AST nodes and back again. For example, if we have a CFG node for an addition expression, in order to get a CFG node for the LHS one might write:
However, the issue above is that we may loose information about control-flow-splitting, meaning we may mix the CFG node for the addition from one set of splits with the CFG node for the LHS for another set of splits.
To this end, it is convenient to have a layer on top of
CfgNode
, which lifts member predicates such asgetLhs
, and which is responsible for getting CFG splitting right:Then query writers that operate on CFG level or above, should never have to map down to the AST layer.
In this PR we add exactly this layer, and since we already auto-generate the AST layer based on a schema, we extend the auto-generation to also construct the CFG layer. However, only a subset of AST nodes makes sense to include in the CFG layer, so one has to explicitly annotate which classes to include (using the
cfg = True
annotation inannotations.py
).Some manual additions are also needed in the CFG layer, namely those AST classes that are themselves not auto-generated (such as the
AssignmentExpr
class), and some classes may need to have extra predicates added when their CFG children do not correspond directly to AST children (for example thegetArgument
predicate is added toCallExprBaseCfgNode
).