Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust: Add (auto-generated) CFG node wrapper classes #17918

Merged
merged 7 commits into from
Nov 21, 2024

Conversation

hvitved
Copy link
Contributor

@hvitved hvitved commented Nov 6, 2024

When writing queries that use data flow or control flow, it is tempting to map entities down to AST nodes and back again. For example, if we have a CFG node for an addition expression, in order to get a CFG node for the LHS one might write:

CfgNode getLhs(CfgNode add) {
  result.getAstNode() = add.getAstNode().getLhs()
}

However, the issue above is that we may loose information about control-flow-splitting, meaning we may mix the CFG node for the addition from one set of splits with the CFG node for the LHS for another set of splits.

To this end, it is convenient to have a layer on top of CfgNode, which lifts member predicates such as getLhs, and which is responsible for getting CFG splitting right:

class BinaryExprCfgNode extends ExprCfgNode {
  ExprCfgNode getLhs() { ... }
}

Then query writers that operate on CFG level or above, should never have to map down to the AST layer.

In this PR we add exactly this layer, and since we already auto-generate the AST layer based on a schema, we extend the auto-generation to also construct the CFG layer. However, only a subset of AST nodes makes sense to include in the CFG layer, so one has to explicitly annotate which classes to include (using the cfg = True annotation in annotations.py).

Some manual additions are also needed in the CFG layer, namely those AST classes that are themselves not auto-generated (such as the AssignmentExpr class), and some classes may need to have extra predicates added when their CFG children do not correspond directly to AST children (for example the getArgument predicate is added to CallExprBaseCfgNode).

@github-actions github-actions bot added the Rust Pull requests that update Rust code label Nov 6, 2024
@hvitved hvitved force-pushed the rust/cfg-codegen branch 4 times, most recently from ebe8882 to 3c4add9 Compare November 7, 2024 13:17

mod if_expression {

fn test_and_if_let(a: bool, b: Option<bool>, c: bool) -> bool {

Check notice

Code scanning / CodeQL

Unused variable Note test

Variable 'c' is not used.
mod if_expression {

fn test_and_if_let(a: bool, b: Option<bool>, c: bool) -> bool {
if a && let Some(d) = b {

Check notice

Code scanning / CodeQL

Unused variable Note test

Variable 'd' is not used.
}

fn test_and_if_let2(a: bool, b: i64, c: bool) -> bool {
if a && let d = b

Check notice

Code scanning / CodeQL

Unused variable Note test

Variable 'd' is not used.
@hvitved hvitved force-pushed the rust/cfg-codegen branch 3 times, most recently from 435a4df to 96d0426 Compare November 19, 2024 15:16
@hvitved hvitved changed the title Rust: Auto-generate CfgNodes.qll Rust: Add (auto-generated) CFG node wrapper classes Nov 20, 2024
@hvitved hvitved marked this pull request as ready for review November 20, 2024 11:45
@hvitved hvitved requested a review from a team as a code owner November 20, 2024 11:45
@@ -119,7 +119,7 @@ class _:
"""


@annotate(IfExpr)
@annotate(IfExpr, cfg = True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if subtypes would "inherit" the cfg property from their base class. That way you don't have to repeat it for every class (with a risk of forgetting some).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as one could override the annotation, that could work. For example, we don't want ParensExpr in the CFG.

Copy link
Contributor

@paldepind paldepind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty amazing to me. I've left a few comments and questions.

Do we write "control flow" with or without a "-"? I was under the impression that we write it without the "-" as that what I've seen the most. Some quick grepping also seem to indicate that the variant without the "-" occurs more often in the repo.

Comment on lines 68 to 74
// In for example `if (a && true) ...` there is no RHS CFG node going into the
// `[false] a && true` operation
strictcount(ConditionalSuccessor cs | exists(last.getACfgNode().getASuccessor(cs)) | cs) = 1
or
// In for example `x && return`, there is no RHS CFG node going into the `&&` operation
not c instanceof CfgImpl::NormalCompletion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tweaks the comments a bit. The "no RHS CFG node" makes it sound as if its a node and not an edge that isn't there.

Suggested change
// In for example `if (a && true) ...` there is no RHS CFG node going into the
// `[false] a && true` operation
strictcount(ConditionalSuccessor cs | exists(last.getACfgNode().getASuccessor(cs)) | cs) = 1
or
// In for example `x && return`, there is no RHS CFG node going into the `&&` operation
not c instanceof CfgImpl::NormalCompletion
// In for example `if (a && true) ...`, there is no edge from the CFG node
// for `true` into the `[false] a && true` node.
strictcount(ConditionalSuccessor cs | exists(last.getACfgNode().getASuccessor(cs)) | cs) = 1
or
// In for example `x && return`, there is no edge from the node for
// `return` into the `&&` node.
not c instanceof CfgImpl::NormalCompletion


query predicate missingCfgChild(CfgNode parent, string pred, int i, AstNode child) {
CfgNodes::missingCfgChild(parent, pred, i, child) and
successfullyExtractedFile(child.getLocation().getFile()) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why wee need this predicate for this consistency check specifically? For instance, I've seen dead ends in syntactically incorrect extracted Rust files. Should we instead do it for all consistency checks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to do this for all (CFG) consistency checks.

/**
* Gets the argument list of this call expression base, if it exists.
*/
ArgList getArgList() { result = node.getArgList() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If one has a CallExprBaseCfgNode and do c.getArgList().getArg(n) one gets a plain Expr. That could be a footgun.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could, but I don't think there is an easy/nice way to get rid of this auto-generated predicate. The best solution, I think, is to get rid of the intermediate ArgList node altogether in the extractor.

misc/codegen/codegen.py Outdated Show resolved Hide resolved
Copy link
Contributor

@paldepind paldepind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@hvitved hvitved merged commit 295626d into github:main Nov 21, 2024
24 checks passed
@hvitved hvitved deleted the rust/cfg-codegen branch November 21, 2024 10:20
@paldepind
Copy link
Contributor

I just stumbled upon this: CallExprCfgNode has a getExpr that is different from the getExpr on ExprCfgNode. It is a bit confusing that these member predicate names overlap. Perhaps it would make sense to name the former asExpr or asAstExpr or getAstExpr?

@hvitved
Copy link
Contributor Author

hvitved commented Nov 21, 2024

I just stumbled upon this: CallExprCfgNode has a getExpr that is different from the getExpr on ExprCfgNode. It is a bit confusing that these member predicate names overlap. Perhaps it would make sense to name the former asExpr or asAstExpr or getAstExpr?

Both predicates are auto-generated, so that would not be easy. One can always do call.(ExprCfgNode).getExpr() to invoke the base getExpr predicate if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Rust Pull requests that update Rust code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants