DataFlow: Support stateless isSink in StateConfigSigs #13851

Merged on Aug 4, 2023 (8 commits)

@MathiasVP MathiasVP commented Jul 31, 2023

Sometimes a state-based configuration is needed only to define the correct isBarrier; once data does manage to reach a sink, any state should be accepted. Prior to this PR, the only way to avoid a cartesian product of sinks and flow states was to do something like:

module PruningConfig implements ConfigSig {
  predicate isSource(Node source) {
    exists(MyState state | isSourceImpl(source, state))
  }

  predicate isSink(Node sink) { ... }
}

module PruningFlow = Global<PruningConfig>;

MyState viableStateForSink(Node sink) {
  exists(PruningFlow::PathNode pSource, PruningFlow::PathNode pSink |
    PruningFlow::flowPath(pSource, pSink) and
    pSink.getNode() = sink and
    isSourceImpl(pSource.getNode(), result)
  )
}

module RealConfig implements StateConfigSig {
  class FlowState = MyState;

  predicate isSource(Node source, FlowState state) { isSourceImpl(source, state) }

  predicate isSink(Node sink, FlowState state) {
    ... and state = viableStateForSink(sink) // <-- to prevent CP with all flow states.
  }

  predicate isBarrier(Node barrier, FlowState state) { ... }
}

because there was no isSink/1 on StateConfigSig. With this PR we can now do:

module RealConfig implements StateConfigSig {
  predicate isSource(Node source, FlowState state) { ... }

  predicate isSink(Node sink) { ... }

  predicate isBarrier(Node barrier, FlowState state) { ... }
}

with no PruningFlow mess.
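
To make the cartesian product concrete, here is a toy model in Python rather than QL. It is only a sketch under made-up assumptions: the sink and state names are hypothetical, and plain sets stand in for the `isSink` relations, which is not how the dataflow library represents them. It shows that a stateful sink accepting every state needs one tuple per (sink, state) pair, while a stateless sink needs one tuple per sink and accepts the same arrivals.

```python
from itertools import product

# Hypothetical toy data; these names are illustrative, not CodeQL's.
STATES = ["s0", "s1", "s2"]
SINKS = ["sinkA", "sinkB"]


def stateful_sink_tuples(sinks, states):
    """Model isSink(node, state) that accepts every state.

    The relation holds one tuple per (sink, state) pair, so its size is
    len(sinks) * len(states): the cartesian product.
    """
    return set(product(sinks, states))


def stateless_sink_tuples(sinks):
    """Model isSink(node): one tuple per sink, however many states exist."""
    return set(sinks)


def accepted_stateful(arrivals, sink_tuples):
    """Keep the (node, state) arrivals that match a stateful sink tuple."""
    return arrivals & sink_tuples


def accepted_stateless(arrivals, sink_nodes):
    """Keep the (node, state) arrivals whose node is a sink; state is ignored."""
    return {(node, state) for node, state in arrivals if node in sink_nodes}


# Flow that reached some nodes in some states (made-up example).
ARRIVALS = {("sinkA", "s1"), ("sinkB", "s2"), ("other", "s0")}
```

In this model both variants accept the same arrivals, but the stateful relation grows with the number of states while the stateless one does not.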

cc @aschackmull I hope this isn't too controversial?

@aschackmull aschackmull left a comment

This is going to need some changes.

@MathiasVP

@aschackmull I've rebased the PR now that #13863 has been merged (🎉).

@MathiasVP MathiasVP marked this pull request as ready for review August 2, 2023 12:41
@MathiasVP MathiasVP requested review from a team as code owners August 2, 2023 12:41
MathiasVP commented Aug 2, 2023

Hmmm @aschackmull I think adding the additional conjuncts to sinkNode breaks the use of sinkNode in the partial flow case 😭. I've implemented one possible fix in 50f5c4d.

@@ -3981,7 +4003,7 @@ module MakeImpl<DataFlowParameter Lang> {

   private predicate relevantState(FlowState state) {
     sourceNode(_, state) or
-    sinkNode(_, state) or
+    sinkNodeWithState(_, state) or
@aschackmull

We can make another predicate to do a best-effort cartesian approximation for the reverse flow exploration case - this doesn't have to be extremely performant as it's just a debugging tool that'll often be restricted to specific sources/sinks anyway.
Add the following just below the relevantState predicate:

private predicate revSinkNode(NodeEx node, FlowState state) {
  sinkNodeWithState(node, state)
  or
  Config::isSink(node.asNode()) and
  relevantState(state) and
  not fullBarrier(node) and
  not stateBarrier(node, state)
}

and use it in the two places below.
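
As a rough illustration of what the suggested `revSinkNode` computes, here is a toy Python model. It is a sketch only: sets of strings and pairs stand in for the QL relations, all example values are invented, and in the library `relevantState` is itself derived from the source and sink relations rather than passed in.

```python
def rev_sink_nodes(sinks_with_state, stateless_sinks, relevant_states,
                   full_barriers, state_barriers):
    """Toy model of revSinkNode(node, state).

    sinks_with_state: set of (node, state) pairs, modelling sinkNodeWithState
    stateless_sinks:  set of nodes, modelling Config::isSink(node)
    relevant_states:  set of states, modelling relevantState
    full_barriers:    set of nodes, modelling fullBarrier
    state_barriers:   set of (node, state) pairs, modelling stateBarrier
    """
    # First disjunct: sinks that already carry an explicit state.
    result = set(sinks_with_state)
    # Second disjunct: pair each stateless sink with every relevant state
    # (the best-effort cartesian approximation), minus barriers.
    for node in stateless_sinks:
        if node in full_barriers:
            continue
        for state in relevant_states:
            if (node, state) not in state_barriers:
                result.add((node, state))
    return result


# Made-up example inputs.
EXAMPLE = rev_sink_nodes(
    sinks_with_state={("a", "s0")},
    stateless_sinks={"b", "c"},
    relevant_states={"s0", "s1"},
    full_barriers={"c"},
    state_barriers={("b", "s1")},
)
```

Here `c` is dropped as a full barrier and `("b", "s1")` is dropped as a state barrier, so only `("a", "s0")` and `("b", "s0")` remain.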

@MathiasVP

That makes sense. Thanks! Fixed in 981f675.

@aschackmull aschackmull left a comment

LGTM now.

@MathiasVP

Hm, it looks like Java's DCA run isn't super happy about these changes. I'll investigate!

@aschackmull

> Hm, it looks like Java's DCA run isn't super happy about these changes. I'll investigate!

It might very well be unrelated to this PR - I think we're seeing a lot of OOM kills in DCA at the moment for other reasons.

@MathiasVP

> > Hm, it looks like Java's DCA run isn't super happy about these changes. I'll investigate!
>
> It might very well be unrelated to this PR - I think we're seeing a lot of OOM kills in DCA at the moment for other reasons.

Ah, thanks for the heads up. Stage timings also seem to blame a bunch of non-dataflow-related queries, so it's probably a fluke. In any case, I've started a separate run for Java, and I'll do a couple of local evaluations to make sure there's nothing wrong.

@MathiasVP

FWIW, none of the projects that reported a slowdown on DCA seems to be slowing down locally.

@MathiasVP

Java has a bunch of OOMs, but this PR doesn't seem to contribute any more of them. And since neither Java nor any other language is actually using this new feature yet, this doesn't seem like it should block this PR. I've also verified that no bad joins are introduced.

@MathiasVP MathiasVP merged commit abe3a81 into github:main Aug 4, 2023