Property pruner #3750
Any design document about this?
Sorry, there is no design doc here. But after the PR is finished, I will write a brief description.
@@ -11,6 +11,8 @@
#include "graph/planner/plan/Query.h"
#include "graph/util/ExpressionUtils.h"

DEFINE_bool(enable_opt_collapse_project_rule, true, "");
Why disable this rule?
It's not disabled, it's just a switch. Actually, all opt rules need a switch.
RBO should not be designed to be a tradeoff. Why do we need a switch?
- For testing or debugging.
- An RBO rule doesn't always make the plan better.
src/graph/util/ExpressionUtils.cpp
Outdated
Status ExpressionUtils::extractPropsFromExprs(const Expression *expr,
                                              PropertyTracker &propsUsed,
                                              const graph::QueryContext *qctx,
                                              GraphSpaceID spaceID,
                                              const std::string &entityAlias) {
Define a more generic interface for utility functions.
I don't get it. Could you explain?
src/common/expression/Expression.h
Outdated
@@ -223,6 +225,17 @@ class Expression {

std::ostream& operator<<(std::ostream& os, Expression::Kind kind);

struct PropertyTracker {
  std::unordered_map<std::string, std::unordered_map<TagID, std::unordered_set<std::string>>>
Why use TagID rather than TagName?
Because plan nodes such as AppendVertices or Traverse store the tagId in their data member vertexprops_.
I think you can materialize TagID before you pass it to Traverse or AppendVertices instead of doing it in a visitor, which is debug friendly as well.
I think both are OK; other visitors like DeducePropsVisitor already use tagId:
void DeducePropsVisitor::visit(LabelTagPropertyExpression *expr) {
auto status = qctx_->schemaMng()->toTagID(space_, expr->sym());
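As a rough illustration of the structure discussed in this thread, the sketch below keys used properties by entity alias and then by tag ID, resolving the tag name to an ID once before recording it. It is a simplified stand-in, not the code in this PR: the `TagID` typedef, the member name `vertexPropsMap`, the helpers `addVertexProp` and `uses`, and the `toTagID` free function with its map-based schema are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>

using TagID = std::int32_t;  // assumption: stand-in for the real TagID typedef

// Simplified stand-in for the PropertyTracker under review:
// entity alias -> tag ID -> property names used by upstream plan nodes.
struct PropertyTracker {
  std::unordered_map<std::string,
                     std::unordered_map<TagID, std::unordered_set<std::string>>>
      vertexPropsMap;  // hypothetical member name

  // Record that `prop` of tag `tag` is used through the alias `alias`.
  void addVertexProp(const std::string& alias, TagID tag, const std::string& prop) {
    assert(!prop.empty());
    vertexPropsMap[alias][tag].emplace(prop);
  }

  // Return true if the property was recorded for this alias and tag.
  bool uses(const std::string& alias, TagID tag, const std::string& prop) const {
    auto aliasIt = vertexPropsMap.find(alias);
    if (aliasIt == vertexPropsMap.end()) return false;
    auto tagIt = aliasIt->second.find(tag);
    if (tagIt == aliasIt->second.end()) return false;
    return tagIt->second.count(prop) > 0;
  }
};

// Hypothetical schema lookup standing in for a schema-manager call.
// Returns -1 when the tag name is unknown.
inline TagID toTagID(const std::unordered_map<std::string, TagID>& schema,
                     const std::string& tagName) {
  auto it = schema.find(tagName);
  return it == schema.end() ? -1 : it->second;
}
```

Keying by ID means a node that already stores tag IDs can look up its entries directly; keying by name would instead push the name-to-ID resolution to the point where the node is updated.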
  propsUsed_.colsSet.emplace(colName);
}

// void PropertyTrackerVisitor::visit(AttributeExpression *expr) {
Why disable AttributeExpression?
Please see the TODO in the description of this PR:
At present, this PR only prunes vertex attributes. Once the PR is almost reviewed, I can add the pruner for edge attributes.
Everything else in this PR looks good to me. Please move on with it. Looking forward to seeing the effect of this PR.
src/graph/optimizer/Optimizer.cpp
Outdated
auto rootGroup = std::move(status).value();
auto spaceID = qctx->rctx()->session()->space().id;

// auto status = preprocess(root, qctx, spaceID);
We should figure out the relation between prepare and preprocess.
I think preprocess operates on a raw plan, while prepare, which invokes convertToGroup, converts the raw plan to a group, which is used in the exploration phase.
The three phases of TiDB's optimizer:
// Phase 1: Preprocessing
// The target of this phase is to preprocess the plan tree by some heuristic
// rules which should always be beneficial, for example Column Pruning.
//------------------------------------------------------------------------------
// Phase 2: Exploration
//------------------------------------------------------------------------------
//
// The target of this phase is to explore all the logically equivalent
// expressions by exploring all the equivalent group expressions of each group.
//------------------------------------------------------------------------------
// Phase 3: Implementation
//------------------------------------------------------------------------------
//
// The target of this phase is to search the best physical plan for a Group
// which satisfies a certain required physical property.
src/common/expression/Expression.h
Outdated
@@ -223,6 +224,17 @@ class Expression {

std::ostream& operator<<(std::ostream& os, Expression::Kind kind);

struct PropertyTracker {
I think this class should not be part of this file.
I will find a better place for it.
src/graph/planner/plan/PlanNode.cpp
Outdated
@@ -379,6 +379,22 @@ void PlanNode::updateSymbols() {
  }
}

Status PlanNode::pruneProperties(PropertyTracker& propsUsed,
I think we could do this by using the visitor pattern.
Yes, you are right. Putting the method pruneProperties in the plan node is a little intrusive. I will use the visitor pattern.
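A minimal sketch of what moving the pruning into a visitor could look like. The node types here are simplified stand-ins that only share names with the real plan nodes; `PlanNodeVisitor`, `PrunePropsSketchVisitor`, and the `vertexProps` string vectors are hypothetical and chosen for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

struct Traverse;
struct AppendVertices;

// Visitor interface: one overload per concrete plan node kind.
struct PlanNodeVisitor {
  virtual ~PlanNodeVisitor() = default;
  virtual void visit(Traverse& node) = 0;
  virtual void visit(AppendVertices& node) = 0;
};

// Plan nodes only expose accept(); they carry no pruning logic themselves.
struct PlanNode {
  virtual ~PlanNode() = default;
  virtual void accept(PlanNodeVisitor& visitor) = 0;
};

struct Traverse : PlanNode {
  std::vector<std::string> vertexProps;  // props requested from storage
  void accept(PlanNodeVisitor& visitor) override { visitor.visit(*this); }
};

struct AppendVertices : PlanNode {
  std::vector<std::string> vertexProps;  // props requested from storage
  void accept(PlanNodeVisitor& visitor) override { visitor.visit(*this); }
};

// Hypothetical visitor: keeps only the props that upstream nodes use.
class PrunePropsSketchVisitor final : public PlanNodeVisitor {
 public:
  explicit PrunePropsSketchVisitor(std::unordered_set<std::string> used)
      : used_(std::move(used)) {}

  void visit(Traverse& node) override { prune(node.vertexProps); }
  void visit(AppendVertices& node) override { prune(node.vertexProps); }

 private:
  void prune(std::vector<std::string>& props) const {
    std::vector<std::string> kept;
    for (const auto& p : props) {
      if (used_.count(p) > 0) kept.push_back(p);
    }
    props = std::move(kept);
  }

  std::unordered_set<std::string> used_;
};
```

With this shape, adding pruning for another node kind means adding one `visit` overload rather than a new virtual method on every plan node.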
@@ -1,3 +1,4 @@
@wang
What's this?
I will delete it.
# Below scenario is not suppoted for the execution plan has a scan.
When executing query:
When profiling query:
Why change this?
I was going to add an execution plan check here.
@@ -26,7 +27,23 @@ Feature: Multi Query Parts
| "Tim Duncan" | "Boris Diaw" | "Spurs" |
| "Tim Duncan" | "Boris Diaw" | "Suns" |
| "Tim Duncan" | "Boris Diaw" | "Tim Duncan" |
When executing query:
# And the execution plan should be:
Why comment these?
There is currently something wrong with the TCK mechanism that checks operator info.
Ok, fine!
@@ -20,6 +21,8 @@ using nebula::graph::QueryContext;
using nebula::graph::Select;
using nebula::graph::SingleDependencyNode;

DEFINE_bool(enable_optimizer_property_pruner_rule, true, "");
I think you should reuse enable_optimizer
How could enable_optimizer control whether a single rule is enabled?
Right now we don't provide a way to control a single optimization. In my mind, enable_optimizer must control all optimizations, including this one of course.
* property pruner
* postprocess works.
* add Travese::pruneProperties
* make vertexprops of travese be null when node alias is not used
* fix extractPropsFromExps for tagPropExpr and edgePropexpr, vFilter and eFilter
* add PropertyTracker::update for project node
* add DeduceMatchPropsVisitor and FLAGS_enable_opt_collapse_project_rule
* add Filter::pruneProperties and only 3 tck cases not passed
* Do not do propsUsed.colsSet.erase(it)
* revert labeltagpropexpr and pass all tck
* add hasAlias, add deduce types for id, src, dst func
* rename visitor
* ingore func id, src, dst, type, typeid, rank, hash in property tracker visitor
* add Aggregate::pruneProperties and store the alias of id/src/dst... to propsused
* format tck
* remove some unusedless headers
* add tck for plan
* rename has1, has2, has3
* add flag_enable_optimizer_property_pruner_rule and support prune edge props...
* add kUnknownEdgeType
* add PrunePropertiesVisitor
* remove PlanNode::pruneProperties, and move PropertyTracker to PropertyTrackerVisitor
* add PrunePropertiesRule.feature
* add markDeleted interface
* remove unused code
* fix header macro
* update tck

Co-authored-by: kyle.cao <[email protected]>
What type of PR is this?
What problem(s) does this PR solve?
Issue(s) number:
Close #3096.
Description:
How do you solve it?
Collect the properties used by each plan node from the top down and record them in the PropertyTracker. A node such as AppendVertices or Traverse then only needs to request from storage the properties recorded in the PropertyTracker (that is, the properties used by its upper nodes).
After property pruning, some plan nodes may need to be deleted.
E.g., an AppendVertices node could be deleted if none of the vertexProps it requests are used by the upper nodes.
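The top-down collection described above can be sketched on a toy linear plan. This is an illustration under simplifying assumptions, not the PR's implementation: `Node`, its fields, and the free function `pruneProperties` are hypothetical, properties are plain strings, and a `std::set` plays the role of the PropertyTracker.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Simplified plan node in a linear chain; index 0 is the root (topmost node).
struct Node {
  std::string kind;                     // e.g. "Project", "AppendVertices"
  std::vector<std::string> usedProps;   // props referenced by this node's expressions
  std::vector<std::string> fetchProps;  // props this node requests from storage
  bool deletable = false;               // true when nothing it fetches is used above
};

// Walk from the root down. Each fetching node keeps only the props that
// nodes above it have used; a node left with nothing to fetch is marked
// as a deletion candidate.
inline void pruneProperties(std::vector<Node>& plan) {
  std::set<std::string> tracker;  // props used by the nodes visited so far
  for (auto& node : plan) {
    if (!node.fetchProps.empty()) {
      std::vector<std::string> kept;
      for (const auto& p : node.fetchProps) {
        if (tracker.count(p) > 0) kept.push_back(p);
      }
      node.fetchProps = std::move(kept);
      node.deletable = node.fetchProps.empty();
    }
    // Props used by this node become requirements for the nodes below it.
    tracker.insert(node.usedProps.begin(), node.usedProps.end());
  }
}
```

For a plan of `Project` (using `name`) over `AppendVertices` (fetching `name` and `age`), the fetch list shrinks to `name`; if the `Project` uses no vertex property at all, the `AppendVertices` node ends up marked as deletable.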
PropertyTrackerVisitor is used to track the vertex/edge props used by the expressions of plan nodes. Currently, we need to care about TagPropertyExpression and LabelTagPropertyExpression for vertex properties, and EdgePropertyExpression and AttributeExpression? for edge properties. If these exprs are not found, then InputPropertyExpression and VariablePropertyExpression are collected.
TODO:
markTheNodeAsDeleted, which implies the node is useless after the property pruner and should be deleted.
Special notes for your reviewer, ex. impact of this fix, design document, etc: