-
Notifications
You must be signed in to change notification settings - Fork 3.8k
/
Copy pathjoin.opt
402 lines (388 loc) · 13.9 KB
/
join.opt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
# =============================================================================
# join.opt contains exploration rules for the Join operator.
# =============================================================================
# ReorderJoins matches the first expression of a join group and adds to the memo
# all valid join orderings that do not introduce cross joins. If the join has
# join hints or is the result of a previous join reordering, the join tree is
# not reordered. For more information, see the comment in join_order_builder.go.
#
# Citations: [8]
[ReorderJoins, Explore]
(InnerJoin | SemiJoin | AntiJoin | LeftJoin | FullJoin
* & (ShouldReorderJoins (Root))
)
=>
(ReorderJoins)
# CommuteLeftJoin creates a Join with the left and right inputs swapped.
# This is symmetric with the CommuteRightJoin normalization rule.
[CommuteLeftJoin, Explore]
(LeftJoin $left:* $right:* $on:* $private:*)
=>
(RightJoin $right $left $on (CommuteJoinFlags $private))
# CommuteSemiJoin generates an InnerJoin that is equivalent to the SemiJoin.
# SemiJoins impose a partial order on the joining tables. We can convert a
# SemiJoin into an InnerJoin by applying a DistinctOn operator on the selected
# rows of the RHS and then relaxing the partial join order restriction.
#
# This allows the join orders (A SemiJoin B) and (Distinct(B*) InnerJoin A) to be
# both considered by the optimizer. This is useful as a different join order may
# allow other rules to trigger. A common case is that, it would allow the inner
# join to use a lookup join in some cases (For example, a different join order
# would allow the use of a lookup join if A has much higher cardinality than B).
#
# We only do this when the On conditions guarantee that for each row in the LHS
# there is at most one unique matching row in the RHS. We need this because a
# SemiJoin emits a maximum of one row for every matching row in the LHS.
# This is an important difference between the behavior of a SemiJoin and an
# InnerJoin. For each row in the LHS, an InnerJoin emits a matching row for every
# row in the RHS where the conditions are met. For example consider the tables:
#
# lhs rhs
# +-----+ +------+
# 1 10
# 2 20
#
# If we do an InnerJoin on the table where the On condition is (lhs < rhs),
# you'll notice that each of the lhs rows are matched twice. And so the output
# of the InnerJoin would contain 2 rows for each matching row in the LHS.
# In order to guarantee that there is at most 1 matching row for every row in
# the LHS, we only commute a SemiJoin into an InnerJoin when the On conditions
# are only composed of equalities.
#
# Note: We only consider the columns of the RHS that are used in the On
# conditions (B* in the example above). And so we can be certain that
# ((Distinct(RHS*) InnerJoin LHS) will have at most 1 matching row for each row
# in the LHS if the On conditions are simple equalities.
#
# Citations: [7] (see section 2.1.1)
[CommuteSemiJoin, Explore]
(SemiJoin
$left:*
$right:*
$on:* & (IsSimpleEquality $on)
$private:* & (NoJoinHints $private)
)
=>
(Project
(InnerJoin
$left
(DistinctOn
$right
[]
(MakeGrouping
(IntersectionCols
(OutputCols $right)
(FilterOuterCols $on)
)
(EmptyOrdering)
)
)
$on
$private
)
[]
(OutputCols $left)
)
# ConvertSemiToInnerJoin applies in the cases where CommuteSemiJoin does not:
# when the ON condition is not a simple equality. In this case, we must perform
# a DisinctOn operation *after* the inner join, followed by a project to remove
# the right-side columns.
#
# Similar to CommuteSemiJoin, this rule allows semi joins to be commuted. This
# rule is also useful because it allows us to generate lookup joins and
# inverted lookup joins for cases where the index is not covering.
[ConvertSemiToInnerJoin, Explore]
(SemiJoin
$left:*
$right:*
$on:* & ^(IsSimpleEquality $on)
$private:* & (NoJoinHints $private)
)
=>
(Project
(DistinctOn
(InnerJoin
$newLeft:(EnsureKey $left)
$right
$on
$private
)
(MakeAggCols ConstAgg (NonKeyCols $newLeft))
(MakeGrouping (KeyCols $newLeft) (EmptyOrdering))
)
[]
(OutputCols $left)
)
# ConvertLeftToInnerJoin converts a left join to an inner join with the same
# ON condition, and then wraps the expression in another left join with the
# original left side. In order to avoid computing the left side of the join
# twice, we create a With expression for the left side, and then reference it
# with two WithScans. For example (assuming x is the primary key of a):
#
# SELECT a.x, b.y FROM a LEFT JOIN b ON ST_Intersects(a.geom, b.geom);
#
# is converted to:
#
# WITH a_buf AS (
# SELECT * FROM a
# )
# SELECT a_buf.x, inr.y FROM a_buf LEFT JOIN (
# SELECT * FROM a_buf JOIN b ON ST_Intersects(a_buf.geom, b.geom)
# ) AS inr
# ON a_buf.x = inr.x;
#
# Note that this transformation is not desirable in the general case, but it
# is useful if there is a possibility of creating an inverted join (such as in
# the above example). For this reason, we only perform this transformation if
# CanGenerateInvertedJoin returns true.
#
# There is some added complexity due to the need to maintain the same output
# column IDs. Therefore, the WithScan in the outer left join must retain the
# original column IDs, but the binding expression of the With and the WithScan
# in the inner join must have different column IDs. See the ReplaceOutputCols
# and MakeWithScanUsingCols functions for details about how this works. Only
# the columns from the outer left side and the inner right side are ultimately
# projected.
#
# TODO(rytaft): This is a temporary solution to support index acceleration of
# geospatial left joins. Eventually we would like to support index acceleration
# of left joins directly by adding some extra bookkeeping to the execution
# operators. See #53576.
#
# TODO(rytaft): It is possible that this transformation may be desirable in
# more cases than just spatial queries. For example, it could enable the use
# of a lookup join with a non-covering index.
[ConvertLeftToInnerJoin, Explore]
(LeftJoin
$left:*
$right:*
$on:* & (CanGenerateInvertedJoin $right $on)
$private:* & (NoJoinHints $private)
)
=>
(With
$bindingExpr:(ReplaceOutputCols $newLeft:(EnsureKey $left))
(Project
(LeftJoin
$outerLeft:(MakeWithScanUsingCols
$id:(AddWithBinding $bindingExpr)
(OutputCols $newLeft)
)
(InnerJoin
$innerLeft:(MakeWithScan $id)
$right
(MapFilterCols
$on
(OutputCols $newLeft)
(OutputCols $innerLeft)
)
(EmptyJoinPrivate)
)
(MakeWithScanKeyEqualityFilters
$outerLeft
$innerLeft
)
$private
)
[]
(OutputCols2 $left $right)
)
(MakeWithPrivate $id)
)
# ConvertAntiToLeftJoin converts an anti join to a left join with the same
# ON condition, wraps the expression in a Select to remove rows that matched
# the ON condition, and then projects out the left side columns.
# For example (assuming x is a not-null column in b):
#
# SELECT * FROM a WHERE NOT EXISTS (
# SELECT * FROM b WHERE ST_Intersects(a.geom, b.geom)
# );
#
# is converted to:
#
# SELECT a.* FROM a LEFT JOIN b ON ST_Intersects(a.geom, b.geom)
# WHERE b.x IS NULL;
#
# Note that this transformation is not desirable in the general case, but it
# is useful if there is a possibility of creating an inverted join (such as in
# the above example). For this reason, we only perform this transformation if
# CanGenerateInvertedJoin returns true.
#
# TODO(rytaft): It is possible that this transformation may be desirable in
# more cases than just spatial queries. For example, it could enable the use
# of a lookup join with a non-covering index.
[ConvertAntiToLeftJoin, Explore]
(AntiJoin
$left:*
$right:*
$on:* & (CanGenerateInvertedJoin $right $on)
$private:* & (NoJoinHints $private)
)
=>
(Project
(Select
(LeftJoin
$left
$newRight:(EnsureNotNullColFromFilteredScan $right)
$on
$private
)
[
(FiltersItem
(Is
$variable:(Variable (NotNullCol $newRight))
(Null (TypeOf $variable))
)
)
]
)
[]
(OutputCols $left)
)
# GenerateMergeJoins creates MergeJoin operators for the join, using the
# interesting orderings property.
[GenerateMergeJoins, Explore]
(JoinNonApply $left:* $right:* $on:* $private:*)
=>
(GenerateMergeJoins (OpName) $left $right $on $private)
# GenerateLookupJoins creates LookupJoin operators for all indexes (of the Scan
# table) which allow it (including non-covering indexes). See the
# GenerateLookupJoins custom function for more details.
[GenerateLookupJoins, Explore]
(InnerJoin | LeftJoin | SemiJoin | AntiJoin
$left:*
(Scan $scanPrivate:*) & (IsCanonicalScan $scanPrivate)
$on:*
$private:*
)
=>
(GenerateLookupJoins (OpName) $left $scanPrivate $on $private)
# GenerateInvertedJoins creates InvertedJoin operators for all inverted
# indexes (of the Scan table) which allow it. See the GenerateInvertedJoins
# custom function for more details.
# TODO(rytaft): Add support for LeftJoin, SemiJoin, and AntiJoin. Currently
# these are supported by first converting them to InnerJoin using the rules
# ConvertSemiToInnerJoin, ConvertLeftToInnerJoin, and ConvertAntiToLeftJoin.
[GenerateInvertedJoins, Explore]
(InnerJoin
$left:*
(Scan $scanPrivate:*) &
(IsCanonicalScan $scanPrivate) &
(HasInvertedIndexes $scanPrivate)
$on:*
$private:*
)
=>
(GenerateInvertedJoins (OpName) $left $scanPrivate $on $private)
# GenerateInvertedJoinsFromSelect is similar to GenerateInvertedJoins, but
# applies when the input is a Select.
[GenerateInvertedJoinsFromSelect, Explore]
(InnerJoin
$left:*
(Select
(Scan $scanPrivate:*) &
(IsCanonicalScan $scanPrivate) &
(HasInvertedIndexes $scanPrivate)
$filters:*
)
$on:*
$private:*
)
=>
(GenerateInvertedJoins
(OpName)
$left
$scanPrivate
(ConcatFilters $on $filters)
$private
)
# GenerateZigzagJoins creates ZigzagJoin operators for all index pairs (of the
# Scan table) where the prefix column(s) of both indexes is/are fixed to
# constant values in the filters. See comments in GenerateZigzagJoin and
# distsqlrun/zigzagjoiner.go for more details on when a zigzag join can be
# planned.
#
# Zigzag joins are prohibited when the source Scan operator has been configured
# with a row-level locking mode. This is mostly out of convenience so that these
# row-level locking modes don't need to added to the ZigzagJoin operator. There
# doesn't seem to be a strong reason to support this, but if one comes up, it
# should be possible to lift this restriction.
[GenerateZigzagJoins, Explore]
(Select
(Scan $scan:*) & (IsCanonicalScan $scan) & ^(IsLocking $scan)
$filters:*
)
=>
(GenerateZigzagJoins $scan $filters)
# GenerateInvertedIndexZigzagJoins creates ZigzagJoin operators for inverted
# indexes that can be constrained with two or more distinct constant values.
# Inverted indexes contain one row for each path-to-leaf in a JSON value, so one
# row in the primary index could generate multiple inverted index keys. This
# property can be exploited by zigzag joining on the same inverted index, fixed
# at any two of the JSON paths we are querying for.
#
# Zigzag joins are prohibited when the source Scan operator has been configured
# with a row-level locking mode. This is mostly out of convenience so that these
# row-level locking modes don't need to added to the ZigzagJoin operator. There
# doesn't seem to be a strong reason to support this, but if one comes up, it
# should be possible to lift this restriction.
[GenerateInvertedIndexZigzagJoins, Explore]
(Select
(Scan $scan:*) &
(IsCanonicalScan $scan) &
^(IsLocking $scan) &
(HasInvertedIndexes $scan)
$filters:*
)
=>
(GenerateInvertedIndexZigzagJoins $scan $filters)
# GenerateLookupJoinWithFilter creates a LookupJoin alternative for a Join which
# has a Select->Scan combination as its right input. The filter can get merged
# with the ON condition (this is correct for inner, left, and semi/anti join).
[GenerateLookupJoinsWithFilter, Explore]
(InnerJoin | LeftJoin | SemiJoin | AntiJoin
$left:*
(Select
(Scan $scanPrivate:*) & (IsCanonicalScan $scanPrivate)
$filters:*
)
$on:*
$private:*
)
=>
(GenerateLookupJoins
(OpName)
$left
$scanPrivate
(ConcatFilters $on $filters)
$private
)
# PushJoinIntoIndexJoin pushes an InnerJoin into an IndexJoin. The IndexJoin is
# replaced with a LookupJoin, since it now must output columns from the right
# side of the InnerJoin as well as from the original lookup table. This can
# be useful when the InnerJoin that is pushed down reduces cardinality, since an
# index lookup can be expensive.
#
# Matching conditions:
# 1. The right input of the InnerJoin does not have outer columns.
# 2. The ON condition of the InnerJoin only references columns from its right
# input and the input of the IndexJoin.
# 3. The InnerJoin does not have any join hints.
#
# TODO(drewk): match LeftJoins as well if a good use case is found.
[PushJoinIntoIndexJoin, Explore]
(InnerJoin
$left:(IndexJoin $indexInput:* $indexPrivate:*)
$right:* & ^(HasOuterCols $right)
$on:* & (FiltersBoundBy $on (OutputCols2 $indexInput $right))
$joinPrivate:* & (NoJoinHints $joinPrivate)
)
=>
(LookupJoin
(InnerJoin $indexInput $right $on $joinPrivate)
[]
(ConvertIndexToLookupJoinPrivate
$indexPrivate
(OutputCols2 $left $right)
)
)