Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: wrong sorting after statistics init #33354

Closed
datobatone opened this issue Dec 25, 2018 · 19 comments · Fixed by #33536
Closed

sql: wrong sorting after statistics init #33354

datobatone opened this issue Dec 25, 2018 · 19 comments · Fixed by #33536
Assignees
Labels
A-sql-execution Relating to SQL execution. C-investigation Further steps needed to qualify. C-label will change. O-community Originated from the community

Comments

@datobatone
Copy link

datobatone commented Dec 25, 2018

before statistics init:

  1. explain SELECT *
    FROM "requests" JOIN "organizations" ON (organizations.id = requests.organization_id)
    JOIN "places" AS "load_place_id_places" ON (load_place_id_places.id = requests.load_place_id)
    ORDER BY "requests"."name" desc
sort		
 │	order	-name
 └── join		
      │	type	inner
      │	equality	(load_place_id) = (id)
      ├── join		
      │    │	type	inner
      │    │	equality	(organization_id) = (id)
      │    ├── scan		
      │    │	table	requests@primary
      │    │	spans	ALL
      │    └── scan		
      │	table	organizations@primary
      │	spans	ALL
      └── scan		
	table	places@primary
	spans	ALL
  1. Statistics init:
    CREATE STATISTICS places_stats ON id FROM places;
    table places has about 300k entries.

  2. explain again:

lookup-join		
 │	type	inner
 ├── sort		
 │    │	order	-name
 │    └── join		
 │         │	type	inner
 │         │	equality	(organization_id) = (id)
 │         ├── scan		
 │         │	table	requests@primary
 │         │	spans	ALL
 │         └── scan		
 │	table	organizations@primary
 │	spans	ALL
 └── scan		
	table	places@primary

So after create stats we have wrong sorting.

Environment:

  • CockroachDB version [ 2.1.1 and 2.1.3]
  • Server OS: [Ubuntu server 18]
  • Client app [DBeaver and golang]

image
image

@dt
Copy link
Member

dt commented Dec 26, 2018

Thanks for the detailed report @datobatone! cc @RaduBerinde for triage

@RaduBerinde
Copy link
Member

@datobatone - the execution plan looks ok to me. Have you seen out of order results? Can you post the results of EXPLAIN (VERBOSE) and EXPLAIN (DISTSQL)?

@knz knz changed the title wrong sorting after statistics init sql: wrong sorting after statistics init Dec 26, 2018
@knz knz added C-investigation Further steps needed to qualify. C-label will change. O-community Originated from the community A-sql-execution Relating to SQL execution. labels Dec 26, 2018
@datobatone
Copy link
Author

datobatone commented Dec 26, 2018

first sort, and then join. As a result, the data is scattered

EXPLAIN (VERBOSE)

lookup-join			(id, status, organization_id, user_id, load_place_id, load_date, loading_required, deliver_place_id, deliver_date, discharging_required, cargo_weight, cargo_volume, cargo_cost, cargo_cost_currency, cargo_ldm, cargo_parcels, cargo_batch_identifier, cargo_notes, transport_type_id, transport_subtype_id, transport_quantity, created_on, updated_on, canceled_on, need_container, container_number, organization_unit_id, assignee_user_id, name, id, type_id, role_type_id, status_type_id, name, inn, ogrn, kpp, country, region, city, address, natural_address, contacts, created_on, rating, schedule, tms_rating, bank_name, bank_account, bic, correspondent_account, updated_on, organization_id, id, name, code, city, region, country, type_id, lat, lng, utc_offset, google_place_id, variants, area, created_on, updated_on)	
 │	type	inner		
 ├── sort			(id, status, organization_id, user_id, load_place_id, load_date, loading_required, deliver_place_id, deliver_date, discharging_required, cargo_weight, cargo_volume, cargo_cost, cargo_cost_currency, cargo_ldm, cargo_parcels, cargo_batch_identifier, cargo_notes, transport_type_id, transport_subtype_id, transport_quantity, created_on, updated_on, canceled_on, need_container, container_number, organization_unit_id, assignee_user_id, name, id, type_id, role_type_id, status_type_id, name, inn, ogrn, kpp, country, region, city, address, natural_address, contacts, created_on, rating, schedule, tms_rating, bank_name, bank_account, bic, correspondent_account, updated_on, organization_id)	-name
 │    │	order	-name		
 │    └── join			(id, status, organization_id, user_id, load_place_id, load_date, loading_required, deliver_place_id, deliver_date, discharging_required, cargo_weight, cargo_volume, cargo_cost, cargo_cost_currency, cargo_ldm, cargo_parcels, cargo_batch_identifier, cargo_notes, transport_type_id, transport_subtype_id, transport_quantity, created_on, updated_on, canceled_on, need_container, container_number, organization_unit_id, assignee_user_id, name, id, type_id, role_type_id, status_type_id, name, inn, ogrn, kpp, country, region, city, address, natural_address, contacts, created_on, rating, schedule, tms_rating, bank_name, bank_account, bic, correspondent_account, updated_on, organization_id)	
 │         │	type	inner		
 │         │	equality	(organization_id) = (id)		
 │         ├── scan			(id, status, organization_id, user_id, load_place_id, load_date, loading_required, deliver_place_id, deliver_date, discharging_required, cargo_weight, cargo_volume, cargo_cost, cargo_cost_currency, cargo_ldm, cargo_parcels, cargo_batch_identifier, cargo_notes, transport_type_id, transport_subtype_id, transport_quantity, created_on, updated_on, canceled_on, need_container, container_number, organization_unit_id, assignee_user_id, name)	
 │         │	table	requests@primary		
 │         │	spans	ALL		
 │         └── scan			(id, type_id, role_type_id, status_type_id, name, inn, ogrn, kpp, country, region, city, address, natural_address, contacts, created_on, rating, schedule, tms_rating, bank_name, bank_account, bic, correspondent_account, updated_on, organization_id)	
 │	table	organizations@primary		
 │	spans	ALL		
 └── scan			(id, name, code, city, region, country, type_id, lat, lng, utc_offset, google_place_id, variants, area, created_on, updated_on)	
	table	places@primary		

EXPLAIN (DISTSQL)
true

https://cockroachdb.github.io/distsqlplan/decode.html#eJzclE1v1DAYhO_8ipVPrTTVxp_5kJDmSBFqUeGGeggbsw1s4-A4ElD1v6PsIrW76qaUI0fb7zPz2mP7TnSh8Rf1rR9E9UlIAaHENUQfw8oPQ4jT9K7ovPkhqgyi7foxTdPXEKsQvajuRGrTxotKXISz0C-tgGh8qtvNtuweIozpARpSvfaicvd4JCznhT_Wnzf-yteNj8tsT170sb2t409G_330QxrEMUP5r4byacMQ13XX_qpTG7rjruqo64PZ2IXY-Oibw4N7vuSJ1t_Uw83b0HY-LtV-5xv_JZ1Qn76O7fomnVCeCojLMVULSlCBGjSgBR2YgwVYgjIDpQSlAqUGpQGlBaUDZQ7KApQlqDJQTToKVBpUBlQWVA5UOagKUJWgzkAtQT0ZalAbUFtQO1DnoC5AXYImA40EjQLN1JkBjQWNA00OmgI0JWgz0ErQKtDqo0Hol8T_IcTk41Lvnx9VeXZU37xEf8rnz-0yT9-uflOv_CAg3oXwbewXX0PbLUJXLWj_v9BAO23BgtaBNgdtAdoSdBnoJOgU6DToDOimvTrQ5aArjgZi9wJ55ue68kMfusH_1deVTc_ON2u_e8ZDGOPKv49htbXZDS-33Hai8UParbrd4LzbLU0NPoblLKz3YHkIqxfA6hDWs7CZb9vMwnYetrNwdgBf37_6HQAA___Ml_-h

@RaduBerinde
Copy link
Member

Thanks!

The lookup join is supposed to preserve the ordering of the input. It does a lookup for every input row and returns results in order.
So the execution plan looks correct. Perhaps there is some problem on the execution side.

Just to confirm - you saw actual results out of order, or do you think there is an issue just from the explain output?

@datobatone
Copy link
Author

Yes, we saw that the data is not sorted by name, they go in a mixed order. The plan shows that the sequence of actions is changing. Before the inclusion of statistics, it first join and then sorts(all correct), and after activating statistics, first sorts, and only then it connects. As a result, we get the wrong sorting.

@RaduBerinde
Copy link
Member

Thanks. I will try to reproduce in a few days.

One more question, is this on a single node cluster? The DISTSQL plan above seems to be on a single node (or at least the tables involved are all on one node). I'm asking because it might also be a problem of how we merge results from multiple nodes.

CC @solongordon - there was a problem with the lookip joiner and ordering, but I think it only concerned outer joins, can you confirm that's the case?

@datobatone
Copy link
Author

Yes, we use one node configuration.

@RaduBerinde
Copy link
Member

Thank you! At this point, the only possible culprit seems to be the lookup joiner. I will try to reproduce.

CC @jordanlewis

@RaduBerinde
Copy link
Member

@datobatone as a workaround you can delete the statistics from the system.table_statistics table, or turn off the optimizer (SET OPTIMIZER = OFF).

@jordanlewis
Copy link
Member

@RaduBerinde have you had any luck reproducing this? I will try to take a look, just let me know what you've done so far.

@RaduBerinde
Copy link
Member

@jordanlewis sorry, I was out for the holidays, I'm back today. I was planning to look at it today.

@RaduBerinde
Copy link
Member

I'm having trouble cajoling the optimizer to put the sort before the lookup join..

@datobatone if you still have the environment, could you also run EXPLAIN (OPT) SELECT * FROM "requests" JOIN "organizations" ON (organizations.id = requests.organization_id) JOIN "places" AS "load_place_id_places" ON (load_place_id_places.id = requests.load_place_id) ORDER BY "requests"."name" desc?

@datobatone
Copy link
Author

before stats

sort
 ├── columns: id:1(uuid!null) status:2(int!null) organization_id:3(uuid!null) user_id:4(uuid!null) load_place_id:5(uuid!null) load_date:6(int!null) loading_required:7(bool!null) deliver_place_id:8(uuid!null) deliver_date:9(int!null) discharging_required:10(bool!null) cargo_weight:11(float!null) cargo_volume:12(float!null) cargo_cost:13(float!null) cargo_cost_currency:14(int!null) cargo_ldm:15(int!null) cargo_parcels:16(int!null) cargo_batch_identifier:17(string!null) cargo_notes:18(string!null) transport_type_id:19(int!null) transport_subtype_id:20(int!null) transport_quantity:21(int!null) created_on:22(int!null) updated_on:23(int!null) canceled_on:24(int!null) need_container:25(bool!null) container_number:26(string!null) organization_unit_id:27(uuid!null) assignee_user_id:28(uuid!null) name:29(string!null) id:30(uuid!null) type_id:31(int!null) role_type_id:32(int!null) status_type_id:33(int!null) name:34(string!null) inn:35(string!null) ogrn:36(string!null) kpp:37(string!null) country:38(string!null) region:39(string!null) city:40(string!null) address:41(string!null) natural_address:42(string!null) contacts:43(string!null) created_on:44(int!null) rating:45(float!null) schedule:46(string!null) tms_rating:47(float!null) bank_name:48(string!null) bank_account:49(string!null) bic:50(string!null) correspondent_account:51(string!null) updated_on:52(int!null) organization_id:53(uuid!null) id:54(uuid!null) name:55(string!null) code:56(string!null) city:57(string!null) region:58(string!null) country:59(string!null) type_id:60(int!null) lat:61(float) lng:62(float) utc_offset:63(float!null) google_place_id:64(string!null) variants:65(string!null) area:66(string!null) created_on:67(int!null) updated_on:68(int!null)
 ├── stats: [rows=1000, distinct(5)=99.9954827, distinct(54)=99.9954827]
 ├── cost: 4659.31569
 ├── key: (1)
 ├── fd: (1)-->(2-29), (30)-->(31-53), (35)-->(30-34,36-53), (3)==(30), (30)==(3), (54)-->(55-68), (55,56,58-60)-->(54,57,61-68), (5)==(54), (54)==(5)
 ├── ordering: -29
 ├── prune: (1,2,4,6-29,31-53,55-68)
 └── inner-join
      ├── columns: requests.id:1(uuid!null) status:2(int!null) requests.organization_id:3(uuid!null) user_id:4(uuid!null) load_place_id:5(uuid!null) load_date:6(int!null) loading_required:7(bool!null) deliver_place_id:8(uuid!null) deliver_date:9(int!null) discharging_required:10(bool!null) cargo_weight:11(float!null) cargo_volume:12(float!null) cargo_cost:13(float!null) cargo_cost_currency:14(int!null) cargo_ldm:15(int!null) cargo_parcels:16(int!null) cargo_batch_identifier:17(string!null) cargo_notes:18(string!null) transport_type_id:19(int!null) transport_subtype_id:20(int!null) transport_quantity:21(int!null) requests.created_on:22(int!null) requests.updated_on:23(int!null) canceled_on:24(int!null) need_container:25(bool!null) container_number:26(string!null) organization_unit_id:27(uuid!null) assignee_user_id:28(uuid!null) requests.name:29(string!null) organizations.id:30(uuid!null) organizations.type_id:31(int!null) role_type_id:32(int!null) status_type_id:33(int!null) organizations.name:34(string!null) inn:35(string!null) ogrn:36(string!null) kpp:37(string!null) organizations.country:38(string!null) organizations.region:39(string!null) organizations.city:40(string!null) address:41(string!null) natural_address:42(string!null) contacts:43(string!null) organizations.created_on:44(int!null) rating:45(float!null) schedule:46(string!null) tms_rating:47(float!null) bank_name:48(string!null) bank_account:49(string!null) bic:50(string!null) correspondent_account:51(string!null) organizations.updated_on:52(int!null) organizations.organization_id:53(uuid!null) places.id:54(uuid!null) places.name:55(string!null) code:56(string!null) places.city:57(string!null) places.region:58(string!null) places.country:59(string!null) places.type_id:60(int!null) lat:61(float) lng:62(float) utc_offset:63(float!null) google_place_id:64(string!null) variants:65(string!null) area:66(string!null) places.created_on:67(int!null) places.updated_on:68(int!null)
      ├── stats: [rows=1000, distinct(5)=99.9954827, distinct(54)=99.9954827]
      ├── cost: 4440
      ├── key: (1)
      ├── fd: (1)-->(2-29), (30)-->(31-53), (35)-->(30-34,36-53), (3)==(30), (30)==(3), (54)-->(55-68), (55,56,58-60)-->(54,57,61-68), (5)==(54), (54)==(5)
      ├── prune: (1,2,4,6-29,31-53,55-68)
      ├── inner-join
      │    ├── columns: requests.id:1(uuid!null) status:2(int!null) requests.organization_id:3(uuid!null) user_id:4(uuid!null) load_place_id:5(uuid!null) load_date:6(int!null) loading_required:7(bool!null) deliver_place_id:8(uuid!null) deliver_date:9(int!null) discharging_required:10(bool!null) cargo_weight:11(float!null) cargo_volume:12(float!null) cargo_cost:13(float!null) cargo_cost_currency:14(int!null) cargo_ldm:15(int!null) cargo_parcels:16(int!null) cargo_batch_identifier:17(string!null) cargo_notes:18(string!null) transport_type_id:19(int!null) transport_subtype_id:20(int!null) transport_quantity:21(int!null) requests.created_on:22(int!null) requests.updated_on:23(int!null) canceled_on:24(int!null) need_container:25(bool!null) container_number:26(string!null) organization_unit_id:27(uuid!null) assignee_user_id:28(uuid!null) requests.name:29(string!null) organizations.id:30(uuid!null) organizations.type_id:31(int!null) role_type_id:32(int!null) status_type_id:33(int!null) organizations.name:34(string!null) inn:35(string!null) ogrn:36(string!null) kpp:37(string!null) organizations.country:38(string!null) organizations.region:39(string!null) organizations.city:40(string!null) address:41(string!null) natural_address:42(string!null) contacts:43(string!null) organizations.created_on:44(int!null) rating:45(float!null) schedule:46(string!null) tms_rating:47(float!null) bank_name:48(string!null) bank_account:49(string!null) bic:50(string!null) correspondent_account:51(string!null) organizations.updated_on:52(int!null) organizations.organization_id:53(uuid!null)
      │    ├── stats: [rows=1000, distinct(3)=100, distinct(5)=99.9954827, distinct(30)=100]
      │    ├── cost: 3100
      │    ├── key: (1)
      │    ├── fd: (1)-->(2-29), (30)-->(31-53), (35)-->(30-34,36-53), (3)==(30), (30)==(3)
      │    ├── prune: (1,2,4-29,31-53)
      │    ├── interesting orderings: (+1) (+3,+27,+4,+1) (+8,+1) (+5,+1) (+2,+1) (+3,+1) (+30) (+35)
      │    ├── scan requests
      │    │    ├── columns: requests.id:1(uuid!null) status:2(int!null) requests.organization_id:3(uuid!null) user_id:4(uuid!null) load_place_id:5(uuid!null) load_date:6(int!null) loading_required:7(bool!null) deliver_place_id:8(uuid!null) deliver_date:9(int!null) discharging_required:10(bool!null) cargo_weight:11(float!null) cargo_volume:12(float!null) cargo_cost:13(float!null) cargo_cost_currency:14(int!null) cargo_ldm:15(int!null) cargo_parcels:16(int!null) cargo_batch_identifier:17(string!null) cargo_notes:18(string!null) transport_type_id:19(int!null) transport_subtype_id:20(int!null) transport_quantity:21(int!null) requests.created_on:22(int!null) requests.updated_on:23(int!null) canceled_on:24(int!null) need_container:25(bool!null) container_number:26(string!null) organization_unit_id:27(uuid!null) assignee_user_id:28(uuid!null) requests.name:29(string!null)
      │    │    ├── stats: [rows=1000, distinct(3)=100, distinct(5)=100]
      │    │    ├── cost: 1580
      │    │    ├── key: (1)
      │    │    ├── fd: (1)-->(2-29)
      │    │    ├── prune: (1-29)
      │    │    └── interesting orderings: (+1) (+3,+27,+4,+1) (+8,+1) (+5,+1) (+2,+1) (+3,+1)
      │    ├── scan organizations
      │    │    ├── columns: organizations.id:30(uuid!null) organizations.type_id:31(int!null) role_type_id:32(int!null) status_type_id:33(int!null) organizations.name:34(string!null) inn:35(string!null) ogrn:36(string!null) kpp:37(string!null) organizations.country:38(string!null) organizations.region:39(string!null) organizations.city:40(string!null) address:41(string!null) natural_address:42(string!null) contacts:43(string!null) organizations.created_on:44(int!null) rating:45(float!null) schedule:46(string!null) tms_rating:47(float!null) bank_name:48(string!null) bank_account:49(string!null) bic:50(string!null) correspondent_account:51(string!null) organizations.updated_on:52(int!null) organizations.organization_id:53(uuid!null)
      │    │    ├── stats: [rows=1000, distinct(30)=1000]
      │    │    ├── cost: 1480
      │    │    ├── key: (30)
      │    │    ├── fd: (30)-->(31-53), (35)-->(30-34,36-53)
      │    │    ├── prune: (30-53)
      │    │    └── interesting orderings: (+30) (+35)
      │    └── filters [type=bool, outer=(3,30), constraints=(/3: (/NULL - ]; /30: (/NULL - ]), fd=(3)==(30), (30)==(3)]
      │         └── eq [type=bool, outer=(3,30), constraints=(/3: (/NULL - ]; /30: (/NULL - ])]
      │              ├── variable: organizations.id [type=uuid, outer=(30)]
      │              └── variable: requests.organization_id [type=uuid, outer=(3)]
      ├── scan places
      │    ├── columns: places.id:54(uuid!null) places.name:55(string!null) code:56(string!null) places.city:57(string!null) places.region:58(string!null) places.country:59(string!null) places.type_id:60(int!null) lat:61(float) lng:62(float) utc_offset:63(float!null) google_place_id:64(string!null) variants:65(string!null) area:66(string!null) places.created_on:67(int!null) places.updated_on:68(int!null)
      │    ├── stats: [rows=1000, distinct(54)=1000]
      │    ├── cost: 1300
      │    ├── key: (54)
      │    ├── fd: (54)-->(55-68), (55,56,58-60)-->(54,57,61-68)
      │    ├── prune: (54-68)
      │    └── interesting orderings: (+54) (+55,+56,+58,+59,+60) (+60,+54)
      └── filters [type=bool, outer=(5,54), constraints=(/5: (/NULL - ]; /54: (/NULL - ]), fd=(5)==(54), (54)==(5)]
           └── eq [type=bool, outer=(5,54), constraints=(/5: (/NULL - ]; /54: (/NULL - ])]
                ├── variable: places.id [type=uuid, outer=(54)]
                └── variable: load_place_id [type=uuid, outer=(5)]

after stats

inner-join (lookup places)
 ├── columns: id:1(uuid!null) status:2(int!null) organization_id:3(uuid!null) user_id:4(uuid!null) load_place_id:5(uuid!null) load_date:6(int!null) loading_required:7(bool!null) deliver_place_id:8(uuid!null) deliver_date:9(int!null) discharging_required:10(bool!null) cargo_weight:11(float!null) cargo_volume:12(float!null) cargo_cost:13(float!null) cargo_cost_currency:14(int!null) cargo_ldm:15(int!null) cargo_parcels:16(int!null) cargo_batch_identifier:17(string!null) cargo_notes:18(string!null) transport_type_id:19(int!null) transport_subtype_id:20(int!null) transport_quantity:21(int!null) created_on:22(int!null) updated_on:23(int!null) canceled_on:24(int!null) need_container:25(bool!null) container_number:26(string!null) organization_unit_id:27(uuid!null) assignee_user_id:28(uuid!null) name:29(string!null) id:30(uuid!null) type_id:31(int!null) role_type_id:32(int!null) status_type_id:33(int!null) name:34(string!null) inn:35(string!null) ogrn:36(string!null) kpp:37(string!null) country:38(string!null) region:39(string!null) city:40(string!null) address:41(string!null) natural_address:42(string!null) contacts:43(string!null) created_on:44(int!null) rating:45(float!null) schedule:46(string!null) tms_rating:47(float!null) bank_name:48(string!null) bank_account:49(string!null) bic:50(string!null) correspondent_account:51(string!null) updated_on:52(int!null) organization_id:53(uuid!null) id:54(uuid!null) name:55(string!null) code:56(string!null) city:57(string!null) region:58(string!null) country:59(string!null) type_id:60(int!null) lat:61(float) lng:62(float) utc_offset:63(float!null) google_place_id:64(string!null) variants:65(string!null) area:66(string!null) created_on:67(int!null) updated_on:68(int!null)
 ├── key columns: [5] = [54]
 ├── stats: [rows=1013.00347, distinct(5)=99.9954827, distinct(54)=99.9954827]
 ├── cost: 8636.2202
 ├── key: (1)
 ├── fd: (1)-->(2-29), (30)-->(31-53), (35)-->(30-34,36-53), (3)==(30), (30)==(3), (54)-->(55-68), (55,56,58-60)-->(54,57,61-68), (5)==(54), (54)==(5)
 ├── ordering: -29
 ├── prune: (1,2,4,6-29,31-53,55-68)
 ├── sort
 │    ├── columns: requests.id:1(uuid!null) status:2(int!null) requests.organization_id:3(uuid!null) user_id:4(uuid!null) load_place_id:5(uuid!null) load_date:6(int!null) loading_required:7(bool!null) deliver_place_id:8(uuid!null) deliver_date:9(int!null) discharging_required:10(bool!null) cargo_weight:11(float!null) cargo_volume:12(float!null) cargo_cost:13(float!null) cargo_cost_currency:14(int!null) cargo_ldm:15(int!null) cargo_parcels:16(int!null) cargo_batch_identifier:17(string!null) cargo_notes:18(string!null) transport_type_id:19(int!null) transport_subtype_id:20(int!null) transport_quantity:21(int!null) requests.created_on:22(int!null) requests.updated_on:23(int!null) canceled_on:24(int!null) need_container:25(bool!null) container_number:26(string!null) organization_unit_id:27(uuid!null) assignee_user_id:28(uuid!null) requests.name:29(string!null) organizations.id:30(uuid!null) organizations.type_id:31(int!null) role_type_id:32(int!null) status_type_id:33(int!null) organizations.name:34(string!null) inn:35(string!null) ogrn:36(string!null) kpp:37(string!null) organizations.country:38(string!null) organizations.region:39(string!null) organizations.city:40(string!null) address:41(string!null) natural_address:42(string!null) contacts:43(string!null) organizations.created_on:44(int!null) rating:45(float!null) schedule:46(string!null) tms_rating:47(float!null) bank_name:48(string!null) bank_account:49(string!null) bic:50(string!null) correspondent_account:51(string!null) organizations.updated_on:52(int!null) organizations.organization_id:53(uuid!null)
 │    ├── stats: [rows=1000, distinct(3)=100, distinct(5)=99.9954827, distinct(30)=100]
 │    ├── cost: 3319.31569
 │    ├── key: (1)
 │    ├── fd: (1)-->(2-29), (30)-->(31-53), (35)-->(30-34,36-53), (3)==(30), (30)==(3)
 │    ├── ordering: -29
 │    ├── prune: (1,2,4-29,31-53)
 │    ├── interesting orderings: (+1) (+3,+27,+4,+1) (+8,+1) (+5,+1) (+2,+1) (+3,+1) (+30) (+35)
 │    └── inner-join
 │         ├── columns: requests.id:1(uuid!null) status:2(int!null) requests.organization_id:3(uuid!null) user_id:4(uuid!null) load_place_id:5(uuid!null) load_date:6(int!null) loading_required:7(bool!null) deliver_place_id:8(uuid!null) deliver_date:9(int!null) discharging_required:10(bool!null) cargo_weight:11(float!null) cargo_volume:12(float!null) cargo_cost:13(float!null) cargo_cost_currency:14(int!null) cargo_ldm:15(int!null) cargo_parcels:16(int!null) cargo_batch_identifier:17(string!null) cargo_notes:18(string!null) transport_type_id:19(int!null) transport_subtype_id:20(int!null) transport_quantity:21(int!null) requests.created_on:22(int!null) requests.updated_on:23(int!null) canceled_on:24(int!null) need_container:25(bool!null) container_number:26(string!null) organization_unit_id:27(uuid!null) assignee_user_id:28(uuid!null) requests.name:29(string!null) organizations.id:30(uuid!null) organizations.type_id:31(int!null) role_type_id:32(int!null) status_type_id:33(int!null) organizations.name:34(string!null) inn:35(string!null) ogrn:36(string!null) kpp:37(string!null) organizations.country:38(string!null) organizations.region:39(string!null) organizations.city:40(string!null) address:41(string!null) natural_address:42(string!null) contacts:43(string!null) organizations.created_on:44(int!null) rating:45(float!null) schedule:46(string!null) tms_rating:47(float!null) bank_name:48(string!null) bank_account:49(string!null) bic:50(string!null) correspondent_account:51(string!null) organizations.updated_on:52(int!null) organizations.organization_id:53(uuid!null)
 │         ├── stats: [rows=1000, distinct(3)=100, distinct(5)=99.9954827, distinct(30)=100]
 │         ├── cost: 3100
 │         ├── key: (1)
 │         ├── fd: (1)-->(2-29), (30)-->(31-53), (35)-->(30-34,36-53), (3)==(30), (30)==(3)
 │         ├── prune: (1,2,4-29,31-53)
 │         ├── interesting orderings: (+1) (+3,+27,+4,+1) (+8,+1) (+5,+1) (+2,+1) (+3,+1) (+30) (+35)
 │         ├── scan requests
 │         │    ├── columns: requests.id:1(uuid!null) status:2(int!null) requests.organization_id:3(uuid!null) user_id:4(uuid!null) load_place_id:5(uuid!null) load_date:6(int!null) loading_required:7(bool!null) deliver_place_id:8(uuid!null) deliver_date:9(int!null) discharging_required:10(bool!null) cargo_weight:11(float!null) cargo_volume:12(float!null) cargo_cost:13(float!null) cargo_cost_currency:14(int!null) cargo_ldm:15(int!null) cargo_parcels:16(int!null) cargo_batch_identifier:17(string!null) cargo_notes:18(string!null) transport_type_id:19(int!null) transport_subtype_id:20(int!null) transport_quantity:21(int!null) requests.created_on:22(int!null) requests.updated_on:23(int!null) canceled_on:24(int!null) need_container:25(bool!null) container_number:26(string!null) organization_unit_id:27(uuid!null) assignee_user_id:28(uuid!null) requests.name:29(string!null)
 │         │    ├── stats: [rows=1000, distinct(3)=100, distinct(5)=100]
 │         │    ├── cost: 1580
 │         │    ├── key: (1)
 │         │    ├── fd: (1)-->(2-29)
 │         │    ├── prune: (1-29)
 │         │    └── interesting orderings: (+1) (+3,+27,+4,+1) (+8,+1) (+5,+1) (+2,+1) (+3,+1)
 │         ├── scan organizations
 │         │    ├── columns: organizations.id:30(uuid!null) organizations.type_id:31(int!null) role_type_id:32(int!null) status_type_id:33(int!null) organizations.name:34(string!null) inn:35(string!null) ogrn:36(string!null) kpp:37(string!null) organizations.country:38(string!null) organizations.region:39(string!null) organizations.city:40(string!null) address:41(string!null) natural_address:42(string!null) contacts:43(string!null) organizations.created_on:44(int!null) rating:45(float!null) schedule:46(string!null) tms_rating:47(float!null) bank_name:48(string!null) bank_account:49(string!null) bic:50(string!null) correspondent_account:51(string!null) organizations.updated_on:52(int!null) organizations.organization_id:53(uuid!null)
 │         │    ├── stats: [rows=1000, distinct(30)=1000]
 │         │    ├── cost: 1480
 │         │    ├── key: (30)
 │         │    ├── fd: (30)-->(31-53), (35)-->(30-34,36-53)
 │         │    ├── prune: (30-53)
 │         │    └── interesting orderings: (+30) (+35)
 │         └── filters [type=bool, outer=(3,30), constraints=(/3: (/NULL - ]; /30: (/NULL - ]), fd=(3)==(30), (30)==(3)]
 │              └── eq [type=bool, outer=(3,30), constraints=(/3: (/NULL - ]; /30: (/NULL - ])]
 │                   ├── variable: organizations.id [type=uuid, outer=(30)]
 │                   └── variable: requests.organization_id [type=uuid, outer=(3)]
 └── true [type=bool]

@RaduBerinde
Copy link
Member

Never mind - I was able to get that plan. I have not been able to come up with a case where the result ordering is incorrect though. @datobatone do you think you might be able to share your dataset with us?

For the record, this is how I reproduced the plan:

create table places (id int primary key, name string);
create table organizations (id int primary key);
create table requests (id int primary key, organization_id int, load_place_id int, name string);
alter table places inject statistics '[ {"columns": ["id"], "created_at": "2018-01-01 1:00:00.00000+00:00", "row_count": 300000, "distinct_count": 3000}]';

explain SELECT *
  FROM "requests" JOIN "organizations" ON (organizations.id = requests.organization_id)
  JOIN "places" AS "load_place_id_places" ON (load_place_id_places.id = requests.load_place_id)
  ORDER BY "requests"."name" desc;

@datobatone
Copy link
Author

hi.
Before stats https://yadi.sk/d/6I_7uDKs_kGMzw
After stats https://yadi.sk/d/V0HnT-eC8lpuIQ

@RaduBerinde
Copy link
Member

@datobatone thanks, this is great! I loaded that file (it's the same in both archives btw), created stats, then ran the query (but selected just requests.name).

The result is indeed out of order. I added some debugging messages to joinReader and it is messing up the order: https://gist.github.com/RaduBerinde/28d1c0bf4a3f16e216ffeb8770e79b43 Note that the input rows are ordered but the output rows (which project only the name column from the input) are in another order.

@jordanlewis / @solongordon I am putting this on your plate (I tried to look at the joinreader code a bit but I'm no longer very familiar with it).

@jordanlewis
Copy link
Member

The problem is that the join reader makes an incorrect assumption, that each input row's join key is unique. This is a problem because it assumes that the order that comes back from the scan is the correct output order in the case of inner join, since the kv fetcher will respect the order of the spans that you pass it.

But if you have keys that need the same span, all of a sudden the ordering that the scanner gives you back is incorrect.

This was likely a problem since lookup join was added, since it was extended from index join: index join does not have to solve this problem since it is guaranteed that each input row has a unique key into the lookup table.

The fix is to reverse the order of the mapping between input rows and lookup rows - keep it from input to lookup, rather than the reverse, which is what it is today.

@jordanlewis
Copy link
Member

Minimal repro:

[email protected]:61733/defaultdb> create table a (a int, b int);
CREATE TABLE

Time: 6.192ms

[email protected]:61733/defaultdb> create table b (b int primary key);                                                                                                                                         CREATE TABLE

Time: 4.595ms

[email protected]:61733/defaultdb> insert into a values(1,1),(2,2),(3,1);
INSERT 3

Time: 6.235ms

[email protected]:61733/defaultdb> insert into b values(1),(2);
INSERT 2

Time: 5.36ms

[email protected]:61733/defaultdb> alter table b inject statistics '[ {"columns": ["b"], "created_at": "2018-01-01 1:00:00", "row_count": 300000, "distinct_count": 3000}]';
ALTER TABLE

Time: 6.983ms

[email protected]:61733/defaultdb> explain select a.a from a join b on a.b=b.b order by a desc;
         tree         | field | description
+---------------------+-------+-------------+
  render              |       |
   └── lookup-join    |       |
        │             | type  | inner
        ├── sort      |       |
        │    │        | order | -a
        │    └── scan |       |
        │             | table | a@primary
        │             | spans | ALL
        └── scan      |       |
                      | table | b@primary
(10 rows)

Time: 3.96ms

[email protected]:61733/defaultdb> select a.a from a join b on a.b=b.b order by a desc;
  a
+---+
  3
  1
  2
(3 rows)

Time: 1.839ms

@jordanlewis
Copy link
Member

The logic test in lookup_join would have caught this, but the test wasn't testing what it thought it was - the injected statistics didn't convince the optimizer to put the sort before the lookup join.

craig bot pushed a commit that referenced this issue Jan 14, 2019
33536: distsqlrun: fix lookup join order preservation r=jordanlewis a=jordanlewis

Fixes #33354.

Previously, a lookup join with a left side that doesn't have an
injective mapping into the right side would fail to preserve its input
order, in violation of its contract.

Now, the code is modified to use the order-preserving pathway that was
designed for outer joins all the time, to prevent the issue.

Release note (bug fix): lookup joins preserve their input order even if
more than one row of the input corresponds to the same row of the lookup
table.

Co-authored-by: Jordan Lewis <[email protected]>
@craig craig bot closed this as completed in #33536 Jan 14, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-execution Relating to SQL execution. C-investigation Further steps needed to qualify. C-label will change. O-community Originated from the community
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants