Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: compactor produced keys are not ordered #2219

Closed
skyzh opened this issue Apr 29, 2022 · 7 comments · Fixed by #2226
Closed

storage: compactor produced keys are not ordered #2219

skyzh opened this issue Apr 29, 2022 · 7 comments · Fixed by #2226
Labels
component/storage Storage type/bug Something isn't working

Comments

@skyzh
Copy link
Contributor

skyzh commented Apr 29, 2022

https://github.com/singularity-data/risingwave/runs/6221697790?check_suite_focus=true

image

@skyzh skyzh added type/feature type/bug Something isn't working component/storage Storage and removed type/feature labels Apr 29, 2022
@skyzh
Copy link
Contributor Author

skyzh commented Apr 29, 2022

Maybe caused by #2122

@BowenXiao1999
Copy link
Contributor

BowenXiao1999 commented Apr 29, 2022

Update: It is fixed by #2245. Not realtes to this issue.


I also encounter some problems, but I' m not sure it is due to Compaction (or my own code problems). When I use table id to replace the keyspace encoding of Simple Agg, I get panic when comparing key: bad_full_key_format (Intersting thing is, It do not happen for HashAgg (with group key) or HashJoin (with join key) ). The key length is only 5 (exactly the length of table id) and less than Epoch length so can not successfully split user key and epoch.

In my opinion, I would suspect that compaction may not be able to extend user key with epoch in some scenarios (e.g. Simple Agg, the keyspace is just a table_id). Previous executor_id/agg_call_idx keyspace is ok cuz it's larger than epoch length 8, now table_id only 5. I'll file a detail issue later after Wallace's PR get merged and check out with compaction guys if needed.

@skyzh skyzh reopened this Apr 29, 2022
@skyzh
Copy link
Contributor Author

skyzh commented Apr 29, 2022

Still not resolved. reproduced at https://github.com/singularity-data/risingwave/actions/runs/2244511049

@hzxa21
Copy link
Collaborator

hzxa21 commented Apr 29, 2022

Digging into the codes and the logs a little bit. The bug is caused by missing to select all overlapping L1 files as the input sst for compaction, which then causes overlaps in L1 SSTs.

Assume

SSTs:
L0: [2, 3]
L1: [5, 6]

Key Range:
|____2____|             |____3____| 
      |__5__| |__6__|

Here is what happned:

  1. Initially we put L0: [2] in the compaction input SST set
  2. We expand input to L0: [2], L1: [5] since 5 overlaps with 2
  3. We expand input to L0: [2, 3], L1: [5] since the input set size is too small
  4. We won't expand input any more since we only check overlap with the newly added SST:
    https://github.com/singularity-data/risingwave/blob/9dbdfb2e352210d3eba953268c9cdd98edd79664/src/meta/src/hummock/compaction/tier_compaction_picker.rs#L246

The expected input set is L0: [2, 3], L1: [5, 6]. The root cause is we didn't check overlap on the full key range on the input set.

@hzxa21
Copy link
Collaborator

hzxa21 commented Apr 29, 2022

Debugging tips:

  • Check compactor logs to find which task cause the panic (grep "Ready to handle compaction task")
  • Check meta logs to find the compaction info (grep "Compaction task id")
  • Check meta logs to find the KeyRange of an SST (grep "KeyRange")

@hzxa21
Copy link
Collaborator

hzxa21 commented May 11, 2022

Fixed in #2242

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/storage Storage type/bug Something isn't working
Projects
None yet
3 participants