datastore: workload specific GC optimizations #1823

Closed
Tracked by #1898
teh-cmc opened this issue Apr 12, 2023 · 0 comments · Fixed by #4397
Labels: 🚀 performance (Optimization, memory use, etc), ⛃ re_datastore (affects the datastore itself)

teh-cmc commented Apr 12, 2023

E.g. under the right set of workloads and conditions, the indexed buckets effectively behave like ringbuffers, making it possible to avoid the cost of sorting on the GC path.
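
To make the idea concrete, here is a minimal sketch of a bucket that tracks whether it is still sorted, so the GC path only pays for a sort when data actually arrived out of order. `Bucket`, `is_sorted`, `insert` and `gc_oldest` are hypothetical names chosen for illustration, and the bucket holds a single time column; this is not the datastore's actual code.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified stand-in for an indexed bucket (one time column only).
struct Bucket {
    times: VecDeque<i64>,
    /// True as long as every insert so far arrived in non-decreasing time order.
    is_sorted: bool,
}

impl Bucket {
    fn new() -> Self {
        Self { times: VecDeque::new(), is_sorted: true }
    }

    fn insert(&mut self, time: i64) {
        // An append that goes backwards in time breaks the sorted invariant.
        if let Some(&last) = self.times.back() {
            if time < last {
                self.is_sorted = false;
            }
        }
        self.times.push_back(time);
    }

    /// Drops the `n` oldest rows and returns whether a sort was needed first.
    fn gc_oldest(&mut self, n: usize) -> bool {
        let had_to_sort = !self.is_sorted;
        if had_to_sort {
            // Slow path: only taken when data was ingested out of order.
            self.times.make_contiguous().sort_unstable();
            self.is_sorted = true;
        }
        // Fast path: the oldest rows sit at the front of the ring buffer.
        let n = n.min(self.times.len());
        self.times.drain(..n);
        had_to_sort
    }
}
```

With in-order ingestion, `gc_oldest` never enters the sorting branch; a single out-of-order insert forces one sort on the next GC pass, after which the fast path applies again.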

@teh-cmc teh-cmc added ⛃ re_datastore affects the datastore itself 🚀 performance Optimization, memory use, etc labels Apr 12, 2023
@teh-cmc teh-cmc self-assigned this Nov 27, 2023
teh-cmc added a commit that referenced this issue Dec 2, 2023
This turns every single column in `DataStore`/`DataTable` into a
ringbuffer (`VecDeque`).

This means that on the common/happy path of data being ingested in
order:
1. inserting new rows doesn't require re-sorting the bucket (that's
nothing new), and
2. garbage collecting rows requires neither re-sorting the bucket nor
copying anything (that's very new).

This leads to very significant performance improvements on the common
path.
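
As a rough illustration of why `VecDeque` columns help here, the sketch below keeps two columns in lockstep and garbage collects the oldest rows by popping from the front, which neither sorts nor shifts the remaining elements. `Table`, `col_time`, `col_value`, `push_row` and `gc_before` are hypothetical names for this example only, and the code assumes rows were pushed in time order.

```rust
use std::collections::VecDeque;

/// Hypothetical two-column table; a real table would carry many more columns.
struct Table {
    col_time: VecDeque<i64>,
    col_value: VecDeque<f32>,
}

impl Table {
    /// Appends one row; in-order ingestion keeps `col_time` sorted.
    fn push_row(&mut self, time: i64, value: f32) {
        self.col_time.push_back(time);
        self.col_value.push_back(value);
    }

    /// Drops every row with `time < cutoff` and returns how many were dropped.
    /// Assumes `col_time` is sorted, so all such rows sit at the front.
    fn gc_before(&mut self, cutoff: i64) -> usize {
        let mut dropped = 0;
        while matches!(self.col_time.front(), Some(&t) if t < cutoff) {
            // `pop_front` only moves the ring buffer's head: no copy, no sort.
            self.col_time.pop_front();
            self.col_value.pop_front();
            dropped += 1;
        }
        dropped
    }
}
```

With a plain `Vec`, removing rows from the front would shift every remaining element; with a ring buffer, the cost is proportional to the number of rows dropped.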

- Fixes #1823 

### Benchmarks

Compared to `main`:
```
group                                                     gc_improvements_0                       gc_improvements_3
-----                                                     -----------------                       -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024    4.50   1084.0±4.47ms 54.1 KElem/sec     1.00    241.0±1.66ms 243.1 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048    8.86       2.1±0.02s 27.6 KElem/sec     1.00    239.9±2.70ms 244.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256     1.88    465.8±2.50ms 125.8 KElem/sec    1.00    247.4±3.94ms 236.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512     2.72    655.3±2.61ms 89.4 KElem/sec     1.00    241.2±2.06ms 243.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default          2.72    652.8±4.12ms 89.8 KElem/sec     1.00    239.6±1.98ms 244.6 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024         40.21      2.4±0.05s 24.2 KElem/sec     1.00     60.3±1.16ms 972.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048         40.08      2.4±0.03s 24.1 KElem/sec     1.00     60.8±1.14ms 964.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256          40.97      2.5±0.08s 23.5 KElem/sec     1.00     61.0±1.99ms 960.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512          39.45      2.4±0.02s 24.5 KElem/sec     1.00     60.6±1.45ms 966.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/default               41.78      2.4±0.03s 24.4 KElem/sec     1.00     57.6±0.35ms 1018.1 KElem/sec
```

Compared to previous PR:
```
group                                                     gc_improvements_1                       gc_improvements_3
-----                                                     -----------------                       -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024    4.63   1117.2±9.07ms 52.4 KElem/sec     1.00    241.0±1.66ms 243.1 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048    8.96       2.1±0.01s 27.3 KElem/sec     1.00    239.9±2.70ms 244.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256     1.91    471.5±4.76ms 124.3 KElem/sec    1.00    247.4±3.94ms 236.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512     2.76    666.7±6.64ms 87.9 KElem/sec     1.00    241.2±2.06ms 243.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default          2.78    665.6±4.67ms 88.0 KElem/sec     1.00    239.6±1.98ms 244.6 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024         134.66      8.1±0.10s  7.2 KElem/sec    1.00     60.3±1.16ms 972.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048         132.44      8.0±0.09s  7.3 KElem/sec    1.00     60.8±1.14ms 964.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256          132.22      8.1±0.11s  7.3 KElem/sec    1.00     61.0±1.99ms 960.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512          133.27      8.1±0.11s  7.3 KElem/sec    1.00     60.6±1.45ms 966.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/default               140.04      8.1±0.07s  7.3 KElem/sec    1.00     57.6±0.35ms 1018.1 KElem/sec
```

---

Part of the GC improvements series:
- #4394
- #4395
- #4396
- #4397
- #4398
- #4399
- #4400
- #4401