datastore: workload specific GC optimizations #1823
This was referenced Apr 12, 2023
teh-cmc added a commit that referenced this issue on Dec 2, 2023
This turns every single column in `DataStore`/`DataTable` into a ringbuffer (`VecDeque`).

This means that on the common/happy path of data being ingested in order:
1. Inserting new rows doesn't require re-sorting the bucket (that's nothing new), and
2. garbage collecting rows doesn't require re-sorting the bucket nor copying anything (that's very new).

This leads to very significant performance improvements on the common path.

- Fixes #1823

### Benchmarks

Compared to `main`:
```
group                                                     gc_improvements_0                        gc_improvements_3
-----                                                     -----------------                        -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024    4.50  1084.0±4.47ms    54.1 KElem/sec    1.00   241.0±1.66ms   243.1 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048    8.86      2.1±0.02s    27.6 KElem/sec    1.00   239.9±2.70ms   244.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256     1.88   465.8±2.50ms   125.8 KElem/sec    1.00   247.4±3.94ms   236.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512     2.72   655.3±2.61ms    89.4 KElem/sec    1.00   241.2±2.06ms   243.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default          2.72   652.8±4.12ms    89.8 KElem/sec    1.00   239.6±1.98ms   244.6 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024        40.21      2.4±0.05s    24.2 KElem/sec    1.00    60.3±1.16ms   972.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048        40.08      2.4±0.03s    24.1 KElem/sec    1.00    60.8±1.14ms   964.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256         40.97      2.5±0.08s    23.5 KElem/sec    1.00    61.0±1.99ms   960.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512         39.45      2.4±0.02s    24.5 KElem/sec    1.00    60.6±1.45ms   966.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/default              41.78      2.4±0.03s    24.4 KElem/sec    1.00    57.6±0.35ms  1018.1 KElem/sec
```

Compared to previous PR:
```
group                                                     gc_improvements_1                        gc_improvements_3
-----                                                     -----------------                        -----------------
.../plotting_dashboard/drop_at_least=0.3/bucketsz=1024    4.63  1117.2±9.07ms    52.4 KElem/sec    1.00   241.0±1.66ms   243.1 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=2048    8.96      2.1±0.01s    27.3 KElem/sec    1.00   239.9±2.70ms   244.3 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=256     1.91   471.5±4.76ms   124.3 KElem/sec    1.00   247.4±3.94ms   236.8 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/bucketsz=512     2.76   666.7±6.64ms    87.9 KElem/sec    1.00   241.2±2.06ms   243.0 KElem/sec
.../plotting_dashboard/drop_at_least=0.3/default          2.78   665.6±4.67ms    88.0 KElem/sec    1.00   239.6±1.98ms   244.6 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=1024       134.66      8.1±0.10s     7.2 KElem/sec    1.00    60.3±1.16ms   972.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=2048       132.44      8.0±0.09s     7.3 KElem/sec    1.00    60.8±1.14ms   964.3 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=256        132.22      8.1±0.11s     7.3 KElem/sec    1.00    61.0±1.99ms   960.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/bucketsz=512        133.27      8.1±0.11s     7.3 KElem/sec    1.00    60.6±1.45ms   966.9 KElem/sec
.../timeless_logs/drop_at_least=0.3/default             140.04      8.1±0.07s     7.3 KElem/sec    1.00    57.6±0.35ms  1018.1 KElem/sec
```

---

Part of the GC improvements series:
- #4394
- #4395
- #4396
- #4397
- #4398
- #4399
- #4400
- #4401
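To illustrate why ringbuffer columns help on the in-order path, here is a minimal sketch. It is not the actual `DataStore` code: `Bucket`, its fields, and `gc_oldest` are hypothetical names invented for this example, and a real bucket holds many more columns. The sketch only shows that when rows arrive in time order, dropping the oldest rows from a `VecDeque` is a pop from the front, with no sort and no shifting of the remaining elements.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified bucket: one time column and one data column,
/// both stored as ringbuffers. Not the real `DataStore` layout.
#[derive(Debug)]
struct Bucket {
    times: VecDeque<i64>,
    values: VecDeque<f64>,
    /// True as long as `times` is known to be in ascending order.
    is_sorted: bool,
}

impl Bucket {
    fn new() -> Self {
        Self {
            times: VecDeque::new(),
            values: VecDeque::new(),
            is_sorted: true,
        }
    }

    /// Appends a row. An in-order insert keeps the bucket sorted;
    /// an out-of-order insert only marks it as needing a sort later.
    fn insert(&mut self, time: i64, value: f64) {
        if let Some(&last) = self.times.back() {
            if time < last {
                self.is_sorted = false;
            }
        }
        self.times.push_back(time);
        self.values.push_back(value);
    }

    /// Slow path: rebuilds both columns in time order.
    /// Does nothing if the bucket is already sorted.
    fn sort(&mut self) {
        if self.is_sorted {
            return;
        }
        let mut rows: Vec<(i64, f64)> = self
            .times
            .drain(..)
            .zip(self.values.drain(..))
            .collect();
        rows.sort_by_key(|&(time, _)| time);
        for (time, value) in rows {
            self.times.push_back(time);
            self.values.push_back(value);
        }
        self.is_sorted = true;
    }

    /// Drops up to `n` of the oldest rows and returns how many were dropped.
    /// On the happy path (already sorted) this is just `n` front pops.
    fn gc_oldest(&mut self, n: usize) -> usize {
        self.sort();
        let n = n.min(self.times.len());
        for _ in 0..n {
            self.times.pop_front();
            self.values.pop_front();
        }
        n
    }
}
```

With a plain `Vec`, removing rows from the front would shift every remaining element; `VecDeque::pop_front` avoids that, which is the property the commit message relies on.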
E.g. under the right set of workloads and conditions, the indexed buckets effectively behave like ringbuffers, making it possible to avoid the cost of sorting on the GC path.