
Add a benchmark script for in-batch prefix caching #2494

Merged: 3 commits merged from pr-add-in-prefix-bench into main on Dec 17, 2024

Conversation

@merrymercy (Contributor) commented on Dec 17, 2024

Changes

Some results

Please note that there is huge variance across runs.

  • Results with the old default policy (two runs, illustrating the variance):

Latency of test_batch_by_batch          : 1.9585 s
Latency of test_batch_by_batch_with_hint: 1.9020 s
Latency of test_send_all                : 3.7643 s

Latency of test_batch_by_batch          : 1.9868 s
Latency of test_batch_by_batch_with_hint: 1.9520 s
Latency of test_send_all                : 3.0549 s

  • The goal: the latency of test_send_all should be very close to the latency of test_batch_by_batch_with_hint (the oracle).
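
For readers unfamiliar with the script, the three strategies differ only in how requests are submitted. Below is a minimal sketch of the idea against a running sglang server; the port, payload shape, and function names are assumptions for illustration, not the PR's actual code:

import concurrent.futures

import requests

URL = "http://localhost:30000/generate"  # assumed default sglang server address

def generate(prompt):
    # One synchronous request to the server's /generate route.
    resp = requests.post(URL, json={
        "text": prompt,
        "sampling_params": {"max_new_tokens": 32},
    })
    return resp.json()

def run_concurrently(prompts):
    # Submit all prompts at once so they can land in the same batch.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max(len(prompts), 1)) as pool:
        list(pool.map(generate, prompts))

def test_batch_by_batch(groups):
    # Send one shared-prefix group at a time; each group waits for the previous.
    for prompts in groups:
        run_concurrently(prompts)

def test_batch_by_batch_with_hint(groups):
    # The "oracle": warm the prefix cache with one request per group first,
    # so the remaining requests can reuse the cached shared prefix.
    for prompts in groups:
        generate(prompts[0])
        run_concurrently(prompts[1:])

def test_send_all(groups):
    # Send everything at once; the scheduler must detect the shared
    # prefixes within the batch on its own.
    run_concurrently([p for prompts in groups for p in prompts])

Timing each test function with time.perf_counter() around the call yields latency lines like the ones above.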

@merrymercy merrymercy changed the title Revert the delayed scheduling for in-batch prefix caching Add a benchmark script for in-batch prefix caching Dec 17, 2024
@merrymercy merrymercy force-pushed the pr-add-in-prefix-bench branch from 61f9266 to 401414b on December 17, 2024 at 02:43
@merrymercy merrymercy merged commit 56198b4 into main Dec 17, 2024
1 of 2 checks passed
@merrymercy merrymercy deleted the pr-add-in-prefix-bench branch December 17, 2024 02:49
@libratiger (Contributor)

I was intrigued by your benchmark results, particularly that test_send_all was slower than the other methods. To verify this, I ran the benchmark on my own server, but interestingly observed the opposite result:

Latency of test_batch_by_batch          : 2.3765 s
Latency of test_batch_by_batch_with_hint: 2.4907 s
Latency of test_send_all                : 2.2191 s

The tests were run using the following commands:

python3 -m sglang.launch_server --model Qwen/Qwen2.5-3B-Instruct
# Changed the tokenizer to Qwen
python bench_in_batch_prefix.py
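
(The "Changed the tokenizer to Qwen" step presumably just points the script's tokenizer at the Qwen checkpoint; a hedged sketch, assuming the script loads it through Hugging Face transformers:)

from transformers import AutoTokenizer

# Hypothetical edit; the actual variable name in bench_in_batch_prefix.py may differ.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")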

Environment details:

  • SGLang version: 0.4.1
  • Hardware: A100 GPU

I repeated these experiments multiple times and also tested with a smaller model; the results consistently showed the same pattern: test_send_all was faster than the other methods.

@libratiger (Contributor)

Additional environment details: attention_backend='flashinfer', sampling_backend='flashinfer'
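
(For anyone comparing backends while reproducing, the backend can be pinned at launch; the flag names below are an assumption based on sglang 0.4.x and worth checking against python3 -m sglang.launch_server --help:)

python3 -m sglang.launch_server --model Qwen/Qwen2.5-3B-Instruct \
    --attention-backend flashinfer --sampling-backend flashinfer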
