Add InfiniteBench for long context benchmarking #2421
Nice work! Could we combine these scripts into one? Something like this:
sglang/python/sglang/bench_serving.py
Line 519 in 641b7d0
SHAREGPT_URL = "https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json"
Please move the file download step into the script itself to make it more convenient for users.
Additionally, the section about TensorRT LLM is very good! Would you be willing to help improve this custom task script to make it easier to test TensorRT LLM?
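The download step requested above could be sketched roughly as follows. This is only an illustration, not the code in this PR: `download_if_missing` and its `fetch` parameter are hypothetical names, and the actual dataset URL and file names are not given in this thread, so the caller has to supply them.

```python
import os
import urllib.request


def download_if_missing(url, dest_path, fetch=urllib.request.urlretrieve):
    """Download `url` to `dest_path` unless the file already exists.

    `fetch` is a hypothetical hook (defaults to urllib's urlretrieve) so the
    helper can be exercised without network access.
    """
    if os.path.isfile(dest_path):
        # Reuse the cached copy instead of downloading again.
        return dest_path

    parent = os.path.dirname(os.path.abspath(dest_path))
    os.makedirs(parent, exist_ok=True)

    # Write to a temporary name first so an interrupted download
    # does not leave a truncated file at the final path.
    tmp_path = dest_path + ".part"
    fetch(url, tmp_path)
    os.replace(tmp_path, dest_path)
    return dest_path
```

Calling the helper at the start of the benchmark script, once per data file, would let users run the evaluation without a separate download step.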
close #1273
Sounds good. I will merge the download script for SGLang, and we can keep the download script for TensorRT-LLM. I will also work on the custom task script PR. I am traveling, so it may take some time, but I will try to do it ASAP.
)
parser.add_argument("--data-dir", type=str, default="./data")
parser.add_argument("--start-idx", type=int, default=0)
parser.add_argument("--end-idx", type=int, default=None)
Can you add descriptions for the "--start-idx" and "--end-idx" arguments?
I removed these arguments, which were borrowed from the TensorRT-LLM eval script, and added num-samples with a description.
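For illustration, a documented sample-count argument might look like the sketch below. The help text, the default of `None`, and the rule of taking samples from the start of the task are assumptions, and `build_parser` and `select_samples` are hypothetical helpers; the PR's actual definition may differ.

```python
import argparse


def build_parser():
    """Build an illustrative parser; not the PR's actual argument set."""
    parser = argparse.ArgumentParser(
        description="InfiniteBench evaluation (illustrative sketch)"
    )
    parser.add_argument(
        "--data-dir",
        type=str,
        default="./data",
        help="Directory containing the benchmark data files.",
    )
    parser.add_argument(
        "--num-samples",
        type=int,
        default=None,
        help="Number of samples to evaluate (assumed to be taken from the "
        "start of the task). Evaluates all samples if omitted.",
    )
    return parser


def select_samples(samples, num_samples):
    """Return the first `num_samples` items, or all of them if None."""
    samples = list(samples)
    if num_samples is None:
        return samples
    return samples[:num_samples]
```

A single count is easier to document than a start/end index pair, at the cost of no longer being able to evaluate an arbitrary slice of the dataset.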
Is this ready to be merged?
python convert_checkpoint.py \
--model_dir ./Llama-3-8B-Instruct-Gradient-1048k/ \
--output_dir /tmp/llama-3-8B-1048k/trt_ckpts \
--dtype float16
I see that the dtype specified in the model's config.json is bfloat16. Could you please explain why float16 is being specified here?
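One way to surface this kind of mismatch before converting a checkpoint is to compare the requested dtype with the `torch_dtype` field that Hugging Face model configs usually carry. The sketch below is only an illustration: `read_config_dtype` and `check_dtype` are hypothetical helpers and are not part of `convert_checkpoint.py` or of this PR.

```python
import json
import os


def read_config_dtype(model_dir):
    """Return the `torch_dtype` string from config.json, or None if absent."""
    config_path = os.path.join(model_dir, "config.json")
    with open(config_path, encoding="utf-8") as f:
        return json.load(f).get("torch_dtype")


def check_dtype(model_dir, requested):
    """Return a warning message if `requested` differs from the config dtype."""
    configured = read_config_dtype(model_dir)
    if configured is not None and configured != requested:
        return (
            f"requested dtype {requested} differs from "
            f"config.json torch_dtype {configured}"
        )
    return None
```

For the model directory used in the command above, `check_dtype("./Llama-3-8B-Instruct-Gradient-1048k/", "float16")` would return a warning if config.json declares bfloat16, as the reviewer reports.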
Motivation
This PR adds support for evaluation on a long-context benchmark, InfiniteBench. See #1273 for more context.
Modifications
Following the discussion in #1273, this PR currently adds code from the TensorRT-LLM repo to load the data, create prompts, and compute scores. Below are sample outputs for both cases using gradientai/Llama-3-8B-Instruct-Gradient-1048k with a maximum input length of ~130K. Please check the README for more details and instructions on how to run both benchmarks. Currently, the predictions differ (see below), which I will try to fix.
SGLang
TensorRT-LLM