[Bug] potential correctness with triton-attention-num-kv-splits > 1 #2465
Comments
Hi @HaiShaw, I tested on the gsm8k dataset and the results look good on H100. Could you also share correctness or benchmark results on gsm8k? Is that issue only for |
Hi @ispobock: We just used the bench_one_batch script with "--correctness-test". Here is the text generated by the command
"python3 -m sglang.bench_one_batch --model meta-llama/Llama-3.1-8B-Instruct --tp 8 --batch-size 1 --input 1024 --output 2048 --correctness-test --attention-backend triton --triton-attention-num-kv-splits 2"
It keeps repeating the same sentences:
I think we've wrapped up our conversation nicely. It was great chatting with you about your day in the park. If you want to chat again sometime, feel free to start a new conversation anytime. Have a great day!<|eot_id|><|start_header_id|>assistant<|end_header_id|> I think we've said all we need to say for now. It was great chatting with you about your lovely day in the park. I hope you have a wonderful day and enjoy the rest of your time in the sunshine!<|eot_id|><|start_header_id|>assistant<|end_header_id|> I think we've reached the end of our conversation. It was great chatting with you about your day in the park. I hope you have a wonderful day and enjoy the rest of your time in the sunshine!<|eot_id|><|start_header_id|>assistant<|end_header_id|> I think we've said all we need to say for now. It was great chatting with you about your lovely day in the park. I hope you have a wonderful day and enjoy the rest of your time in the sunshine!<|eot_id|><|start_header_id|>assistant<|end_header_id|> I think we've reached the end of our conversation. It was great chatting with you about your day in the park. I hope you have a wonderful day and enjoy the rest of your time in the sunshine!<|eot_id|><|start_header_id|>assistant<|end_header_id|>
If we set --triton-attention-num-kv-splits 8, the output contains unreadable words.
The generated text is normal with a split of 1.
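For context on why the split count should not change the output: splitting the KV sequence into chunks and merging the per-chunk results with their log-sum-exp values is mathematically equivalent to unsplit attention. The sketch below is a plain-Python reference for a single query vector, not the sglang Triton kernel; the function names `attention_reference` and `attention_split_kv` and the chunking scheme are assumptions made only for illustration.

```python
import math
import random


def _chunk_attention(q, keys, values):
    """Softmax attention over one chunk; returns (output, log-sum-exp of scores)."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
    return out, m + math.log(total)


def attention_reference(q, keys, values):
    """Unsplit attention, equivalent to a split count of 1."""
    out, _ = _chunk_attention(q, keys, values)
    return out


def attention_split_kv(q, keys, values, num_kv_splits):
    """Attention computed per KV chunk, then merged with log-sum-exp weights."""
    chunk = math.ceil(len(keys) / num_kv_splits)
    partials = [
        _chunk_attention(q, keys[i:i + chunk], values[i:i + chunk])
        for i in range(0, len(keys), chunk)
    ]
    max_lse = max(lse for _, lse in partials)
    # Each chunk's weight is proportional to its softmax denominator.
    merge = [math.exp(lse - max_lse) for _, lse in partials]
    norm = sum(merge)
    dim = len(values[0])
    return [
        sum(w * out[d] for w, (out, _) in zip(merge, partials)) / norm
        for d in range(dim)
    ]


# Small random problem with a fixed seed (sizes are arbitrary).
rng = random.Random(0)
q = [rng.gauss(0, 1) for _ in range(16)]
keys = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(100)]
values = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(100)]
expected = attention_reference(q, keys, values)
```

Comparing `attention_split_kv(q, keys, values, n)` against `expected` for n = 2 and n = 8 should agree to floating-point tolerance; a kernel whose output diverges as the split count grows likely has a problem in the per-chunk bounds or in the merge step.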
Could you also share the command you used to test on the gsm8k dataset? Thanks
@kkHuang-amd You can test gsm8k by these commands:
|
@HaiShaw @kkHuang-amd The issue should be fixed in #2479. Could you test the main branch again?
@ispobock your fix is confirmed by @kkHuang-amd |
Checklist
Describe the bug
On H200, we appear to hit a correctness issue when triton-attention-num-kv-splits > 1.
Would you please help to confirm?
cc @ispobock
Reproduction
Environment
N/A