Benchmark results for DeepSeek-v3 in 2x8xH200 Cluster #2738
Hi @roG0d, sorry for the late response. Could you try the latest version, v0.4.1.post4?
We already have the results for
@roG0d May I also suggest that the interconnect setup between the 2 nodes be documented in the benchmarks? For example, whether the interconnect between the 2 nodes is NVIDIA InfiniBand or Amazon EFA, the NCCL version, etc. That would make it easier for a broader audience to follow or replicate the benchmarking results. Thanks so much!
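One possible way to capture that information alongside the benchmark output is sketched below. This is only an illustration, not something taken from the PR: the function name `collect_interconnect_metadata` and the JSON field names are hypothetical, and the interconnect type and NCCL version are passed in by hand rather than detected. The only thing read from the system is the set of environment variables prefixed with `NCCL_`.

```python
import json
import os


def collect_interconnect_metadata(interconnect, nccl_version, num_nodes=2, env=None):
    """Build a small record describing the multi-node setup.

    All field names here are hypothetical; adapt them to whatever
    format the benchmark README already uses.
    """
    env = os.environ if env is None else env
    # Keep only NCCL-related settings (e.g. NCCL_IB_HCA, NCCL_SOCKET_IFNAME).
    nccl_env = {k: v for k, v in sorted(env.items()) if k.startswith("NCCL_")}
    return {
        "num_nodes": num_nodes,
        "interconnect": interconnect,  # e.g. "InfiniBand" or "EFA", filled in manually
        "nccl_version": nccl_version,  # filled in manually, not detected here
        "nccl_env": nccl_env,
    }


if __name__ == "__main__":
    # Placeholder values: replace "unknown" with the real setup details.
    record = collect_interconnect_metadata("unknown", "unknown")
    print(json.dumps(record, indent=2))
```

The printed JSON could be saved next to the benchmark logs so that readers can see which interconnect settings were active for each run.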
Hello @roG0d! However, if the FP8 and BF16 methods mentioned in the benchmark refer specifically to the GEMM FP8 and GEMM BF16 approaches, I would appreciate it if you could address my questions regarding the measurement methods used in this benchmark. Question 1: the arguments passed when running the server. The arguments you (and some other sglang users) used to launch the server are as follows:
Here, I identified the following 3 concerns:
If these questions can be clarified, I believe we can have more confidence in the benchmark results and the FP8 performance of DeepSeek V3. cc @zhyncs
Thanks for the detailed docs and instructions. Because we are optimizing performance very rapidly, many results in this PR will soon be outdated. To reduce our maintenance overhead, it is better to share these results through blog posts, GitHub discussions, or GitHub issues instead of maintaining them inside the repo.
I will close this for now because we won't merge it, but feel free to continue the discussion in this thread.
Motivation
For output files and logs, please refer to: https://github.com/datacrunch-research/h200-benchmarks
Modifications
benchmark_dsv3: folder with a structure similar to that of the other benchmark folders.
deepseek_v3.sh: script containing each benchmark performed.
README.md: file containing the metrics obtained from the benchmarks performed.
Checklist