sgl-kernel adapt tensorrt llm custom allreduce #2481

Merged
merged 9 commits into from
Dec 15, 2024

Conversation

yizhang2077
Collaborator

@yizhang2077 yizhang2077 commented Dec 14, 2024

Motivation

  • Move the TensorRT-LLM custom allreduce algorithm into sgl-kernel, adapt it for Python, and add a test for custom allreduce.
  • We do not use the two-shot allreduce kernel from TensorRT-LLM, since it is disabled here.
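For readers unfamiliar with the terms, here is a minimal CPU sketch of the data movement in a one-shot versus a two-shot allreduce. It is only an illustration in plain Python, not the CUDA kernel from TensorRT-LLM or sgl-kernel; the function names are hypothetical and each "rank" is just a Python list.

```python
def one_shot_allreduce(buffers):
    """One-shot: every rank reads all peers' buffers and sums them locally."""
    world = len(buffers)
    n = len(buffers[0])
    return [
        [sum(buffers[peer][i] for peer in range(world)) for i in range(n)]
        for _rank in range(world)
    ]


def two_shot_allreduce(buffers):
    """Two-shot: reduce-scatter, then all-gather."""
    world = len(buffers)
    n = len(buffers[0])
    assert n % world == 0, "this sketch assumes the length divides evenly"
    chunk = n // world
    # Shot 1 (reduce-scatter): rank r reduces only its own chunk r.
    partial = [
        [
            sum(buffers[peer][i] for peer in range(world))
            for i in range(r * chunk, (r + 1) * chunk)
        ]
        for r in range(world)
    ]
    # Shot 2 (all-gather): every rank collects the reduced chunks from all ranks.
    full = [x for r in range(world) for x in partial[r]]
    return [list(full) for _rank in range(world)]


if __name__ == "__main__":
    data = [[1, 2, 3, 4], [10, 20, 30, 40]]  # two ranks, four elements each
    print(one_shot_allreduce(data))
    print(two_shot_allreduce(data))
```

Both variants produce the same sums on every rank. They differ in how much each rank reads and how many synchronization steps are needed, which is why the trade-off between them depends on message size.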

Modifications

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

Next

  • Replace all uses of vllm.distributed in SGLang code with sglang.srt.distributed.

@zhyncs
Member

zhyncs commented Dec 14, 2024

@yizhang2077 Could we split this into two PRs? One for the sgl-kernel, which requires a new version release, and another for updating the Python package replacement and dependencies.

@yizhang2077
Collaborator Author

@yizhang2077 Could we split this into two PRs? One for the sgl-kernel, which requires a new version release, and another for updating the Python package replacement and dependencies.

OK

@yizhang2077 yizhang2077 force-pushed the adapt-tensorrt-llm-custom-allreduce branch from 4982d53 to 69df322 Compare December 14, 2024 14:15
@yizhang2077 yizhang2077 changed the title Adapt tensorrt llm custom allreduce sgl-kernel adapt tensorrt llm custom allreduce Dec 14, 2024
@zhyncs
Member

zhyncs commented Dec 14, 2024

BTW We might also update the CMakeLists.txt, as we use it for clangd semantic indexing.

@zhyncs
Member

zhyncs commented Dec 14, 2024

@yizhang2077 yizhang2077 force-pushed the adapt-tensorrt-llm-custom-allreduce branch from d03081f to ce283ff Compare December 14, 2024 16:29
@zhyncs
Member

zhyncs commented Dec 15, 2024

/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-45_313122_25865/logs/gcs_server.out' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-45_313122_25865/logs/gcs_server.err' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-45_313122_25865/logs/monitor.out' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-45_313122_25865/logs/monitor.err' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1378: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-45_313122_25865/logs/dashboard.err' mode='a' encoding='utf-8'>
  self.start_api_server(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/utils.py:505: ResourceWarning: unclosed file <_io.TextIOWrapper name='/sys/fs/cgroup/cpu.max' mode='r' encoding='UTF-8'>
  max_file = open(cpu_max_file_name).read()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-45_313122_25865/logs/raylet.out' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-45_313122_25865/logs/raylet.err' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1422: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-45_313122_25865/logs/log_monitor.err' mode='a' encoding='utf-8'>
  self.start_log_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
2024-12-14 20:46:46,624	INFO worker.py:1821 -- Started a local Ray instance.
/usr/lib/python3.10/subprocess.py:1072: ResourceWarning: subprocess 26145 is still running
  _warn("subprocess %s is still running" % self.pid,
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-56_870131_25865/logs/gcs_server.out' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-56_870131_25865/logs/gcs_server.err' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-56_870131_25865/logs/monitor.out' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-56_870131_25865/logs/monitor.err' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1378: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-56_870131_25865/logs/dashboard.err' mode='a' encoding='utf-8'>
  self.start_api_server(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-56_870131_25865/logs/raylet.out' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-56_870131_25865/logs/raylet.err' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1422: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-46-56_870131_25865/logs/log_monitor.err' mode='a' encoding='utf-8'>
  self.start_log_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
2024-12-14 20:46:58,163	INFO worker.py:1821 -- Started a local Ray instance.
/usr/lib/python3.10/subprocess.py:1072: ResourceWarning: subprocess 37520 is still running
  _warn("subprocess %s is still running" % self.pid,
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-10_427084_25865/logs/gcs_server.out' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-10_427084_25865/logs/gcs_server.err' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-10_427084_25865/logs/monitor.out' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-10_427084_25865/logs/monitor.err' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1378: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-10_427084_25865/logs/dashboard.err' mode='a' encoding='utf-8'>
  self.start_api_server(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-10_427084_25865/logs/raylet.out' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-10_427084_25865/logs/raylet.err' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1422: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-10_427084_25865/logs/log_monitor.err' mode='a' encoding='utf-8'>
  self.start_log_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
2024-12-14 20:47:11,663	INFO worker.py:1821 -- Started a local Ray instance.
/usr/lib/python3.10/subprocess.py:1072: ResourceWarning: subprocess 49014 is still running
  _warn("subprocess %s is still running" % self.pid,
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-26_835903_25865/logs/gcs_server.out' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-26_835903_25865/logs/gcs_server.err' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-26_835903_25865/logs/monitor.out' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-26_835903_25865/logs/monitor.err' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1378: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-26_835903_25865/logs/dashboard.err' mode='a' encoding='utf-8'>
  self.start_api_server(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-26_835903_25865/logs/raylet.out' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-26_835903_25865/logs/raylet.err' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1422: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-26_835903_25865/logs/log_monitor.err' mode='a' encoding='utf-8'>
  self.start_log_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
2024-12-14 20:47:28,183	INFO worker.py:1821 -- Started a local Ray instance.
/usr/lib/python3.10/subprocess.py:1072: ResourceWarning: subprocess 60629 is still running
  _warn("subprocess %s is still running" % self.pid,
ResourceWarning: Enable tracemalloc to get the object allocation traceback
./usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-46_932999_25865/logs/gcs_server.out' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-46_932999_25865/logs/gcs_server.err' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-46_932999_25865/logs/monitor.out' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-46_932999_25865/logs/monitor.err' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1378: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-46_932999_25865/logs/dashboard.err' mode='a' encoding='utf-8'>
  self.start_api_server(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-46_932999_25865/logs/raylet.out' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-46_932999_25865/logs/raylet.err' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1422: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-46_932999_25865/logs/log_monitor.err' mode='a' encoding='utf-8'>
  self.start_log_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
2024-12-14 20:47:48,275	INFO worker.py:1821 -- Started a local Ray instance.
(performance pid=72585) test_size = 512, world_size = 2, vllm time = 0.0104us,custom time = 0.0074us
(performance pid=72585) test_size = 4096, world_size = 2, vllm time = 0.0096us,custom time = 0.0077us
(performance pid=72585) test_size = 32768, world_size = 2, vllm time = 0.0083us,custom time = 0.0080us
(performance pid=72585) test_size = 262144, world_size = 2, vllm time = 0.0123us,custom time = 0.0122us
/usr/lib/python3.10/subprocess.py:1072: ResourceWarning: subprocess 72382 is still running
  _warn("subprocess %s is still running" % self.pid,
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-59_143408_25865/logs/gcs_server.out' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-59_143408_25865/logs/gcs_server.err' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-59_143408_25865/logs/monitor.out' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-59_143408_25865/logs/monitor.err' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1378: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-59_143408_25865/logs/dashboard.err' mode='a' encoding='utf-8'>
  self.start_api_server(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-59_143408_25865/logs/raylet.out' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-59_143408_25865/logs/raylet.err' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1422: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-47-59_143408_25865/logs/log_monitor.err' mode='a' encoding='utf-8'>
  self.start_log_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
2024-12-14 20:48:00,433	INFO worker.py:1821 -- Started a local Ray instance.
(performance pid=83971) test_size = 512, world_size = 4, vllm time = 0.0085us,custom time = 0.0074us
(performance pid=83971) test_size = 4096, world_size = 4, vllm time = 0.0079us,custom time = 0.0087us
(performance pid=83971) test_size = 32768, world_size = 4, vllm time = 0.0091us,custom time = 0.0095us
(performance pid=83971) test_size = 131072, world_size = 4, vllm time = 0.0167us,custom time = 0.0140us
/usr/lib/python3.10/subprocess.py:1072: ResourceWarning: subprocess 83754 is still running
  _warn("subprocess %s is still running" % self.pid,
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-12_487014_25865/logs/gcs_server.out' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-12_487014_25865/logs/gcs_server.err' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-12_487014_25865/logs/monitor.out' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-12_487014_25865/logs/monitor.err' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1378: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-12_487014_25865/logs/dashboard.err' mode='a' encoding='utf-8'>
  self.start_api_server(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-12_487014_25865/logs/raylet.out' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-12_487014_25865/logs/raylet.err' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1422: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-12_487014_25865/logs/log_monitor.err' mode='a' encoding='utf-8'>
  self.start_log_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
2024-12-14 20:48:13,778	INFO worker.py:1821 -- Started a local Ray instance.
(performance pid=95444) test_size = 512, world_size = 6, vllm time = 0.0098us,custom time = 0.0080us
(performance pid=95444) test_size = 4096, world_size = 6, vllm time = 0.0089us,custom time = 0.0104us
(performance pid=95444) test_size = 32768, world_size = 6, vllm time = 0.0103us,custom time = 0.0119us
(performance pid=95444) test_size = 65536, world_size = 6, vllm time = 0.0181us,custom time = 0.0141us
/usr/lib/python3.10/subprocess.py:1072: ResourceWarning: subprocess 95244 is still running
  _warn("subprocess %s is still running" % self.pid,
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-27_700625_25865/logs/gcs_server.out' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1362: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-27_700625_25865/logs/gcs_server.err' mode='a' encoding='utf-8'>
  self.start_gcs_server()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-27_700625_25865/logs/monitor.out' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1367: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-27_700625_25865/logs/monitor.err' mode='a' encoding='utf-8'>
  self.start_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1378: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-27_700625_25865/logs/dashboard.err' mode='a' encoding='utf-8'>
  self.start_api_server(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-27_700625_25865/logs/raylet.out' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1420: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-27_700625_25865/logs/raylet.err' mode='a' encoding='utf-8'>
  self.start_raylet(plasma_directory, object_store_memory)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/local/lib/python3.10/dist-packages/ray/_private/node.py:1422: ResourceWarning: unclosed file <_io.TextIOWrapper name='/tmp/ray/session_2024-12-14_20-48-27_700625_25865/logs/log_monitor.err' mode='a' encoding='utf-8'>
  self.start_log_monitor()
ResourceWarning: Enable tracemalloc to get the object allocation traceback
2024-12-14 20:48:28,990	INFO worker.py:1821 -- Started a local Ray instance.
(performance pid=107061) test_size = 512, world_size = 8, vllm time = 0.0094us,custom time = 0.0079us
(performance pid=107061) test_size = 4096, world_size = 8, vllm time = 0.0089us,custom time = 0.0111us
(performance pid=107061) test_size = 32768, world_size = 8, vllm time = 0.0113us,custom time = 0.0124us
/usr/lib/python3.10/subprocess.py:1072: ResourceWarning: subprocess 106853 is still running
  _warn("subprocess %s is still running" % self.pid,
ResourceWarning: Enable tracemalloc to get the object allocation traceback
.
----------------------------------------------------------------------
Ran 2 tests in 119.998s

OK

@zhyncs
Member

zhyncs commented Dec 15, 2024

(performance pid=72585) test_size = 512, world_size = 2, vllm time = 0.0104us,custom time = 0.0074us
(performance pid=72585) test_size = 4096, world_size = 2, vllm time = 0.0096us,custom time = 0.0077us
(performance pid=72585) test_size = 32768, world_size = 2, vllm time = 0.0083us,custom time = 0.0080us
(performance pid=72585) test_size = 262144, world_size = 2, vllm time = 0.0123us,custom time = 0.0122us

(performance pid=83971) test_size = 512, world_size = 4, vllm time = 0.0085us,custom time = 0.0074us
(performance pid=83971) test_size = 4096, world_size = 4, vllm time = 0.0079us,custom time = 0.0087us
(performance pid=83971) test_size = 32768, world_size = 4, vllm time = 0.0091us,custom time = 0.0095us
(performance pid=83971) test_size = 131072, world_size = 4, vllm time = 0.0167us,custom time = 0.0140us

(performance pid=95444) test_size = 512, world_size = 6, vllm time = 0.0098us,custom time = 0.0080us
(performance pid=95444) test_size = 4096, world_size = 6, vllm time = 0.0089us,custom time = 0.0104us
(performance pid=95444) test_size = 32768, world_size = 6, vllm time = 0.0103us,custom time = 0.0119us
(performance pid=95444) test_size = 65536, world_size = 6, vllm time = 0.0181us,custom time = 0.0141us

(performance pid=107061) test_size = 512, world_size = 8, vllm time = 0.0094us,custom time = 0.0079us
(performance pid=107061) test_size = 4096, world_size = 8, vllm time = 0.0089us,custom time = 0.0111us
(performance pid=107061) test_size = 32768, world_size = 8, vllm time = 0.0113us,custom time = 0.0124us

@yizhang2077
Collaborator Author

Ray emits some strange log messages (the ResourceWarning lines above) and I don't know how to suppress them. They don't seem to affect the test, though.
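One possible way to quiet this kind of output, assuming the `ResourceWarning` lines are raised in the test driver process, is a standard-library warning filter. The helper below is a hypothetical, self-contained demonstration of the filter itself; it does not touch Ray and has not been tried against this test.

```python
import warnings


def count_resource_warnings(suppress: bool) -> int:
    """Emit one ResourceWarning and return how many were actually reported."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # start from a state where everything is shown
        if suppress:
            # A later simplefilter() call is inserted at the front of the
            # filter list, so it takes precedence over the "always" filter.
            warnings.simplefilter("ignore", ResourceWarning)
        warnings.warn("unclosed file <demo>", ResourceWarning)
    return sum(issubclass(w.category, ResourceWarning) for w in caught)


if __name__ == "__main__":
    print(count_resource_warnings(suppress=False))
    print(count_resource_warnings(suppress=True))
```

In a `unittest` test case the same `warnings.simplefilter("ignore", ResourceWarning)` call could go into `setUp`, or the whole run could be started with `python -W ignore::ResourceWarning`. Either option only hides the messages; it does not close the underlying file handles.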

@zhyncs
Member

zhyncs commented Dec 15, 2024

For world_size >= 4, test sizes 4096 and 32768 are slower than vLLM; the other cases are faster.
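The comparison can be checked mechanically against the benchmark lines posted above. The following is a small sketch that parses those lines (copied verbatim from the test output) and lists the cases where the custom kernel is slower; the helper name `slower_cases` is made up for this illustration.

```python
import re

LOG = """\
(performance pid=72585) test_size = 512, world_size = 2, vllm time = 0.0104us,custom time = 0.0074us
(performance pid=72585) test_size = 4096, world_size = 2, vllm time = 0.0096us,custom time = 0.0077us
(performance pid=72585) test_size = 32768, world_size = 2, vllm time = 0.0083us,custom time = 0.0080us
(performance pid=72585) test_size = 262144, world_size = 2, vllm time = 0.0123us,custom time = 0.0122us
(performance pid=83971) test_size = 512, world_size = 4, vllm time = 0.0085us,custom time = 0.0074us
(performance pid=83971) test_size = 4096, world_size = 4, vllm time = 0.0079us,custom time = 0.0087us
(performance pid=83971) test_size = 32768, world_size = 4, vllm time = 0.0091us,custom time = 0.0095us
(performance pid=83971) test_size = 131072, world_size = 4, vllm time = 0.0167us,custom time = 0.0140us
(performance pid=95444) test_size = 512, world_size = 6, vllm time = 0.0098us,custom time = 0.0080us
(performance pid=95444) test_size = 4096, world_size = 6, vllm time = 0.0089us,custom time = 0.0104us
(performance pid=95444) test_size = 32768, world_size = 6, vllm time = 0.0103us,custom time = 0.0119us
(performance pid=95444) test_size = 65536, world_size = 6, vllm time = 0.0181us,custom time = 0.0141us
(performance pid=107061) test_size = 512, world_size = 8, vllm time = 0.0094us,custom time = 0.0079us
(performance pid=107061) test_size = 4096, world_size = 8, vllm time = 0.0089us,custom time = 0.0111us
(performance pid=107061) test_size = 32768, world_size = 8, vllm time = 0.0113us,custom time = 0.0124us
"""

# Matches the message format printed by the test as it appears in the log above.
PATTERN = re.compile(
    r"test_size = (\d+), world_size = (\d+), "
    r"vllm time = ([\d.]+)us,custom time = ([\d.]+)us"
)


def slower_cases(log: str) -> list[tuple[int, int]]:
    """Return (world_size, test_size) pairs where custom is slower than vLLM."""
    slower = []
    for size, world, vllm, custom in PATTERN.findall(log):
        if float(custom) > float(vllm):
            slower.append((int(world), int(size)))
    return slower


if __name__ == "__main__":
    for world, size in slower_cases(LOG):
        print(f"world_size = {world}, test_size = {size}: custom is slower")
```

These are single-run numbers, and several of the differences are small, so repeated runs would be needed before drawing firm conclusions.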

@zhyncs zhyncs merged commit e04d3f2 into main Dec 15, 2024
3 checks passed
@zhyncs zhyncs deleted the adapt-tensorrt-llm-custom-allreduce branch December 15, 2024 05:16
@zhyncs
Member

zhyncs commented Dec 15, 2024

Hi @HaiShaw ^ If we replace the current implementation of custom all reduce in SGLang with the trt llm custom all reduce implementation from sgl-kernel, will it affect ROCm?

@merrymercy
Contributor

  1. The most important case we care about is TP=8 and bs in [1, 1024]. The size is about 0 - 32MB. Can we do a more comprehensive test?
  2. Why is the kernel slower than vllm one in some cases?
  3. Can you correct the messages in sgl-kernel/tests/test_trt_reduce.py? Print the size in bytes and the us should be ms.
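To make points 1 and 3 concrete, here is a rough sketch of how the allreduce message size in bytes could be derived from the batch size, plus a message format that reports bytes and milliseconds. The hidden size of 16384 and the 2-byte dtype are assumptions chosen only because they give about 32 MB at bs = 1024; the real values depend on the model. Both function names are hypothetical and are not taken from `sgl-kernel/tests/test_trt_reduce.py`.

```python
def allreduce_bytes(batch_size: int, hidden_size: int, dtype_bytes: int = 2) -> int:
    """Bytes moved by one allreduce over a [batch_size, hidden_size] activation."""
    return batch_size * hidden_size * dtype_bytes


def format_result(size_bytes: int, world_size: int, vllm_ms: float, custom_ms: float) -> str:
    """Report the size in bytes and the timings in milliseconds."""
    return (
        f"size = {size_bytes} bytes, world_size = {world_size}, "
        f"vllm time = {vllm_ms:.4f} ms, custom time = {custom_ms:.4f} ms"
    )


if __name__ == "__main__":
    HIDDEN_SIZE = 16384  # assumed for illustration, not from the PR
    for bs in (1, 16, 128, 1024):
        size = allreduce_bytes(bs, HIDDEN_SIZE)
        print(f"bs = {bs:4d} -> {size} bytes ({size / 2**20:.3f} MiB)")
```

Under these assumptions bs = 1024 corresponds to 33554432 bytes (32 MiB), so a more comprehensive test for TP=8 would sweep sizes up to roughly that value.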

@zhyncs
Member

zhyncs commented Dec 17, 2024

The third one has been fixed with #2487

We can add more unit tests to identify why it's slower in some cases. cc @yizhang2077

@yizhang2077
Collaborator Author

yizhang2077 commented Dec 17, 2024

  1. The most important case we care about is TP=8 and bs in [1, 1024]. The size is about 0 - 32MB. Can we do a more comprehensive test?
  2. Why is the kernel slower than vllm one in some cases?
  3. Can you correct the messages in sgl-kernel/tests/test_trt_reduce.py? Print the size in bytes and the us should be ms.

For the second point, I think the most likely reason is that multi_gpu_barrier in TensorRT-LLM is more coarse-grained than the one in vLLM, while vLLM needs to use two barriers. TensorRT-LLM also provides PUSH_MODE, which uses a fine-grained barrier (although it needs an additional copy to a shared buffer), and we can try it.
