sgl-kernel adapt tensorrt llm custom allreduce #2481
Conversation
@yizhang2077 Could we split this into two PRs? One for the sgl-kernel, which requires a new version release, and another for updating the Python package replacement and dependencies.
OK
BTW, we might also update this: https://github.com/sgl-project/sglang/pull/2483/files#diff-b2b7e1471c20bf33dea3e63ed580e07dd360668f51ccd6c3347a031072651645R21
There are some strange logs raised by Ray, and I don't know how to silence them... But it seems they won't affect the test.
The test at size 4096 is worse than vLLM; the other cases are better.
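To compare timings fairly across message sizes and world sizes, it helps to normalize elapsed time into "bus bandwidth" using the ring all-reduce formula from nccl-tests. The sketch below is a hypothetical helper (not code from this PR); the example numbers (8192 bytes, 10 µs, 8 GPUs) are made up for illustration.

```python
# Hypothetical helper for normalizing all-reduce timings across message
# sizes, using the nccl-tests convention:
#   algbw = bytes / time
#   busbw = algbw * 2 * (n - 1) / n   (ring all-reduce)
def allreduce_bus_bandwidth_gbps(nbytes: int, seconds: float, world_size: int) -> float:
    algbw = nbytes / seconds                         # bytes/s per rank
    busbw = algbw * 2 * (world_size - 1) / world_size
    return busbw / 1e9                               # GB/s

# Example with made-up numbers: an 8192-byte tensor reduced in 10 µs on 8 GPUs.
print(round(allreduce_bus_bandwidth_gbps(8192, 10e-6, 8), 3))  # → 1.434
```

Plotting busbw (rather than raw latency) over a size sweep usually makes it obvious whether a regression like the 4096 case is a fixed-overhead issue or a bandwidth issue.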
Hi @HaiShaw ^ If we replace SGLang's current custom all-reduce implementation with the TRT-LLM custom all-reduce implementation from sgl-kernel, will it affect ROCm?
The third one has been fixed with #2487. We can add more unit tests to identify why it's slower in some cases. cc @yizhang2077
For the second, I think the most likely reason is that multi_gpu_barrier in TRT-LLM is more coarse-grained than vLLM's, while vLLM needs to use two barriers. TRT-LLM also provides PUSH_MODE, which uses a fine-grained barrier (though it needs an additional copy to the shared buffer), and we can try it.
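To make the two-barrier pattern concrete, here is an illustrative CPU-side simulation using Python threads, not the actual CUDA kernels; in the real kernels the "barriers" are GPU-side flag exchanges, and the function names here are hypothetical. The first barrier ensures every rank has published its input before any peer reads it; the second ensures no rank reuses its buffer while a peer may still be reading.

```python
# Illustrative sketch only: simulating the vLLM-style two-barrier
# all-reduce with Python threads and threading.Barrier.
import threading

WORLD_SIZE = 4

def two_barrier_allreduce(rank, inputs, outputs, barrier1, barrier2):
    # Barrier 1: wait until every rank has published its input buffer.
    barrier1.wait()
    # Each rank reads all peers' inputs and reduces them locally.
    outputs[rank] = sum(inputs)
    # Barrier 2: wait until all peers are done reading before any rank
    # is allowed to overwrite its input buffer again.
    barrier2.wait()

def run():
    inputs = [float(r + 1) for r in range(WORLD_SIZE)]  # rank r holds r+1
    outputs = [0.0] * WORLD_SIZE
    b1 = threading.Barrier(WORLD_SIZE)
    b2 = threading.Barrier(WORLD_SIZE)
    threads = [
        threading.Thread(target=two_barrier_allreduce,
                         args=(r, inputs, outputs, b1, b2))
        for r in range(WORLD_SIZE)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outputs

print(run())  # → [10.0, 10.0, 10.0, 10.0]
```

A single coarser barrier removes one synchronization point but forces ranks to hold their buffers longer; a push-mode variant trades that for an extra copy into a shared staging buffer, which is the tradeoff discussed above.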
Motivation
Modifications
Checklist
Next