
[V1][Frontend] Improve Shutdown And Logs #11737

Open
wants to merge 50 commits into base: main

Conversation

robertgshaw2-redhat (Collaborator) commented Jan 4, 2025

UPDATE: Putting this back into WIP state. I found a couple of issues with TP as I built out the test cases; they require more work.

SUMMARY:

  • Prior to this PR, if we encountered an error in a background process, we would kill the whole process tree immediately, which meant we could not clean up resources or return good status codes to clients. This PR overhauls the error handling to instead shut down the background processes gracefully and raise errors that let us return proper HTTP status codes to users (a minimal sketch of the idea follows below).

In this work, I uncovered an issue with how we were handling exceptions during streaming. This is fixed in a separate PR (#11752).
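
To make the intent concrete, here is a minimal sketch (not this PR's actual code) of how a frontend can surface a dead engine as a proper HTTP status code instead of killing the process tree. The EngineDeadError class, the handler, and the endpoint body are assumptions for illustration only:

```python
# Minimal sketch (assumed names, not the PR's implementation) of returning a
# well-formed HTTP error when the background engine process has died.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()


class EngineDeadError(RuntimeError):
    """Hypothetical error raised when the background engine process has died."""


@app.exception_handler(EngineDeadError)
async def engine_dead_handler(request: Request, exc: EngineDeadError) -> JSONResponse:
    # Surface the failure to the client as a proper 500 response,
    # then let the server shut down gracefully.
    return JSONResponse(
        status_code=500,
        content={"error": "Engine background process died; shutting down."},
    )


@app.post("/v1/completions")
async def completions(request: Request) -> JSONResponse:
    # The real endpoint would call AsyncLLM.generate(); here we only show
    # that an engine failure surfaces as a handled exception.
    raise EngineDeadError("engine core process exited unexpectedly")
```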

TODO:

  • Add automated testing (a rough test sketch follows below)
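
Rough idea of what such an automated test could look like. The VLLM_INJECT_WORKER_FAULT environment variable is purely hypothetical (it stands in for whatever fault-injection hook the test suite ends up using), and the model and port are arbitrary; the point is the shape of the assertions:

```python
# Hypothetical end-to-end shutdown test: start the server, trigger a worker
# fault, and expect a well-formed 5xx response plus a clean server exit
# instead of a SIGKILLed process tree.
import os
import subprocess
import time

import requests


def test_graceful_shutdown_on_worker_fault():
    # Hypothetical knob, not an existing vLLM flag.
    env = dict(os.environ, VLLM_INJECT_WORKER_FAULT="200")
    proc = subprocess.Popen(
        ["vllm", "serve", "facebook/opt-125m", "--port", "8000"], env=env)
    try:
        # Crude startup wait; a real test would poll /health with a deadline.
        for _ in range(120):
            try:
                requests.get("http://localhost:8000/health", timeout=1)
                break
            except requests.ConnectionError:
                time.sleep(1)
        resp = requests.post(
            "http://localhost:8000/v1/completions",
            json={"model": "facebook/opt-125m",
                  "prompt": "Hello " * 512,
                  "max_tokens": 1024},
            timeout=300,
        )
        # Expect a proper error status rather than a dropped connection.
        assert resp.status_code >= 500
        # The server should then shut itself down instead of hanging.
        proc.wait(timeout=60)
    finally:
        if proc.poll() is None:
            proc.kill()
```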


github-actions bot commented Jan 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run the other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

mergify bot added the frontend label Jan 4, 2025

mergify bot commented Jan 4, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @robertgshaw2-neuralmagic.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Jan 4, 2025
mergify bot removed the needs-rebase label Jan 4, 2025
robertgshaw2-redhat marked this pull request as ready for review January 4, 2025 16:29
robertgshaw2-redhat changed the title from [Frontend] Improve API Server Error Messages to [Frontend] Improve API Server Error Logs on Jan 4, 2025
robertgshaw2-redhat changed the title from [Frontend] Improve API Server Error Logs to [V1][Frontend] Improve Error Handling Shutdown And Logs on Jan 4, 2025
robertgshaw2-redhat (Collaborator, Author) commented:

Here is what the server logs look like for:

  • TP=2, 1000 concurrent streaming requests
  • A simulated illegal memory access on RANK 1 after 200 engine steps (a sketch of the fault injection follows after the logs)
...
INFO:     127.0.0.1:45354 - "POST /v1/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:45360 - "POST /v1/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:45368 - "POST /v1/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:45372 - "POST /v1/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:45388 - "POST /v1/completions HTTP/1.1" 200 OK
INFO:     127.0.0.1:45394 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 01-04 17:21:02 core.py:247] RUNNING: 306 | WAITING: 628
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401] WorkerProc hit an exception: %s
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401] Traceback (most recent call last):
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/vllm/v1/executor/multiproc_executor.py", line 397, in worker_busy_loop
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     output = getattr(self.worker, method)(*args, **kwargs)
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     return func(*args, **kwargs)
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]            ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/vllm/v1/worker/gpu_worker.py", line 204, in execute_model
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     output = self.model_runner.execute_model(scheduler_output)
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     return func(*args, **kwargs)
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]            ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/vllm/v1/worker/gpu_model_runner.py", line 615, in execute_model
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     hidden_states = self.model(
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]                     ^^^^^^^^^^^
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     return self._call_impl(*args, **kwargs)
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     return forward_call(*args, **kwargs)
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]   File "/home/rshaw/vllm/vllm/model_executor/models/llama.py", line 571, in forward
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401]     raise RuntimeError("ERROR IN LLAMA!")
(VllmWorker rank=0 pid=1068781) ERROR 01-04 17:21:04 multiproc_executor.py:401] RuntimeError: ERROR IN LLAMA!
ERROR 01-04 17:21:04 core.py:200] EngineCore hit an exception: Traceback (most recent call last):
ERROR 01-04 17:21:04 core.py:200]   File "/home/rshaw/vllm/vllm/v1/engine/core.py", line 193, in run_engine_core
ERROR 01-04 17:21:04 core.py:200]     engine_core.run_busy_loop()
ERROR 01-04 17:21:04 core.py:200]   File "/home/rshaw/vllm/vllm/v1/engine/core.py", line 231, in run_busy_loop
ERROR 01-04 17:21:04 core.py:200]     outputs = self.step()
ERROR 01-04 17:21:04 core.py:200]               ^^^^^^^^^^^
ERROR 01-04 17:21:04 core.py:200]   File "/home/rshaw/vllm/vllm/v1/engine/core.py", line 124, in step
ERROR 01-04 17:21:04 core.py:200]     output = self.model_executor.execute_model(scheduler_output)
ERROR 01-04 17:21:04 core.py:200]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-04 17:21:04 core.py:200]   File "/home/rshaw/vllm/vllm/v1/executor/multiproc_executor.py", line 167, in execute_model
ERROR 01-04 17:21:04 core.py:200]     model_output = self.collective_rpc("execute_model",
ERROR 01-04 17:21:04 core.py:200]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 01-04 17:21:04 core.py:200]   File "/home/rshaw/vllm/vllm/v1/executor/multiproc_executor.py", line 161, in collective_rpc
ERROR 01-04 17:21:04 core.py:200]     raise e
ERROR 01-04 17:21:04 core.py:200]   File "/home/rshaw/vllm/vllm/v1/executor/multiproc_executor.py", line 150, in collective_rpc
ERROR 01-04 17:21:04 core.py:200]     raise result
ERROR 01-04 17:21:04 core.py:200] RuntimeError: ERROR IN LLAMA!
ERROR 01-04 17:21:04 core.py:200] 
CRITICAL 01-04 17:21:04 async_llm.py:65] AsyncLLM got fatal signal from worker process, shutting down. See stack trace for root cause.
CRITICAL 01-04 17:21:05 launcher.py:91] Engine failed, terminating server.
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1067793]
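
For context on the RuntimeError("ERROR IN LLAMA!") in the trace above: the failure was injected by hand into the model's forward pass. A self-contained sketch of that kind of fault injection is below; the fail_after decorator and the step threshold are illustrative, since only the final RuntimeError is visible in the log:

```python
# Sketch of the fault-injection idea: wrap a callable so it succeeds for a
# fixed number of invocations and then raises, simulating a worker failure
# partway through serving.
from functools import wraps


def fail_after(n_calls: int, message: str = "ERROR IN LLAMA!"):
    """Return a decorator that lets a function run n_calls times, then raises."""

    def decorator(fn):
        count = 0

        @wraps(fn)
        def wrapper(*args, **kwargs):
            nonlocal count
            count += 1
            if count > n_calls:
                raise RuntimeError(message)
            return fn(*args, **kwargs)

        return wrapper

    return decorator


@fail_after(200)
def execute_model_step():
    # Stand-in for one engine step on the failing worker; in the experiment
    # above, the raise was placed inside the Llama model's forward().
    return "ok"
```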

pass


def engine_dead_error_guard(func):
Reviewer comment (Member) on def engine_dead_error_guard(func):

We should type hint this decorator; otherwise, the decorated functions will not be type checked.
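
One possible way to add the hints, sketched with ParamSpec (available in typing on Python 3.10+, or via typing_extensions on older versions). The wrapper body is a placeholder, not the PR's actual guard logic:

```python
# Type-hinting the decorator so decorated functions keep their signatures
# under a type checker.
import functools
from typing import Callable, ParamSpec, TypeVar

P = ParamSpec("P")
R = TypeVar("R")


def engine_dead_error_guard(func: Callable[P, R]) -> Callable[P, R]:
    @functools.wraps(func)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        # The real implementation would translate a dead-engine condition
        # into a caller-facing error here before delegating to func.
        return func(*args, **kwargs)

    return wrapper
```

If the decorated functions are coroutines, the return annotation would instead be something like Callable[P, Awaitable[R]], but the ParamSpec pattern is the same.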

@@ -0,0 +1,43 @@
# Raised when a AsyncLLM.generate() fails. Possibly recoverable.
Reviewer comment (Member):

Can we make these docstrings instead of code comments?
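
For example, the comment could become a class docstring, which tools like help() and documentation generators pick up. The EngineGenerateError name is assumed; only the comment text is visible in the hunk:

```python
class EngineGenerateError(Exception):
    """Raised when AsyncLLM.generate() fails. Possibly recoverable."""
```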

robertgshaw2-redhat changed the title from [V1][Frontend] Improve Error Handling Shutdown And Logs to [V1][Frontend] Improve Shutdown And Logs on Jan 5, 2025

mergify bot commented Jan 7, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @robertgshaw2-neuralmagic.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Jan 7, 2025
Labels: frontend, needs-rebase, ready
4 participants