
[core] improve cpu offloading implementation #10609

Draft — youkaichao wants to merge 19 commits into base: main
Conversation

youkaichao (Member):

make it friendly with torch.compile


👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to quickly catch errors. You can run the other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@youkaichao (Member, Author):

It seems the original CUDA tensor is still held alive somewhere; the weights are not actually offloaded.
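A quick way to sanity-check whether the CUDA copies are really freed (a generic check, not code from this PR) is to drop the Python references, run a GC pass, and look at the caching allocator statistics:

import gc
import torch

gc.collect()
torch.cuda.empty_cache()
# If this number stays near the full weight size, something (a closure, a cached
# module attribute, the original parameter, ...) still holds the CUDA tensor alive.
print(torch.cuda.memory_allocated() / 2**20, "MiB still allocated")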


return module
torch.empty = fake_empty
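For context, the two lines above come from the PR's monkey-patching approach, where torch.empty is temporarily replaced while the module is constructed. A rough, hypothetical sketch of that shape (fake_empty is the name from the diff; the body below is illustrative, not the PR's actual logic):

import torch

_original_empty = torch.empty

def fake_empty(*args, **kwargs):
    # Illustrative only: redirect requested GPU allocations to host memory so the
    # weights live on the CPU until they are explicitly moved back.
    dev = kwargs.get("device")
    if dev is not None and torch.device(dev).type != "cpu":
        kwargs["device"] = torch.device("cpu")
    return _original_empty(*args, **kwargs)

torch.empty = fake_empty
try:
    module = torch.nn.Linear(8, 8)   # parameters are allocated through fake_empty
finally:
    torch.empty = _original_empty    # always restore the original factory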
Reviewer:

One thing to call out is that the monkey patching here will allow you to override torch.empty, but not any at::empty() calls that come from C++ anywhere in the dispatcher. I'm not sure if the particular code you're running is actually running into this, but the way we normally handle "factory functions that you want to override to return tensor subclasses" is with a TorchDispatchMode:

import torch
from torch.utils._python_dispatch import TorchDispatchMode

class OffloadedTensorMode(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        rs = func(*args, **kwargs)
        if func is torch.ops.aten.empty.memory_format and rs.device != "cpu" and ...:
            rs = OffloadedTensor(rs)
        return rs
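A hypothetical usage sketch of such a mode (a simplified stand-in class; the real OffloadedTensorMode would wrap the result instead of just logging it). Entering the mode means every aten-level call issued while it is active is intercepted, including factory calls that originate in C++, which a torch.empty monkey patch cannot see:

import torch
from torch.utils._python_dispatch import TorchDispatchMode

class LogEmptyMode(TorchDispatchMode):
    def __torch_dispatch__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        out = func(*args, **kwargs)
        # Report any non-CPU allocation coming through aten::empty.memory_format.
        if func is torch.ops.aten.empty.memory_format and out.device.type != "cpu":
            print(f"intercepted {func} -> shape {tuple(out.shape)} on {out.device}")
        return out

if torch.cuda.is_available():
    with LogEmptyMode():
        w = torch.empty(4, 4, device="cuda")   # routed through __torch_dispatch__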

youkaichao (Member, Author):

this is very helpful!

if requires_grad is None:
    return super().__new__(cls, elem)
else:
    return cls._make_subclass(cls, elem, requires_grad)
Reviewer:

I think if your tensor subclass internally holds another tensor (elem here), you probably want to use the "wrapper" subclass API. Example here:

out_tensor = torch.Tensor._make_wrapper_subclass(cls, shape, **kwargs)
out_tensor.elem = weak_ref_tensor(elem)

Side note: I would probably call that constructor unconditionally; any reason you aren't doing it when requires_grad is None?
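A minimal self-contained sketch of the wrapper pattern being suggested (my reading of it, not the PR's final code; re-wrapping of outputs and the weak_ref_tensor helper are omitted for brevity):

import torch
from torch.utils._pytree import tree_map

class OffloadedTensorSketch(torch.Tensor):
    @staticmethod
    def __new__(cls, elem, requires_grad=False):
        # The outer wrapper carries only metadata (shape/dtype/device); the real
        # storage lives in the inner tensor held as .elem.
        return torch.Tensor._make_wrapper_subclass(
            cls, elem.size(), dtype=elem.dtype, device=elem.device,
            requires_grad=requires_grad)

    def __init__(self, elem, requires_grad=False):
        self.elem = elem

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        # Unwrap inner tensors before redispatching to the real aten op.
        unwrap = lambda t: t.elem if isinstance(t, cls) else t
        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs))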

Comment on lines +256 to +257
class OffloadedTensor(torch.Tensor):


Reviewer:

We do generally have support for subclasses that implement both __torch_function__ and __torch_dispatch__, although if you only need __torch_dispatch__ then I agree that you probably want to disable __torch_function__ as linked above.

Let me know if you have any other questions / would like to chat more about the subclass work you're doing!
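For reference, the usual way (as I understand it) to opt a subclass out of the __torch_function__ layer so that only __torch_dispatch__ fires:

import torch

class DispatchOnlyTensor(torch.Tensor):
    # Disable the Python-level __torch_function__ hook; ops on this subclass then
    # only hit __torch_dispatch__ at the aten level.
    __torch_function__ = torch._C._disabled_torch_function_impl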

tensor = func(*args, **kwargs)

if (func is torch.ops.aten.empty.memory_format
        and tensor.device != "cpu"):
youkaichao (Member, Author):

maybe use torch.device("cpu") instead of "cpu"?

Reviewer:

Hmm, tensor.device != "cpu" should generally be ok.
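A tiny illustration of the two unambiguous alternatives being discussed (both comparisons hold for a plain CPU tensor):

import torch

t = torch.empty(2)                        # allocated on the CPU by default
print(t.device == torch.device("cpu"))    # True: device-to-device comparison is unambiguous
print(t.device.type == "cpu")             # True: comparing the .type string also works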


mergify bot commented Dec 29, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @youkaichao.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork
