[Misc] Improve BNB loader to handle mixture of sharded and merged weights with same suffix #11566

Merged (13 commits) on Dec 27, 2024
7 changes: 5 additions & 2 deletions vllm/model_executor/model_loader/loader.py
@@ -995,8 +995,11 @@ def _get_bnb_target_modules(self, model: nn.Module) -> None:
                 for sub_name in sub_modules:
                     self.target_modules.append(
                         name.replace(last_name, sub_name))
-            else:
-                self.target_modules.append(name)
+            # Add the original module name even if the module has a stacked
+            # mapping, in case the model has a mixture of disk-merged and
+            # disk-split weights with the same last name.
+            self.target_modules.append(name)

         assert self.target_modules, (
             "vllm currently does not support BNB quantization for "
             f"{type(model).__name__}")
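To illustrate why the hunk above keeps the original module name in addition to its expanded sub-module names, here is a minimal standalone sketch (not vLLM's actual loader code; the function name `expand_target_modules` and the example module names are hypothetical). A checkpoint may store a fused projection either as one merged tensor (e.g. `qkv_proj`) or as separate shards (`q_proj`, `k_proj`, `v_proj`), so the target list must match both layouts:

```python
def expand_target_modules(module_names, stacked_map):
    """Build the list of module names to target for quantization.

    For every module whose last path component appears in the stacked
    mapping, emit the expanded sub-module names AND keep the original
    merged name, so both on-disk weight layouts are matched later.
    """
    targets = []
    for name in module_names:
        last_name = name.split(".")[-1]
        if last_name in stacked_map:
            for sub_name in stacked_map[last_name]:
                targets.append(name.replace(last_name, sub_name))
        # Keep the original name even when a stacked mapping exists:
        # some checkpoints store the merged tensor under it (this is
        # the behavior the PR adds).
        targets.append(name)
    return targets


# Hypothetical stacked mapping, in the style of a packed-modules mapping.
stacked = {"qkv_proj": ["q_proj", "k_proj", "v_proj"]}
names = [
    "model.layers.0.self_attn.qkv_proj",
    "model.layers.0.mlp.down_proj",
]
for target in expand_target_modules(names, stacked):
    print(target)
```

With this change, `qkv_proj` yields four targets (`q_proj`, `k_proj`, `v_proj`, and `qkv_proj` itself), while the unmapped `down_proj` is passed through unchanged; before the PR, a module with a stacked mapping would only ever match the sharded layout.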