self.output_dim should be changed to self.output_size
class ColumnParallelLinearWithLoRA(BaseLinearLayerWithLoRA):
    """
    LoRA on top of ColumnParallelLinear layer.
    LoRA B is sliced for tensor parallelism.
    There are two types for the `base_layer`:
    1. ColumnParallelLinear, e.g.`dense_h_to_4h` in `FalconForCausalLM`.
    2. MergedColumnParallelLinear, e.g.`gate_up_proj` in `Phi3ForCausalLM`.
    """

    def __init__(self, base_layer: ColumnParallelLinear) -> None:
        super().__init__(base_layer)
        # The base_layer type is ColumnParallelLinear or
        # MergedColumnParallelLinear, their weight sharding logic is
        # inconsistent when TP is greater than 1.
        self.is_merged_col_linear = type(
            base_layer) is MergedColumnParallelLinear
        self.tp_size = get_tensor_model_parallel_world_size()
        self.output_size = self.base_layer.output_size_per_partition
        # There is only one LoRA layer
        self.n_slices = 1

    def slice_lora_b(self, lora_b: torch.Tensor) -> torch.Tensor:
        # Applicable to cases where the base_layer is
        # MergedColumnParallelLinear.
        if self.is_merged_col_linear:
            tp_rank = get_tensor_model_parallel_rank()
            shard_size = self.output_size // 2
            offset = lora_b.shape[-1] // 2

            left_weight = lora_b[:, tp_rank * shard_size:(tp_rank + 1) *
                                 shard_size]
            right_weight = lora_b[:, offset + tp_rank * shard_size:offset +
                                  (tp_rank + 1) * shard_size]
            lora_b = torch.cat([left_weight, right_weight], dim=1)
        # Applicable to cases where the base_layer is
        # ColumnParallelLinear.
        else:
            tensor_model_parallel_rank = get_tensor_model_parallel_rank()
            shard_size = self.output_dim  # self.output_dim is not defined
            start_idx = tensor_model_parallel_rank * shard_size
            end_idx = (tensor_model_parallel_rank + 1) * shard_size
            lora_b = lora_b[:, start_idx:end_idx]
        return lora_b

    def slice_bias(self, bias: torch.Tensor) -> torch.Tensor:
        # TODO: Fix the slicing logic of bias.
        if bias is None:
            return bias
        tensor_model_parallel_rank = get_tensor_model_parallel_rank()
        shard_size = self.output_dim  # self.output_dim is not defined
        start_idx = tensor_model_parallel_rank * shard_size
        end_idx = (tensor_model_parallel_rank + 1) * shard_size
        bias = bias[start_idx:end_idx]
        return bias
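To illustrate what the proposed rename would do, here is a minimal standalone sketch of the same slicing arithmetic, with the per-partition `output_size` used as the shard size in place of the undefined `self.output_dim`. It is not the vLLM implementation: the function names (`slice_columns`, `slice_merged_columns`, `slice_bias`) are hypothetical, plain Python lists stand in for `torch.Tensor`, and `tp_rank` is passed in explicitly instead of being read from `get_tensor_model_parallel_rank()`.

```python
from typing import List, Optional

Matrix = List[List[int]]  # stand-in for a 2-D torch.Tensor (rows x columns)


def slice_columns(lora_b: Matrix, tp_rank: int, output_size: int) -> Matrix:
    """ColumnParallelLinear case: keep the contiguous columns owned by tp_rank.

    `output_size` plays the role of self.output_size
    (i.e. base_layer.output_size_per_partition).
    """
    start_idx = tp_rank * output_size
    end_idx = (tp_rank + 1) * output_size
    return [row[start_idx:end_idx] for row in lora_b]


def slice_merged_columns(lora_b: Matrix, tp_rank: int,
                         output_size: int) -> Matrix:
    """MergedColumnParallelLinear case: take this rank's shard from each half
    of the columns and concatenate the two pieces."""
    shard_size = output_size // 2
    offset = len(lora_b[0]) // 2
    left = [row[tp_rank * shard_size:(tp_rank + 1) * shard_size]
            for row in lora_b]
    right = [row[offset + tp_rank * shard_size:
                 offset + (tp_rank + 1) * shard_size] for row in lora_b]
    return [l + r for l, r in zip(left, right)]


def slice_bias(bias: Optional[List[int]], tp_rank: int,
               output_size: int) -> Optional[List[int]]:
    """Bias slicing with the same start/end arithmetic as the column case."""
    if bias is None:
        return bias
    start_idx = tp_rank * output_size
    end_idx = (tp_rank + 1) * output_size
    return bias[start_idx:end_idx]


# Toy LoRA B with 2 rows and 8 output columns, split across 2 ranks,
# so each rank holds 4 output columns.
lora_b = [list(range(0, 8)), list(range(10, 18))]
rank1_plain = slice_columns(lora_b, tp_rank=1, output_size=4)
rank1_merged = slice_merged_columns(lora_b, tp_rank=1, output_size=4)
rank1_bias = slice_bias(list(range(8)), tp_rank=1, output_size=4)
```

With these toy inputs, rank 1 receives columns 4 to 7 in the plain case, and columns 2 to 3 plus 6 to 7 in the merged case. Whether the bias of a merged layer should follow the plain or the merged slicing is left open here, as the `TODO` in the original code also indicates.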
Before submitting a new issue...
Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
self.output_dim should be changed to self.output_size