vllm.lora.lora_weights ¶
LoRALayerWeights ¶
LoRA weights for a layer composed of two low-rank matrices.
Source code in vllm/lora/lora_weights.py
__init__ ¶
__init__(
    module_name: str,
    rank: int,
    lora_alpha: int,
    lora_a: Tensor,
    lora_b: Tensor,
    scaling: float | None = None,
) -> None
Source code in vllm/lora/lora_weights.py
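A minimal sketch of constructing LoRALayerWeights directly from two low-rank factors. The tensor orientation ([input_dim, rank] for lora_a, [rank, output_dim] for lora_b), the example module name, and the fallback of scaling to lora_alpha / rank when left as None are assumptions based on common LoRA conventions, not guarantees from this page.

```python
import torch

from vllm.lora.lora_weights import LoRALayerWeights

rank, input_dim, output_dim = 8, 4096, 4096
lora_a = torch.randn(input_dim, rank, dtype=torch.float16)   # assumed orientation
lora_b = torch.randn(rank, output_dim, dtype=torch.float16)  # assumed orientation

weights = LoRALayerWeights(
    module_name="model.layers.0.self_attn.o_proj",  # illustrative module name
    rank=rank,
    lora_alpha=16,
    lora_a=lora_a,
    lora_b=lora_b,
    # scaling left as None; assumed to default to lora_alpha / rank
)
```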
create_dummy_lora_weights classmethod ¶
create_dummy_lora_weights(
    module_name: str,
    input_dim: int,
    output_dim: int,
    rank: int,
    dtype: dtype,
    device: Device,
) -> LoRALayerWeights
Source code in vllm/lora/lora_weights.py
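A hedged example of allocating placeholder weights, e.g. for memory profiling or warmup. The dimensions, dtype, and device are illustrative; the exact shapes of the returned lora_a and lora_b tensors are not documented on this page.

```python
import torch

from vllm.lora.lora_weights import LoRALayerWeights

dummy = LoRALayerWeights.create_dummy_lora_weights(
    module_name="model.layers.0.mlp.down_proj",  # illustrative module name
    input_dim=11008,
    output_dim=4096,
    rank=8,
    dtype=torch.float16,
    device="cuda",  # a torch device string; assumed to satisfy the Device annotation
)
```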
from_config classmethod ¶
from_config(
    module_name: str, peft_helper: PEFTHelper
) -> LoRALayerWeights
Source code in vllm/lora/lora_weights.py
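A sketch of deriving layer weights from a PEFT adapter configuration. PEFTHelper.from_dict and the adapter_config.json-style fields shown here are assumptions about vllm.lora.peft_helper; consult that module for the actual construction API. from_config is assumed to take rank, alpha, and scaling from the helper rather than from explicit arguments.

```python
from vllm.lora.lora_weights import LoRALayerWeights
from vllm.lora.peft_helper import PEFTHelper

# Assumed construction path: a dict mirroring adapter_config.json fields.
peft_helper = PEFTHelper.from_dict({
    "r": 8,
    "lora_alpha": 16,
    "target_modules": ["q_proj", "v_proj"],
})

layer_weights = LoRALayerWeights.from_config(
    module_name="model.layers.0.self_attn.q_proj",  # illustrative
    peft_helper=peft_helper,
)
```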
PackedLoRALayerWeights ¶
Bases: LoRALayerWeights
LoRA used for packed layers (e.g. qkv_proj).
Source code in vllm/lora/lora_weights.py
__init__ ¶
__init__(
    module_name: str,
    rank: int,
    lora_alphas: list[int | None],
    lora_a: list[Tensor | None],
    lora_b: list[Tensor | None],
    scaling: list[float] | None = None,
) -> None
Source code in vllm/lora/lora_weights.py
optimize ¶
optimize() -> PackedLoRALayerWeights
Optimize the LoRA by merging the scaling into lora_b.
Source code in vllm/lora/lora_weights.py
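A conceptual sketch of what merging the scaling into lora_b means: after optimize(), applying lora_b no longer requires a separate multiply by the scaling factor. The shapes are illustrative, and this only demonstrates the algebra, not the method's internal bookkeeping.

```python
import torch

scaling = 2.0
lora_a = torch.randn(4096, 8)
lora_b = torch.randn(8, 4096)

delta_before = (lora_a @ lora_b) * scaling  # scaling applied at apply time
lora_b_merged = lora_b * scaling            # fold the constant into lora_b once
delta_after = lora_a @ lora_b_merged        # same update, one less multiply

assert torch.allclose(delta_before, delta_after, rtol=1e-4, atol=1e-4)
```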
pack classmethod ¶
pack(
    loras: Sequence[Optional[LoRALayerWeights]],
) -> PackedLoRALayerWeights
Pack a list of LoRAs into a single LoRA.
If an entry in the sequence is None, it signifies that the corresponding submodule does not have a LoRA.
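A hedged sketch of packing per-projection adapters into a single PackedLoRALayerWeights for a fused qkv_proj layer, with None marking a projection that has no adapter. The helper below and all shapes and module names are illustrative assumptions.

```python
import torch

from vllm.lora.lora_weights import LoRALayerWeights, PackedLoRALayerWeights


def make_lora(name: str, out_dim: int) -> LoRALayerWeights:
    # Hypothetical helper; tensor shapes are assumptions.
    return LoRALayerWeights(
        module_name=name,
        rank=8,
        lora_alpha=16,
        lora_a=torch.randn(4096, 8, dtype=torch.float16),
        lora_b=torch.randn(8, out_dim, dtype=torch.float16),
    )


q_lora = make_lora("model.layers.0.self_attn.q_proj", 4096)
v_lora = make_lora("model.layers.0.self_attn.v_proj", 1024)

# k_proj has no adapter, so its slot is None.
packed = PackedLoRALayerWeights.pack([q_lora, None, v_lora])
```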