05. Target Modules
LoRA can be applied to various weight matrices throughout a transformer architecture, but not all locations contribute equally to model behavior. The choice of target modules affects both the resulting model's capabilities and the parameter efficiency of the adaptation.
For decoder-only transformers such as LLaMA, Mistral, and Phi, the attention projections (query, key, value, and sometimes output, abbreviated Q, K, V, and O) are the primary targets. These projections control how the model attends to and mixes token representations. Targeting only Q and V, and leaving K frozen, is a common default that balances parameter efficiency against behavioral impact.
The feed-forward network (FFN) layers, which comprise the majority of parameters in most transformer models, also contribute to fine-tuning effectiveness when included. Adding LoRA to the gate, up, and down projections of FFN layers increases the trainable parameter count substantially but can improve performance on tasks requiring substantial knowledge adaptation.
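To make the size difference concrete, the sketch below counts LoRA parameters for attention-only versus attention-plus-FFN targeting. It assumes each adapted weight of shape d_out x d_in gains two low-rank factors with r * (d_in + d_out) parameters in total. The dimensions (hidden size 4096, intermediate size 11008, 32 layers, square Q and V projections) are illustrative, and the function names are hypothetical.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def count_lora_params(hidden: int, intermediate: int, layers: int,
                      attn_rank: int, ffn_rank: int = 0) -> dict:
    """Count LoRA parameters for Q/V targeting plus optional FFN targeting."""
    # Q and V are assumed square (hidden x hidden). This does not hold for
    # models that use grouped-query attention, where V is narrower.
    attn = 2 * lora_params(hidden, hidden, attn_rank)
    # gate and up map hidden -> intermediate; down maps intermediate -> hidden.
    ffn = 3 * lora_params(hidden, intermediate, ffn_rank) if ffn_rank else 0
    return {
        "attention": attn * layers,
        "ffn": ffn * layers,
        "total": (attn + ffn) * layers,
    }


if __name__ == "__main__":
    # Illustrative dimensions, not read from any real checkpoint.
    print(count_lora_params(4096, 11008, 32, attn_rank=8))
    print(count_lora_params(4096, 11008, 32, attn_rank=8, ffn_rank=8))
```

Under these assumptions, rank-8 adapters on Q and V add about 4.2 million parameters, and rank-8 adapters on the three FFN projections add about 11.6 million more.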
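A common configuration targets the attention Q and V projections with rank-8 LoRA and optionally adds rank-4 LoRA to the FFN projections. This extended targeting can give good results with a manageable parameter count. It should not be confused with QLoRA, which refers to training LoRA adapters on top of a quantized base model and does not by itself determine which modules are targeted.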
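One way to express different ranks per module group in PEFT is through the rank_pattern and alpha_pattern arguments of LoraConfig, which override the default rank and alpha for modules whose names match a key. The sketch below only builds the keyword arguments as a plain dictionary, so it runs without PEFT installed. The helper name and the choice of alpha as twice the rank are assumptions, and the module names assume a LLaMA-style model.

```python
def mixed_rank_lora_kwargs(attn_rank: int = 8, ffn_rank: int = 4) -> dict:
    """Build LoraConfig keyword arguments with different ranks per module group."""
    attn_targets = ["q_proj", "v_proj"]
    ffn_targets = ["gate_proj", "up_proj", "down_proj"]
    return {
        "target_modules": attn_targets + ffn_targets,
        "r": attn_rank,               # default rank, used for Q and V
        "lora_alpha": 2 * attn_rank,  # assumption: alpha = 2 * rank
        # Per-module overrides: FFN projections get a smaller rank and alpha.
        "rank_pattern": {name: ffn_rank for name in ffn_targets},
        "alpha_pattern": {name: 2 * ffn_rank for name in ffn_targets},
        "lora_dropout": 0.05,
        "bias": "none",
        "task_type": "CAUSAL_LM",
    }


if __name__ == "__main__":
    kwargs = mixed_rank_lora_kwargs()
    print(kwargs["rank_pattern"])
    # With PEFT installed, the dictionary could be passed on as:
    #   from peft import LoraConfig
    #   config = LoraConfig(**kwargs)
```

Keeping the arguments in a dictionary makes the rank choices easy to log or sweep before a LoraConfig object is constructed.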
Different target configurations suit different task types. Classification tasks on pre-existing categories often work well with attention-only targeting. Tasks requiring substantial knowledge injection or stylistic transformation may benefit from FFN targeting as well.
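Module names vary by model architecture. Hugging Face's PEFT library includes default target modules for many known architectures, which it applies when target_modules is not set, and most open-source models document their recommended target module names. Because the names differ between model families (LLaMA-style models expose separate q_proj and v_proj layers, while GPT-2 uses a fused c_attn projection), verify the actual module names in your specific model before writing the configuration.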
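A quick check of this kind can be sketched as a function that reports which requested targets match no module at all. The helper name is hypothetical, the sample module names only imitate a LLaMA-style layout, and matching on the final path component is a simplification of how PEFT resolves target names.

```python
from typing import Iterable


def check_targets(module_names: Iterable[str], targets: Iterable[str]) -> dict:
    """Map each target name to the module names whose last component equals it."""
    names = list(module_names)
    matches = {}
    for target in targets:
        # Compare against the final path component, e.g. "...self_attn.q_proj".
        matches[target] = [n for n in names if n.split(".")[-1] == target]
    return matches


if __name__ == "__main__":
    # Hypothetical module names in the style of a LLaMA-like model.
    names = [
        "model.layers.0.self_attn.q_proj",
        "model.layers.0.self_attn.v_proj",
        "model.layers.0.mlp.gate_proj",
    ]
    result = check_targets(names, ["q_proj", "v_proj", "c_attn"])
    missing = [target for target, found in result.items() if not found]
    print("Targets with no matching module:", missing)
```

With a loaded model, the names can be supplied as (name for name, _ in model.named_modules()). Any target that matches nothing is worth catching before training starts.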
The script below inspects a Hugging Face model and lists its attention Q/V projections and its FFN gate/up/down projections. Use the resulting names to configure the LoRA targets.
# inspect_targets.py
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM


def list_trainable_modules(model_name: str):
    """List modules suitable for LoRA targeting in a model."""
    config = AutoConfig.from_pretrained(model_name)
    print(f"Model: {model_name}")
    print(f"Hidden size: {config.hidden_size}")
    print(f"Intermediate size: {config.intermediate_size}")
    print(f"Num layers: {config.num_hidden_layers}")
    print(f"Num attention heads: {config.num_attention_heads}")

    # Build the model on the meta device so the module tree can be
    # inspected without downloading or allocating any weights.
    with init_empty_weights():
        model = AutoModelForCausalLM.from_config(config)

    attention_modules = []
    ffn_modules = []
    for name, _ in model.named_modules():
        # Match on the last path component so only the projection layers
        # themselves are collected. These names assume a LLaMA-style model.
        leaf = name.split(".")[-1]
        if leaf in ("q_proj", "v_proj"):
            attention_modules.append(name)
        elif leaf in ("gate_proj", "up_proj", "down_proj"):
            ffn_modules.append(name)

    print(f"\nAttention Q/V modules found: {len(attention_modules)}")
    print(f"FFN gate/up/down modules found: {len(ffn_modules)}")
    return {
        "attention": attention_modules,
        "ffn": ffn_modules,
    }
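
# Example PEFT config for attention-only targeting.
# The keys match peft.LoraConfig arguments, so the result can be used as
# LoraConfig(**create_attention_lora_config()).
def create_attention_lora_config(rank: int = 8, alpha: int = 16):
    return {
        "target_modules": ["q_proj", "v_proj"],
        "r": rank,  # LoraConfig names the rank parameter "r"
        "lora_alpha": alpha,
        "lora_dropout": 0.05,
        "bias": "none",
        "task_type": "CAUSAL_LM",
    }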