11. Hugging Face Trainer
The Hugging Face Trainer class (from the transformers library) provides a complete training loop for fine-tuning Transformers models, typically on datasets loaded with the datasets library. It handles gradient accumulation, checkpoint saving, logging, evaluation, mixed precision, and device placement with minimal configuration.
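Gradient accumulation, one of the jobs Trainer takes over, can be illustrated in plain PyTorch. The sketch below is not Trainer's internal code. It only shows the underlying idea: summing the gradients of several micro-batches, each loss scaled by 1/accumulation_steps, reproduces the gradient of one larger batch when the loss is a mean. With the defaults used in setup_lora_training (batch_size=4, gradient_accumulation_steps=4), each optimizer step therefore sees 4 × 4 = 16 examples per device.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(8, 1)
x = torch.randn(16, 8)
y = torch.randn(16, 1)
loss_fn = nn.MSELoss()  # mean reduction over the batch

# Reference: one backward pass over the full batch of 16
model.zero_grad()
loss_fn(model(x), y).backward()
full_grad = model.weight.grad.clone()

# Accumulation: 4 micro-batches of 4; .grad sums across backward calls,
# so each micro-batch loss is divided by the number of accumulation steps
model.zero_grad()
accum_steps = 4
for xb, yb in zip(x.chunk(accum_steps), y.chunk(accum_steps)):
    (loss_fn(model(xb), yb) / accum_steps).backward()
accum_grad = model.weight.grad.clone()

print(torch.allclose(full_grad, accum_grad, atol=1e-6))
```

Only after the last micro-batch would an optimizer step run, so peak activation memory is set by the micro-batch size rather than the effective batch size.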
For LoRA fine-tuning, a Trainer is built from three core components: a model with LoRA adapters applied, a TrainingArguments configuration, and a training dataset tokenized for the model. Optional components include a data collator, an evaluation dataset, a compute_metrics function, and callbacks for custom behavior.
The model passed to Trainer should already have LoRA adapters applied. The PEFT library's get_peft_model wraps a base model according to a LoraConfig and returns a model that Trainer accepts directly. get_peft_model also freezes the base model's weights (requires_grad=False), so the optimizer Trainer builds updates only the LoRA parameters while all other parameters stay fixed.
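To make the freezing concrete, here is a hand-rolled LoRA layer in plain PyTorch. This is a hypothetical illustration of the technique, not PEFT's implementation: the class name LoRALinear and the 0.01 initialization scale are assumptions (PEFT initializes A differently, but it also starts B at zero). The frozen base weight receives no gradient, while the low-rank factors A and B do, and the update is scaled by alpha / r.

```python
import torch
from torch import nn


class LoRALinear(nn.Module):
    """Hypothetical minimal LoRA wrapper for illustration; not PEFT's code."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        # Freeze the pretrained weight and bias, as get_peft_model does for the base model
        for p in self.base.parameters():
            p.requires_grad_(False)
        # Low-rank factors: A projects down to rank r, B projects back up.
        # B starts at zero, so the wrapped layer initially matches the base layer.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling


torch.manual_seed(0)
base = nn.Linear(64, 32)
layer = LoRALinear(base, r=4, alpha=8)

trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable={trainable} total={total}")

x = torch.randn(2, 64)
same_at_init = torch.allclose(layer(x), base(x))  # True because B == 0

# Backward reaches only the LoRA factors; the frozen base weight gets no .grad
layer(x).sum().backward()
print(same_at_init, base.weight.grad is None, layer.lora_B.grad is not None)
```

Here r=4 adds 4 × 64 + 32 × 4 = 384 trainable parameters next to 2,080 frozen ones, which is the same accounting that print_trainable_parameters reports for a full model.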
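Dataset preparation requires tokenizing the input text and computing labels. For instruction tuning, the labels for non-response tokens are typically masked (set to -100, the index PyTorch's cross-entropy loss ignores), so the loss is calculated only on the target response. This concentrates the training signal on the behavior being taught instead of also training the model to reproduce prompts, which reduces unintended changes to its behavior.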
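A minimal sketch of response-only label masking, assuming the prompt and response have already been tokenized separately into lists of token IDs. The helper name build_masked_example and the token IDs are hypothetical. Labels line up position-for-position with input_ids because Transformers causal LM models shift the labels internally when computing the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_masked_example(prompt_ids, response_ids, max_length, pad_id):
    """Hypothetical helper: concatenate prompt + response token IDs and mask
    the prompt (and padding) in the labels so only response tokens add loss."""
    input_ids = (list(prompt_ids) + list(response_ids))[:max_length]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + list(response_ids))[:max_length]
    attention_mask = [1] * len(input_ids)

    # Right-pad to max_length; padded positions are ignored by attention and loss
    pad_len = max_length - len(input_ids)
    input_ids += [pad_id] * pad_len
    attention_mask += [0] * pad_len
    labels += [IGNORE_INDEX] * pad_len
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


example = build_masked_example([101, 102, 103], [201, 202], max_length=6, pad_id=0)
truncated = build_masked_example([101, 102, 103], [201, 202], max_length=4, pad_id=0)
print(example["labels"])
print(truncated["labels"])
```

Because the labels are precomputed, such a dataset should be paired with a collator that keeps them, such as transformers.default_data_collator; DataCollatorForLanguageModeling with mlm=False rebuilds labels from input_ids and would discard the mask.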
Gradient checkpointing reduces memory consumption at the cost of additional compute. When it is enabled, most intermediate activations are discarded during the forward pass and recomputed during the backward pass instead of being kept in memory. This trades speed (roughly one extra forward pass per training step) for memory, enabling larger batch sizes or longer sequences.
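In Trainer, gradient checkpointing is switched on with gradient_checkpointing=True in TrainingArguments; PEFT setups with a frozen base model often also call model.enable_input_require_grads() so that gradients can flow through checkpointed segments. The underlying mechanism can be seen with torch.utils.checkpoint on a toy block (the block and its sizes are arbitrary assumptions): the checkpointed run produces the same gradients as the normal run while holding fewer activations in memory.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)
block = nn.Sequential(nn.Linear(16, 64), nn.GELU(), nn.Linear(64, 16))
x = torch.randn(4, 16, requires_grad=True)

# Standard forward: intermediate activations inside `block` stay in memory until backward
block(x).sum().backward()
grad_plain = x.grad.clone()
weight_grad_plain = block[0].weight.grad.clone()

# Reset gradients before the second run
x.grad = None
block.zero_grad()

# Checkpointed forward: only the input to `block` is saved; its internal
# activations are recomputed when backward reaches this segment
checkpoint(block, x, use_reentrant=False).sum().backward()
grad_ckpt = x.grad.clone()
weight_grad_ckpt = block[0].weight.grad.clone()

print(torch.allclose(grad_plain, grad_ckpt), torch.allclose(weight_grad_plain, weight_grad_ckpt))
```

For a transformer, each decoder layer plays the role of `block`, so only layer inputs are stored and each layer's forward runs twice.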
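Mixed precision training (fp16 or bf16) runs most of the forward and backward computation in 16-bit floats, reducing activation memory and speeding up matrix multiplications on supported GPUs while remaining numerically stable for most fine-tuning tasks. bf16 keeps fp32's 8-bit exponent and therefore its dynamic range, so it does not need the loss scaling that fp16 training relies on to avoid overflow and underflow; the trade-off is lower precision (7 mantissa bits versus fp16's 10). This makes bf16 preferable for training stability on hardware that supports it. The Trainer handles device placement and precision conversion automatically.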
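The range/precision trade-off can be checked directly with torch.finfo. The sketch also includes pick_precision, a hypothetical helper (not part of Transformers) that chooses the bf16/fp16 flags for TrainingArguments based on what the current GPU supports.

```python
import torch

fp16 = torch.finfo(torch.float16)
bf16 = torch.finfo(torch.bfloat16)
print(f"fp16: max={fp16.max:.4g} eps={fp16.eps}")
print(f"bf16: max={bf16.max:.4g} eps={bf16.eps}")

# A value above fp16's maximum (65504) overflows to inf in fp16,
# but stays finite (rounded to the nearest bf16 value) in bf16
big = torch.tensor(70000.0)
big_fp16 = big.to(torch.float16)
big_bf16 = big.to(torch.bfloat16)
print(big_fp16.item(), big_bf16.item())


def pick_precision():
    """Hypothetical helper: precision flags to pass to TrainingArguments."""
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return {"bf16": True, "fp16": False}
    if torch.cuda.is_available():
        return {"bf16": False, "fp16": True}  # fp16 uses loss scaling
    return {"bf16": False, "fp16": False}  # CPU: stay in fp32


print(pick_precision())
```

The result could be unpacked into the configuration, for example TrainingArguments(output_dir="out", **pick_precision()), instead of hard-coding bf16=True.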
Configure and initialize a complete training setup using Hugging Face Trainer with PEFT LoRA. Set up all required components and verify the model has the correct trainable parameter count.
# trainer_setup.py
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
)
from peft import LoraConfig, get_peft_model, TaskType
from datasets import load_dataset
import torch


def setup_lora_training(
    model_name: str,
    output_dir: str = "./lora_output",
    rank: int = 8,
    learning_rate: float = 3e-4,
    num_epochs: int = 3,
    batch_size: int = 4,
    gradient_accumulation_steps: int = 4,
):
    """Set up complete LoRA fine-tuning with Hugging Face Trainer."""
    # Load tokenizer. Many causal LMs ship without a pad token, so reuse EOS.
    # Caveat: DataCollatorForLanguageModeling sets labels to -100 wherever
    # input_ids == pad_token_id, so with pad == EOS real EOS tokens are masked too.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    # Load the base model in bf16 (no quantization; QLoRA would add a 4-bit config)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Configure LoRA
    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=rank,
        lora_alpha=2 * rank,  # LoRA update is scaled by lora_alpha / r (= 2 here)
        lora_dropout=0.05,
        # Attention projections in LLaMA-style models; module names differ
        # by architecture (e.g. GPT-2 uses "c_attn")
        target_modules=["q_proj", "v_proj"],
        bias="none",
    )

    # Apply LoRA: base weights are frozen, only the adapter matrices train
    model = get_peft_model(model, lora_config)

    # Print trainable vs total parameters
    model.print_trainable_parameters()

    # Load and tokenize dataset
    dataset = load_dataset("yahma/alpaca-cleaned", split="train")
    dataset = dataset.select(range(min(1000, len(dataset))))  # Subset for demo

    def tokenize_function(examples):
        # Format each record with an Alpaca-style instruction template
        formatted = []
        for instruction, input_text, output in zip(
            examples["instruction"],
            examples["input"],
            examples["output"],
        ):
            text = f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n{output}"
            formatted.append(text)
        # No padding here: the data collator pads each batch dynamically
        return tokenizer(
            formatted,
            truncation=True,
            max_length=512,
        )

    tokenized_dataset = dataset.map(
        tokenize_function,
        batched=True,
        remove_columns=dataset.column_names,
    )

    # Data collator: pads each batch and copies input_ids into labels
    # (pad positions set to -100). Loss covers prompt and response tokens
    # alike; it does not apply response-only masking.
    data_collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=False,  # Causal LM, not masked LM
    )

    # Training arguments
    training_args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=num_epochs,
        per_device_train_batch_size=batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        learning_rate=learning_rate,
        fp16=False,
        bf16=True,  # bf16 for stability; requires bf16-capable hardware
        logging_steps=10,
        save_strategy="epoch",
        save_total_limit=2,
        report_to="none",
        warmup_steps=10,
        lr_scheduler_type="cosine",
    )

    # Initialize trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset,
        data_collator=data_collator,
    )
    return trainer, model, tokenizer
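# Verify setup
def verify_trainable_params(model):
    """Verify that only LoRA parameters are trainable."""
    trainable_params = 0
    all_params = 0
    non_lora_trainable = []  # trainable parameters that are not LoRA factors
    for name, param in model.named_parameters():
        all_params += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
            if "lora_" not in name:  # PEFT names adapters lora_A / lora_B
                non_lora_trainable.append(name)
    percentage = 100 * trainable_params / all_params
    print(f"Trainable parameters: {trainable_params:,}")
    print(f"All parameters: {all_params:,}")
    print(f"Trainable percentage: {percentage:.2f}%")
    if non_lora_trainable:
        print(f"Warning: non-LoRA trainable parameters: {non_lora_trainable}")
    return {
        "trainable": trainable_params,
        "total": all_params,
        "percentage": percentage,
        "non_lora_trainable": non_lora_trainable,
    }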
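To judge whether the trainable count is correct, the expected number can be worked out by hand. Each adapted linear layer mapping d_in to d_out features adds an A matrix (r × d_in) and a B matrix (d_out × r), that is r(d_in + d_out) parameters, assuming bias="none" and no modules_to_save. The helper expected_lora_params and the model dimensions below are hypothetical examples, not values taken from a specific checkpoint; the real shapes come from the model's config.

```python
def expected_lora_params(module_shapes, num_layers, rank):
    """Hypothetical helper: LoRA parameters added when every layer contains
    the targeted modules, each given as (in_features, out_features).
    Assumes bias="none" and no modules_to_save."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in module_shapes)
    return per_layer * num_layers


# Hypothetical dims: hidden size 4096, 32 layers, q_proj and v_proj both 4096 -> 4096
mha = expected_lora_params([(4096, 4096), (4096, 4096)], num_layers=32, rank=8)

# Hypothetical grouped-query attention variant: v_proj maps 4096 -> 1024
gqa = expected_lora_params([(4096, 4096), (4096, 1024)], num_layers=32, rank=8)

print(f"{mha:,} {gqa:,}")
```

The computed value can then be compared with verify_trainable_params(model)["trainable"]; a mismatch usually means target_modules matched more (or fewer) layers than intended.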