16. Evaluation

Chapter 16 of 24 · 20 min
Exercise: Implement Task-Specific Evaluation

Create a custom evaluation function for text classification that computes per-class precision and recall:

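import torch
from sklearn.metrics import classification_report


def evaluate_classification(model, dataloader, id2label, device):
    """Prints per-class precision, recall, and F1 for a classifier."""
    model.eval()
    predictions = []  # predicted class ids, one per example
    references = []   # gold class ids, one per example

    with torch.no_grad():
        for batch in dataloader:
            # TODO (exercise): move the batch to `device`, run the model,
            # take the argmax over the logits, and extend both lists.
            pass

    # classification_report builds the per-class metrics from the two lists.
    # Pass the label ids explicitly so each row gets the right class name
    # (assumes the keys of id2label are sortable class ids).
    label_ids = sorted(id2label)
    print(classification_report(
        references,
        predictions,
        labels=label_ids,
        target_names=[id2label[i] for i in label_ids],
        zero_division=0,
    ))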
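
To see what the report computes, it can help to derive the numbers by hand once. The following is a minimal sketch, not part of the exercise solution: it assumes `references` and `predictions` are equal-length lists of integer class ids, and the helper name `per_class_precision_recall` is hypothetical. It tallies a confusion matrix and reads precision and recall for each class from it.

```python
from collections import defaultdict


def per_class_precision_recall(references, predictions):
    """Derive per-class precision and recall from paired label lists.

    Hypothetical helper for illustration; assumes sortable class ids.
    """
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")

    # confusion[true_label][predicted_label] -> count
    confusion = defaultdict(lambda: defaultdict(int))
    for ref, pred in zip(references, predictions):
        confusion[ref][pred] += 1

    labels = sorted(set(references) | set(predictions))
    metrics = {}
    for label in labels:
        tp = confusion[label][label]
        # Predicted as `label` but actually another class
        fp = sum(confusion[other][label] for other in labels if other != label)
        # Actually `label` but predicted as another class
        fn = sum(confusion[label][other] for other in labels if other != label)
        metrics[label] = {
            # Define the ratio as 0.0 when its denominator is zero
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "support": tp + fn,
        }
    return metrics
```

Calling `per_class_precision_recall([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])` returns one entry per class id. Precision answers "of the examples predicted as this class, how many were right?", while recall answers "of the examples that truly belong to this class, how many were found?".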