13. Fairness Metrics
Beyond bias detection, fairness metrics formalize trade-offs between competing desiderata. Understanding these trade-offs is essential for making principled deployment decisions.
Individual vs. Group Fairness
import numpy as np
import torch


class FairnessMetrics:
    """Collection of fairness metrics for classifier evaluation."""

    def __init__(self, predictions, labels, group_membership):
        # Store as NumPy arrays so boolean masking works in calibration_error.
        self.predictions = np.asarray(predictions, dtype=float)
        self.labels = np.asarray(labels, dtype=float)
        self.groups = np.asarray(group_membership)

    def individual_fairness(self, similarity_matrix, epsilon=0.1, sim_threshold=0.9):
        """Count pairs of similar individuals whose predictions differ by more than epsilon."""
        violations = 0
        n = len(self.predictions)
        for i in range(n):
            for j in range(i + 1, n):
                if similarity_matrix[i, j] > sim_threshold:
                    if abs(self.predictions[i] - self.predictions[j]) > epsilon:
                        violations += 1
        return violations

    def counterfactual_fairness(self, model, input_features, sensitive_idx):
        """Mean absolute change in prediction when a binary (0/1) sensitive attribute is flipped."""
        # Flipping one column is a simple proxy: it does not propagate the change
        # to other features that depend on the sensitive attribute.
        flipped_input = input_features.clone()
        flipped_input[:, sensitive_idx] = 1 - flipped_input[:, sensitive_idx]
        with torch.no_grad():  # evaluation only; no gradients needed
            original_pred = model(input_features)
            flipped_pred = model(flipped_input)
        return torch.abs(original_pred - flipped_pred).mean()

    def calibration_error(self, n_bins=10):
        """Expected calibration error: bin-weighted gap between observed and predicted rates."""
        bin_edges = np.linspace(0, 1, n_bins + 1)
        calibration_errors = []
        for i in range(n_bins):
            lower, upper = bin_edges[i], bin_edges[i + 1]
            if i == n_bins - 1:
                # Close the last bin on the right so predictions equal to 1.0 are counted.
                mask = (self.predictions >= lower) & (self.predictions <= upper)
            else:
                mask = (self.predictions >= lower) & (self.predictions < upper)
            if mask.sum() > 0:
                observed = self.labels[mask].mean()
                predicted = self.predictions[mask].mean()
                calibration_errors.append(mask.sum() * abs(observed - predicted))
        return sum(calibration_errors) / len(self.labels)
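The `calibration_error` method above pools all examples, so a model can look well calibrated overall while being miscalibrated for one group. The following is a minimal sketch of a per-group variant, not part of the class above; the function names `expected_calibration_error` and `calibration_error_by_group` and the toy arrays are assumptions made for illustration.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=10):
    """Bin-weighted mean of |observed rate - mean predicted probability|."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    if probs.size == 0:
        return float("nan")
    # Map each probability to a bin index; clip so that 1.0 lands in the last bin.
    bin_ids = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    total = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            total += mask.sum() * abs(labels[mask].mean() - probs[mask].mean())
    return total / probs.size


def calibration_error_by_group(probs, labels, groups, n_bins=10):
    """Return {group value: calibration error computed on that group only}."""
    probs, labels, groups = map(np.asarray, (probs, labels, groups))
    result = {}
    for g in np.unique(groups):
        mask = groups == g
        result[g.item()] = expected_calibration_error(probs[mask], labels[mask], n_bins)
    return result


# Hypothetical toy data: every example gets score 0.2, but the groups differ
# in how often the positive label actually occurs.
toy_probs = np.full(10, 0.2)
toy_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0])
toy_groups = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
print(calibration_error_by_group(toy_probs, toy_labels, toy_groups))
```

In this toy data the score of 0.2 matches the observed positive rate in group 0 but understates it in group 1, which the pooled error would partly hide.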
The Impossibility Theorem
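When base rates differ across groups, demographic parity, equalized odds, and calibration cannot all be satisfied at once. Even calibration and equalized odds alone are compatible only for a perfect predictor. Deployment decisions therefore require choosing explicitly which criterion to relax, and by how much.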
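The tension between criteria can be seen on a small example. The sketch below computes per-group positive rate, true positive rate, and false positive rate, then compares two sets of predictions on the same labels. The helper names (`group_rates`, `demographic_parity_gap`, `equalized_odds_gap`) and the ten-example dataset are assumptions chosen for illustration.

```python
import numpy as np


def group_rates(y_pred, y_true, groups):
    """Per-group positive rate, TPR and FPR for binary (0/1) predictions."""
    y_pred, y_true, groups = map(np.asarray, (y_pred, y_true, groups))
    rates = {}
    for g in np.unique(groups):
        m = groups == g
        pos = y_true[m] == 1
        neg = ~pos
        rates[g.item()] = {
            "positive_rate": y_pred[m].mean(),
            "tpr": y_pred[m][pos].mean() if pos.any() else float("nan"),
            "fpr": y_pred[m][neg].mean() if neg.any() else float("nan"),
        }
    return rates


def demographic_parity_gap(rates):
    """Largest difference in positive prediction rate between groups."""
    vals = [r["positive_rate"] for r in rates.values()]
    return max(vals) - min(vals)


def equalized_odds_gap(rates):
    """Larger of the TPR gap and the FPR gap between groups."""
    tprs = [r["tpr"] for r in rates.values()]
    fprs = [r["fpr"] for r in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Toy labels: group 0 has base rate 0.2, group 1 has base rate 0.6.
y_true = np.array([1, 0, 0, 0, 0, 1, 1, 1, 0, 0])
groups = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# A perfect classifier satisfies equalized odds but not demographic parity.
perfect = group_rates(y_true, y_true, groups)

# Selecting 2 of 5 in each group satisfies demographic parity but not equalized odds.
parity_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 0])
parity = group_rates(parity_pred, y_true, groups)

for name, rates in [("perfect", perfect), ("parity", parity)]:
    print(name, demographic_parity_gap(rates), equalized_odds_gap(rates))
```

With unequal base rates, driving one gap to zero leaves the other nonzero in both cases, which is the trade-off a deployment decision has to resolve.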
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Train a simple classifier on biased synthetic data. Compute multiple fairness metrics and identify which groups experience the greatest disparities under each criterion.
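One possible starting point for this exercise is sketched below, using only NumPy. Everything specific in it is an assumption: the data-generating process (a single feature plus a group-dependent label shift), the sample size, the learning rate, and the 0.5 decision threshold. It is meant as scaffolding to adapt, not as a reference solution.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so runs are repeatable


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Synthetic data with a built-in group disparity (all values are assumptions).
n = 4000
group = rng.integers(0, 2, size=n)         # binary group membership
x = rng.normal(size=n)                     # one legitimate feature
shift = np.where(group == 1, 1.0, -1.0)    # bias: group 1 receives favourable labels
y = (rng.random(n) < sigmoid(2.0 * x + shift)).astype(float)

# Logistic regression trained by full-batch gradient descent.
X = np.column_stack([x, group, np.ones(n)])  # feature, group indicator, intercept
w = np.zeros(X.shape[1])
for _ in range(500):
    grad = X.T @ (sigmoid(X @ w) - y) / n    # gradient of the mean log loss
    w -= 0.5 * grad

probs = sigmoid(X @ w)
preds = (probs >= 0.5).astype(float)

# Per-group metrics: one row per group, one column per fairness criterion.
report = {}
for g in (0, 1):
    m = group == g
    report[g] = {
        "base_rate": y[m].mean(),
        "positive_rate": preds[m].mean(),            # demographic parity
        "tpr": preds[m & (y == 1)].mean(),           # equalized odds (part 1)
        "fpr": preds[m & (y == 0)].mean(),           # equalized odds (part 2)
        "mean_score_minus_base_rate": probs[m].mean() - y[m].mean(),  # coarse calibration check
    }

for g, metrics in report.items():
    print(g, {k: round(float(v), 3) for k, v in metrics.items()})
```

Comparing the rows shows which group is disadvantaged under each criterion. A natural extension is to drop the group indicator from `X` and check how the gaps change.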