RUNLOCALAIv38
OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
AI Safety and Alignment

12. Bias Detection

Chapter 12 of 18 · 15 min
KEY INSIGHT

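Bias is multidimensional. No single metric captures all fairness concerns, and metrics can conflict: when base rates differ across groups, a binary classifier can satisfy both demographic parity and equalized odds only if its predictions carry no information about the true label. A thorough bias audit combines statistical tests, embedding analysis, and human evaluation.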

Bias detection identifies systematic distortions in model outputs. Effective detection requires both automated metrics and qualitative analysis, since automated scores can be gamed or miss context-dependent biases.

Demographic Parity and Equalized Odds

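Statistical fairness metrics compare model behavior across demographic groups. Demographic parity asks whether every group receives positive predictions at the same rate; equalized odds asks whether the true positive rate and the false positive rate are each equal across groups. The functions below report the largest gap between any two groups, so 0.0 means the criterion holds exactly.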
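In symbols, writing $\hat{Y}$ for the model's binary prediction, $Y$ for the true label, and $A$ for group membership (notation introduced here for illustration), the gaps that this section's functions compute are as follows; each probability is estimated as an empirical mean over the examples in group $g$.

```latex
\Delta_{\mathrm{DP}}  = \max_{g} P(\hat{Y}=1 \mid A=g) - \min_{g} P(\hat{Y}=1 \mid A=g)

\Delta_{\mathrm{TPR}} = \max_{g} P(\hat{Y}=1 \mid Y=1, A=g) - \min_{g} P(\hat{Y}=1 \mid Y=1, A=g)

\Delta_{\mathrm{FPR}} = \max_{g} P(\hat{Y}=1 \mid Y=0, A=g) - \min_{g} P(\hat{Y}=1 \mid Y=0, A=g)
```

Demographic parity holds when $\Delta_{\mathrm{DP}} = 0$; equalized odds holds when both $\Delta_{\mathrm{TPR}}$ and $\Delta_{\mathrm{FPR}}$ are 0.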

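import numpy as np


def demographic_parity_difference(predictions, group_labels):
    """Demographic parity gap: largest difference in positive-prediction
    rate between any two groups (0.0 means perfect parity)."""
    predictions = np.asarray(predictions, dtype=float)  # binary 0/1 predictions
    group_labels = np.asarray(group_labels)
    group_rates = {}
    for group_id in np.unique(group_labels):
        mask = group_labels == group_id
        # Fraction of this group's members that received a positive prediction
        group_rates[group_id] = predictions[mask].mean()

    rates = list(group_rates.values())
    return max(rates) - min(rates)


def equalized_odds_difference(predictions, group_labels, true_labels):
    """Equalized odds gaps: largest between-group difference in true
    positive rate (TPR) and in false positive rate (FPR).

    Assumes every group contains both positive and negative true labels;
    otherwise that group's rate is NaN.
    """
    predictions = np.asarray(predictions, dtype=float)  # binary 0/1 predictions
    group_labels = np.asarray(group_labels)
    true_labels = np.asarray(true_labels)
    differences = {}

    for metric_name, metric_func in [
        # TPR: share of actual positives that were predicted positive
        ('TPR', lambda p, t: p[t == 1].mean()),
        # FPR: share of actual negatives that were predicted positive
        ('FPR', lambda p, t: p[t == 0].mean()),
    ]:
        group_rates = {}
        for group_id in np.unique(group_labels):
            mask = group_labels == group_id
            group_rates[group_id] = metric_func(
                predictions[mask], true_labels[mask]
            )
        rates = list(group_rates.values())
        differences[metric_name] = max(rates) - min(rates)

    return differences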
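The key insight at the top of this chapter says fairness metrics can conflict. The self-contained sketch below makes that concrete on a hypothetical eight-example dataset in which group `a` has a 50% base rate and group `b` a 25% base rate. The helper `fairness_gaps` is introduced here for illustration; it computes the same three gaps as the functions above in a single pass.

```python
import numpy as np


def fairness_gaps(y_pred, y_true, groups):
    """Return (demographic parity gap, TPR gap, FPR gap) across groups.

    Hypothetical helper for illustration. Assumes binary 0/1 arrays and
    that every group contains both positive and negative true labels.
    """
    y_pred, y_true, groups = map(np.asarray, (y_pred, y_true, groups))
    pos_rate, tpr, fpr = [], [], []
    for g in np.unique(groups):
        m = groups == g
        pos_rate.append(y_pred[m].mean())             # P(pred=1 | group)
        tpr.append(y_pred[m & (y_true == 1)].mean())  # P(pred=1 | y=1, group)
        fpr.append(y_pred[m & (y_true == 0)].mean())  # P(pred=1 | y=0, group)

    def gap(rates):
        # Largest difference between any two groups
        return float(max(rates) - min(rates))

    return gap(pos_rate), gap(tpr), gap(fpr)


# Hypothetical data: group "a" has base rate 0.5, group "b" has base rate 0.25.
groups = np.array(["a"] * 4 + ["b"] * 4)
y_true = np.array([1, 1, 0, 0, 1, 0, 0, 0])

# A perfect classifier satisfies equalized odds but violates demographic parity.
perfect = y_true.copy()
print(fairness_gaps(perfect, y_true, groups))  # (0.25, 0.0, 0.0)

# Forcing equal positive rates (one extra positive in group "b") fixes
# demographic parity but opens a false-positive-rate gap.
parity = np.array([1, 1, 0, 0, 1, 1, 0, 0])
print(fairness_gaps(parity, y_true, groups))  # DP gap 0.0, TPR gap 0.0, FPR gap about 0.333
```

Neither prediction vector satisfies both criteria, which is why an audit should state which fairness definition it targets rather than report a single "bias score".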

Embedding Bias Measurement

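Word embeddings encode societal biases present in their training data. The classic Word Embedding Association Test (WEAT) measures how differently two sets of target words associate with two sets of attribute words (for example, male and female names as targets, career and family words as attributes).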
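As defined in the original WEAT paper (Caliskan, Bryson, and Narayanan, 2017), with target sets $X, Y$ and attribute sets $A, B$, each word's association score and the effect size $d$ that `weat_effect_size` computes are:

```latex
s(w, A, B) = \frac{1}{|A|}\sum_{a \in A} \cos(\vec{w}, \vec{a})
           - \frac{1}{|B|}\sum_{b \in B} \cos(\vec{w}, \vec{b})

d = \frac{\operatorname{mean}_{x \in X} s(x, A, B) - \operatorname{mean}_{y \in Y} s(y, A, B)}
         {\operatorname{std}_{w \in X \cup Y} s(w, A, B)}
```

The code below uses NumPy's default population standard deviation (`ddof=0`) for the denominator.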

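import numpy as np


def cosine_similarity(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def weat_effect_size(target_x, target_y, attribute_a, attribute_b, embeddings):
    """Compute WEAT effect size for embedding bias.

    `embeddings` maps each word to a 1-D vector. A positive result means
    target_x leans toward attribute_a (relative to attribute_b) more than
    target_y does.
    """
    def association(word):
        # s(w, A, B): mean similarity to A minus mean similarity to B
        sim_a = np.mean([cosine_similarity(embeddings[word], embeddings[a])
                         for a in attribute_a])
        sim_b = np.mean([cosine_similarity(embeddings[word], embeddings[b])
                         for b in attribute_b])
        return sim_a - sim_b

    assoc_x = [association(w) for w in target_x]
    assoc_y = [association(w) for w in target_y]

    # Difference in mean association between the two target sets...
    total_diff = np.mean(assoc_x) - np.mean(assoc_y)
    # ...normalized by the spread of associations over all target words
    std_diff = np.std(assoc_x + assoc_y)

    return total_diff / std_diff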
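An effect size alone does not say whether the association could arise by chance. The original WEAT paper pairs it with a one-sided permutation test: re-split the pooled target words into equal-size halves and count how often the test statistic is at least as large as the observed one. The sketch below follows that idea but enumerates every split exactly, which only suits small word sets. The 2-D embeddings and words such as `x1` and `a1` are hypothetical toy data, and `cos`, `association`, and `weat_permutation_p` are helpers introduced here for illustration.

```python
import itertools

import numpy as np


def cos(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def association(word, attr_a, attr_b, emb):
    """s(w, A, B): mean cosine similarity to A minus mean cosine similarity to B."""
    return float(np.mean([cos(emb[word], emb[a]) for a in attr_a])
                 - np.mean([cos(emb[word], emb[b]) for b in attr_b]))


def weat_permutation_p(target_x, target_y, attr_a, attr_b, emb):
    """One-sided exact permutation p-value for the WEAT test statistic.

    Assumes X and Y are disjoint and the same size. Enumerates every
    equal-size split of X and Y pooled together.
    """
    words = list(target_x) + list(target_y)
    scores = {w: association(w, attr_a, attr_b, emb) for w in words}

    def statistic(x_half):
        # Sum of associations in the "X" half minus the sum over the rest
        in_x = set(x_half)
        return (sum(scores[w] for w in words if w in in_x)
                - sum(scores[w] for w in words if w not in in_x))

    observed = statistic(target_x)
    splits = list(itertools.combinations(words, len(target_x)))
    # Count splits at least as extreme as the observed one (small tolerance
    # for floating-point ties); the observed split itself is included.
    as_extreme = sum(statistic(s) >= observed - 1e-12 for s in splits)
    return as_extreme / len(splits)


# Hypothetical 2-D toy embeddings: x-words lie near a1, y-words near b1.
emb = {
    "a1": np.array([1.0, 0.0]), "b1": np.array([0.0, 1.0]),
    "x1": np.array([1.0, 0.1]), "x2": np.array([1.0, 0.2]),
    "y1": np.array([0.1, 1.0]), "y2": np.array([0.2, 1.0]),
}
p = weat_permutation_p(["x1", "x2"], ["y1", "y2"], ["a1"], ["b1"], emb)
print(p)  # 6 possible splits, only the observed one is as extreme: 1/6
```

With four target words the smallest attainable p-value is 1/6, so toy sets like this cannot reach conventional significance. Larger word sets make full enumeration expensive, and a random sample of permutations is the usual substitute.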

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
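One way to capture those details is to write them to a small JSON record next to each run. This is a minimal sketch, interpreting "runtime" as the Python interpreter and platform; the function name `environment_record`, its field names, and the data path are hypothetical placeholders rather than a required schema.

```python
import json
import platform
import sys
from datetime import datetime, timezone

import numpy as np


def environment_record(data_path, observed_output):
    """Collect the checkpoint details into one JSON-serializable dict.

    Hypothetical helper; data_path and observed_output come from the caller.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],   # interpreter version
        "platform": platform.platform(),    # OS and architecture string
        "numpy": np.__version__,            # package version used by the examples
        "data_path": str(data_path),
        "observed_output": observed_output,
    }


# Hypothetical path and output values, for illustration only.
record = environment_record("data/toy_predictions.npz", {"dp_gap": 0.25})
print(json.dumps(record, indent=2))
```

If a result depends on the CPU/GPU backend or available memory, add those as extra keys in the same record.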

EXERCISE

Implement a bias detection pipeline that screens model outputs for gendered occupational associations using both statistical tests and manual review of sampled outputs.
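A possible starting point for the exercise, not a full solution: the sketch below tags each generated output as male-coded or female-coded using tiny hypothetical pronoun lexicons, runs an exact two-sided binomial test per occupation against a 50/50 null (an assumption; an audit might justify a different baseline), and draws a reproducible sample of outputs for manual review. The names `gender_code`, `binomial_two_sided_p`, and `screen` and the example outputs are all illustrative.

```python
import math
import random
import re
from collections import defaultdict

# Hypothetical, deliberately tiny lexicons; a real audit needs a reviewed list.
MALE_TERMS = {"he", "him", "his", "himself"}
FEMALE_TERMS = {"she", "her", "hers", "herself"}


def gender_code(text):
    """Return 'male', 'female', or None (no gendered terms, or both kinds)."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    has_m, has_f = bool(tokens & MALE_TERMS), bool(tokens & FEMALE_TERMS)
    if has_m and not has_f:
        return "male"
    if has_f and not has_m:
        return "female"
    return None


def binomial_two_sided_p(k, n):
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    tail = min(k, n - k)
    return min(1.0, 2 * sum(math.comb(n, i) for i in range(tail + 1)) / 2 ** n)


def screen(outputs, alpha=0.05, review_per_occupation=2, seed=0):
    """outputs: list of (occupation, generated_text) pairs.

    Returns per-occupation counts, p-values, flags, and a review sample.
    """
    by_occ = defaultdict(list)
    for occupation, text in outputs:
        by_occ[occupation].append(text)

    rng = random.Random(seed)  # fixed seed keeps the review sample reproducible
    report = {}
    for occupation, texts in by_occ.items():
        codes = [gender_code(t) for t in texts]
        male, female = codes.count("male"), codes.count("female")
        n = male + female
        p = binomial_two_sided_p(male, n) if n else 1.0
        report[occupation] = {
            "male": male,
            "female": female,
            "p_value": p,
            "flagged": p < alpha,
            # Statistical tests miss context, so every occupation gets a human check.
            "review_sample": rng.sample(texts, min(review_per_occupation, len(texts))),
        }
    return report


# Hypothetical model outputs, for illustration only.
outputs = (
    [("nurse", "She checked the chart before her shift.")] * 6
    + [("engineer", "He reviewed the design and signed off on his part.")] * 6
    + [("teacher", "She graded essays."), ("teacher", "He planned the lesson."),
       ("teacher", "She led the assembly."), ("teacher", "He coached debate.")]
)
report = screen(outputs)
for occupation, row in report.items():
    print(occupation, row["male"], row["female"], row["p_value"], row["flagged"])
# nurse and engineer are flagged (p = 0.03125); teacher is not (p = 1.0)
```

When screening many occupations at once, correct for multiple comparisons (for example, a Bonferroni-adjusted `alpha`), and treat the flags as a queue for manual review rather than a verdict.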
