RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Data Analysis with Local AI

10. Correlation Analysis

Chapter 10 of 18 · 20 min
KEY INSIGHT

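Spearman correlation and Cramér's V cover the cases where Pearson falls short: non-normal or monotonic non-linear data, and categorical variables. Lag-test a correlation to check temporal precedence before arguing for causation, and remember that precedence alone does not prove it.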

Correlation analysis quantifies the strength and direction of relationships between variables. Understanding correlations prevents spurious assumptions and reveals hidden patterns in data.

Pearson vs Spearman Correlation

Pearson measures the strength of linear relationships. It is sensitive to outliers, and its significance tests assume roughly normal data. Spearman measures monotonic relationships using rank order, so it handles monotonic non-linear patterns and outliers better.

import pandas as pd
import numpy as np

df = pd.read_csv('sales_data.csv')

# Pearson correlation matrix
pearson_corr = df[['revenue', 'marketing_spend', 'customer_count']].corr(method='pearson')
print("Pearson Correlation:")
print(pearson_corr)

# Spearman for monotonic relationships, including non-linear ones
spearman_corr = df[['revenue', 'marketing_spend', 'customer_count']].corr(method='spearman')
print("\nSpearman Correlation:")
print(spearman_corr)
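
To make the difference concrete, here is a minimal sketch on synthetic data (the variables x and y are made up for illustration and are not part of sales_data.csv). Because y is a strictly increasing but strongly curved function of x, Spearman should come out at essentially 1, while Pearson should land noticeably lower.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration only)
rng = np.random.default_rng(0)
x = pd.Series(rng.uniform(0, 5, size=200))
y = np.exp(x)  # strictly increasing, but far from linear

# Pearson looks for a straight-line fit; Spearman only compares rank order
pearson_r = x.corr(y, method='pearson')
spearman_r = x.corr(y, method='spearman')

print(f"Pearson:  {pearson_r:.3f}")
print(f"Spearman: {spearman_r:.3f}")
```

A large gap between the two coefficients on real data is a hint that the relationship is monotonic but curved, or that a few extreme values are driving the Pearson result.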

Visualizing Correlations

Heatmaps reveal correlation structures at a glance.

import matplotlib.pyplot as plt
import seaborn as sns

# Full correlation matrix for all numeric columns
numeric_df = df.select_dtypes(include=[np.number])
corr_matrix = numeric_df.corr()

plt.figure(figsize=(12, 8))
sns.heatmap(corr_matrix, annot=True, cmap='RdBu_r', center=0, fmt='.2f')
plt.title('Correlation Heatmap')
plt.tight_layout()
plt.savefig('correlation_heatmap.png', dpi=150)
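
A heatmap gets crowded once there are many columns, so a ranked table of pairs can be a useful companion. The sketch below is one possible approach, using a hypothetical toy DataFrame in place of numeric_df: it keeps only the upper triangle of the matrix, so each pair appears once and the diagonal of 1.0 values is dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for numeric_df
rng = np.random.default_rng(1)
base = rng.normal(size=300)
toy = pd.DataFrame({
    'a': base,
    'b': base + rng.normal(scale=0.1, size=300),  # nearly a copy of a
    'c': rng.normal(size=300),                    # unrelated noise
})

corr = toy.corr()

# Boolean mask for the strict upper triangle (k=1 excludes the diagonal)
upper_mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper = corr.where(upper_mask)

# Flatten to one row per pair; dropna() removes the masked cells
pairs = upper.stack().dropna().reset_index()
pairs.columns = ['var_1', 'var_2', 'r']

# Rank by strength of association, ignoring sign
pairs = pairs.sort_values('r', key=lambda s: s.abs(), ascending=False)
pairs = pairs.reset_index(drop=True)
print(pairs)
```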

Categorical Variable Correlations

Use Cramér's V for categorical-categorical relationships:

from scipy.stats import chi2_contingency

def cramers_v(contingency_table):
    # Chi-squared statistic for the observed counts
    chi2 = chi2_contingency(contingency_table)[0]
    n = contingency_table.sum().sum()  # total number of observations
    phi2 = chi2 / n
    r, k = contingency_table.shape
    # Normalize so the result falls between 0 (no association) and 1 (perfect association)
    return np.sqrt(phi2 / min(k-1, r-1))

# Example: association between product category and customer segment
contingency = pd.crosstab(df['product_category'], df['customer_segment'])
v = cramers_v(contingency)
print(f"Cramér's V: {v:.3f}")
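
One detail worth checking when tables are small: scipy's chi2_contingency applies Yates' continuity correction to 2x2 tables by default, which lowers the statistic and therefore V. The sketch below uses made-up counts and a hypothetical helper, cramers_v_table, that exposes the correction flag, to show the effect on a perfectly associated table and on an independent one.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v_table(table, correction=True):
    """Cramér's V for a contingency table (hypothetical helper for this sketch)."""
    chi2 = chi2_contingency(table, correction=correction)[0]
    n = table.to_numpy().sum()
    r, k = table.shape
    return np.sqrt((chi2 / n) / min(k - 1, r - 1))

# Made-up counts: every A is an X and every B is a Y
perfect = pd.DataFrame([[10, 0], [0, 10]], index=['A', 'B'], columns=['X', 'Y'])
# Made-up counts: row membership tells you nothing about the column
independent = pd.DataFrame([[10, 10], [10, 10]], index=['A', 'B'], columns=['X', 'Y'])

v_perfect_raw = cramers_v_table(perfect, correction=False)
v_perfect_yates = cramers_v_table(perfect, correction=True)
v_independent = cramers_v_table(independent, correction=False)

print(f"Perfect association, no correction: {v_perfect_raw:.3f}")    # 1.000
print(f"Perfect association, Yates:         {v_perfect_yates:.3f}")  # 0.900
print(f"Independent table:                  {v_independent:.3f}")    # 0.000
```

The correction only applies to 2x2 tables (one degree of freedom), so larger crosstabs are unaffected by the flag.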

Correlation vs Causation Trap

Strong correlation on its own never establishes causation. Lagged correlation analysis can at least show whether changes in one variable precede changes in another:

# Check whether marketing spend leads revenue.
# Assumes one row per day, sorted by date, so shift(7) is a 7-day lag.
shifted_marketing = df['marketing_spend'].shift(7)
lagged_corr = df['revenue'].corr(shifted_marketing)  # scalar; NaN pairs are ignored
print(f"Lagged correlation (7 days): {lagged_corr:.3f}")
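
A single lag is a guess, so it can help to scan a range of lags and see where the correlation peaks. The following is a minimal sketch on synthetic data, in which revenue is constructed to follow marketing spend by a hypothetical 3 days. The series, the lag range, and the noise levels are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (an assumption for illustration only)
rng = np.random.default_rng(42)
n_days = 200
true_lag = 3  # revenue is built to trail marketing by 3 days

marketing = pd.Series(rng.normal(100, 20, n_days))
noise = pd.Series(rng.normal(0, 10, n_days))
revenue = 5 * marketing.shift(true_lag) + noise

# Correlate revenue with marketing shifted by each candidate lag.
# Series.corr ignores the NaN rows that shift() introduces.
lag_corrs = {lag: revenue.corr(marketing.shift(lag)) for lag in range(0, 8)}

best_lag = max(lag_corrs, key=lambda lag: abs(lag_corrs[lag]))

for lag, r in lag_corrs.items():
    print(f"lag {lag}: r = {r:+.3f}")
print(f"Strongest correlation at lag {best_lag}")
```

A peak at a positive lag is consistent with marketing leading revenue, but a shared driver such as seasonality can produce the same pattern, so treat it as a lead to investigate rather than proof.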

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.

EXERCISE

Calculate the correlation matrix for your dataset, keep only the pairs whose absolute correlation exceeds 0.5, then visualize the result as an annotated heatmap that shows only those high-correlation pairs.
