09. Hypothesis Testing

Chapter 9 of 18 · 25 min

Hypothesis testing provides formal methods for making decisions about populations based on sample data. AI can guide test selection, explain results, and identify assumption violations that invalidate conclusions.

The hypothesis testing process involves stating hypotheses, selecting an appropriate test, calculating statistics, and interpreting results in context. Each step has common failure modes that AI guidance can help avoid.
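
The four steps can be sketched end to end without any AI call. This is a minimal illustration on synthetic data, assuming a two-sided comparison of two independent groups and a significance level of 0.05; the variable names and the seed are choices made for this example only.

```python
import numpy as np
from scipy import stats

# Step 1: state the hypotheses.
# H0: the two groups have equal population means.
# H1: the population means differ (two-sided).
alpha = 0.05  # significance level, fixed before looking at the data

# Synthetic data for illustration only (seeded so the run is repeatable)
rng = np.random.default_rng(42)
group_a = rng.normal(100, 15, 50)
group_b = rng.normal(110, 15, 50)

# Step 2: select a test. Two independent, roughly normal samples: t-test.
# Step 3: calculate the test statistic and p-value.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Step 4: interpret the result in context.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, p = {p_value:.4f}: {decision}")
```

Fixing alpha before computing the p-value keeps the decision rule independent of the data.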

Test Selection Guidance

Selecting the wrong test produces invalid results regardless of calculation accuracy. AI can recommend tests based on data characteristics and research questions.

import ollama
from scipy import stats
import pandas as pd
import numpy as np

def recommend_test(df: pd.DataFrame, scenario: str) -> str:
    """Return the model's test recommendation for the scenario as text."""
    
    # Analyze data characteristics
    numeric_cols = df.select_dtypes(include=np.number).columns.tolist()
    categorical_cols = df.select_dtypes(include='object').columns.tolist()
    
    prompt = f"""Scenario: {scenario}
    
    Available columns:
    - Numeric: {numeric_cols}
    - Categorical: {categorical_cols}
    
    Recommend appropriate test(s) including:
    1. Test name and purpose
    2. Required data assumptions
    3. Hypotheses being tested
    4. Why this test fits the scenario
    5. Alternative tests to consider"""
    
    response = ollama.chat(
        model='llama3.2',
        messages=[{'role': 'user', 'content': prompt}]
    )
    
    return response['message']['content']

# Example scenario: two groups of 50 drawn with different true means
df = pd.DataFrame({
    'group': ['A'] * 50 + ['B'] * 50,
    'value': np.concatenate([np.random.normal(100, 15, 50), np.random.normal(110, 15, 50)])
})

recommendation = recommend_test(df, "Compare mean values between two groups")
print(recommendation)

Common test recommendations include:

  • t-test: Compare the means of two independent groups when the data are approximately normal
  • Mann-Whitney U: Compare two independent groups using ranks, without assuming normality
  • ANOVA: Compare means across three or more groups
  • Chi-square: Compare proportions or test independence between categorical variables
  • Correlation test: Test for association between two numeric variables
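
As a rough guide, each test in the list maps to a function in scipy.stats. The sketch below runs all five on synthetic data; the sample sizes, the 2x2 table of counts, and the choice of Pearson correlation for the correlation test are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic samples for illustration only
rng = np.random.default_rng(0)
a = rng.normal(100, 15, 40)
b = rng.normal(110, 15, 40)
c = rng.normal(105, 15, 40)

p_values = {}

# t-test: means of two independent groups
_, p_values["t-test"] = stats.ttest_ind(a, b)

# Mann-Whitney U: rank-based comparison of two independent groups
_, p_values["Mann-Whitney U"] = stats.mannwhitneyu(a, b, alternative="two-sided")

# One-way ANOVA: means across three or more groups
_, p_values["ANOVA"] = stats.f_oneway(a, b, c)

# Chi-square test of independence on a table of counts
table = np.array([[30, 10], [20, 20]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)
p_values["Chi-square"] = p_chi2

# Pearson correlation: linear association between two numeric variables
_, p_values["Correlation"] = stats.pearsonr(a, b)

for name, p in p_values.items():
    print(f"{name}: p = {p:.4f}")
```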

Implementing Tests with Interpretation

AI can guide test implementation and interpret results in context.

def conduct_test_with_interpretation(df, test_type, **params):
    """Conduct hypothesis test and get AI interpretation."""
    
    # Conduct test based on type
    if test_type == 'ttest':
        from scipy.stats import ttest_ind
        stat, pval = ttest_ind(params['group1'], params['group2'])
    elif test_type == 'mannwhitney':
        from scipy.stats import mannwhitneyu
        stat, pval = mannwhitneyu(params['group1'], params['group2'])
    elif test_type == 'chisquare':
        from scipy.stats import chi2_contingency
        stat, pval, dof, expected = chi2_contingency(params['contingency'])
    else:
        raise ValueError(f"Unknown test type: {test_type}")
    
    # Get interpretation
    prompt = f"""Test: {test_type}
    Test statistic: {stat:.4f}
    p-value: {pval:.6f}
    Sample size: {params.get('n', 'not specified')}
    
    Explain:
    1. What the test statistic means
    2. How to interpret the p-value
    3. Practical significance of the result
    4. What assumptions this test makes and how to check each one"""
    
    interpretation = ollama.chat(
        model='llama3.2',
        messages=[{'role': 'user', 'content': prompt}]
    )['message']['content']
    
    return {
        'statistic': stat,
        'p_value': pval,
        'significant': pval < 0.05,
        'interpretation': interpretation
    }

# Example: comparing two groups
group_a = np.random.normal(100, 15, 50)
group_b = np.random.normal(110, 15, 50)

result = conduct_test_with_interpretation(
    df, 
    'ttest',
    group1=group_a,
    group2=group_b,
    n=100
)
print(f"p-value: {result['p_value']:.4f}")
print(f"Significant: {result['significant']}")
print(f"Interpretation: {result['interpretation']}")
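
A p-value alone says nothing about how large the difference is, so practical significance is usually judged with an effect size. The sketch below computes Cohen's d with a pooled standard deviation; `cohens_d` is a hypothetical helper written for this example, not a SciPy function, and the data are synthetic.

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_x, n_y = len(x), len(y)
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = ((n_x - 1) * x.var(ddof=1) + (n_y - 1) * y.var(ddof=1)) / (n_x + n_y - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

# Synthetic groups shaped like the example above (seeded for repeatability)
rng = np.random.default_rng(1)
group_a = rng.normal(100, 15, 50)
group_b = rng.normal(110, 15, 50)

d = cohens_d(group_b, group_a)
print(f"Cohen's d: {d:.2f}")
```

Cohen's conventional benchmarks treat values near 0.2 as small, 0.5 as medium, and 0.8 as large, though what counts as meaningful depends on the domain.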

Verifying Assumptions

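Hypothesis tests assume specific data characteristics, such as normality or equal variances across groups. When those assumptions are violated, the reported p-value can be misleading, so check them before running the test. AI can guide assumption checking.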

def verify_test_assumptions(data: np.ndarray, test_type: str) -> dict:
    """Check if assumptions are met for specified test."""
    
    checks = {}
    
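    if test_type in ['ttest', 'anova']:
        # Check normality with Shapiro-Wilk. A p-value above 0.05 means no
        # evidence against normality, not proof that the data are normal.
        from scipy.stats import shapiro
        if len(data) < 5000:
            stat, pval = shapiro(data)
            checks['normality'] = {
                'passed': bool(pval > 0.05),
                'statistic': float(stat),
                'p_value': float(pval)
            }
        else:
            # Shapiro-Wilk p-values may be inaccurate for very large samples
            checks['normality'] = {
                'passed': None,
                'note': 'Shapiro-Wilk skipped for n >= 5000; inspect a Q-Q plot'
            }
        
        # Check variance equality
        # (would need multiple groups for actual Levene's test)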
        
    if test_type == 'correlation':
        # Check for linear relationship
        # (requires scatter plot assessment)
        checks['linearity'] = "Visual inspection recommended"
    
    # Get AI feedback on assumption violations
    prompt = f"""Assumption checks for {test_type}:
    {checks}
    
    What are the consequences if assumptions are violated?
    What alternative tests should be used if assumptions fail?"""
    
    feedback = ollama.chat(
        model='llama3.2',
        messages=[{'role': 'user', 'content': prompt}]
    )['message']['content']
    
    return {
        'checks': checks,
        'feedback': feedback
    }

# Check assumptions before conducting test
assumptions = verify_test_assumptions(group_a, 'ttest')
print(f"Passed: {assumptions['checks']['normality']['passed']}")
print(f"Feedback: {assumptions['feedback']}")
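
The function above leaves the equal-variance check as a comment because it receives only one sample. A minimal sketch of that check with both groups available might look like the following, assuming a 0.05 threshold and a fallback to Welch's t-test when variances appear unequal; the data are synthetic, with one group given a deliberately wider spread.

```python
import numpy as np
from scipy import stats

# Synthetic groups for illustration only
rng = np.random.default_rng(7)
group_a = rng.normal(100, 15, 50)
group_b = rng.normal(110, 30, 50)  # deliberately wider spread

# Levene's test: H0 is that the groups have equal variances
lev_stat, lev_p = stats.levene(group_a, group_b)
equal_var = bool(lev_p > 0.05)

# equal_var=False makes ttest_ind run Welch's t-test,
# which does not assume equal variances
t_stat, t_p = stats.ttest_ind(group_a, group_b, equal_var=equal_var)

print(f"Levene p-value: {lev_p:.4f} (treat variances as equal: {equal_var})")
print(f"t = {t_stat:.3f}, p = {t_p:.4f}")
```

Some analysts skip the preliminary check and use Welch's test by default, since it performs well whether or not the variances are equal.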

EXERCISE

Design three different hypothesis tests on a single dataset. Use AI recommendations for test selection and assumption checking. Interpret results in terms relevant to the analytical question being addressed.