09. Hypothesis Testing
Hypothesis testing provides formal methods for making decisions about populations based on sample data. AI can guide test selection, explain results, and identify assumption violations that invalidate conclusions.
The hypothesis testing process involves stating hypotheses, selecting an appropriate test, calculating statistics, and interpreting results in context. Each step has common failure modes that AI guidance can help avoid.
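The four steps can be sketched end to end with a one-sample t-test. This is a minimal illustration rather than a prescribed workflow: the simulated sample, the null value of 100, and the 0.05 significance level are all assumptions chosen for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical data: 40 simulated measurements (seeded for reproducibility)
rng = np.random.default_rng(42)
sample = rng.normal(loc=103, scale=15, size=40)

# Step 1: state hypotheses. H0: population mean == 100, H1: population mean != 100
null_mean = 100
alpha = 0.05  # significance level, fixed before looking at the result

# Step 2: select a test. One numeric sample compared to a fixed value -> one-sample t-test
# Step 3: calculate the test statistic and p-value
result = stats.ttest_1samp(sample, popmean=null_mean)

# Step 4: interpret the result against the chosen significance level
decision = "reject H0" if result.pvalue < alpha else "fail to reject H0"
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f} -> {decision}")
```

Fixing the significance level before computing the statistic keeps the decision rule independent of the data.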
Test Selection Guidance
Selecting the wrong test produces invalid results regardless of calculation accuracy. AI can recommend tests based on data characteristics and research questions.
import ollama
from scipy import stats
import pandas as pd
import numpy as np

def recommend_test(df: pd.DataFrame, scenario: str) -> str:
    """Ask the model to recommend a hypothesis test for a scenario."""
    # Summarize column types so the model knows what data is available
    numeric_cols = df.select_dtypes(include=np.number).columns.tolist()
    categorical_cols = df.select_dtypes(include=['object', 'category']).columns.tolist()
    prompt = f"""Scenario: {scenario}
Available columns:
- Numeric: {numeric_cols}
- Categorical: {categorical_cols}
Recommend appropriate test(s) including:
1. Test name and purpose
2. Required data assumptions
3. Hypotheses being tested
4. Why this test fits the scenario
5. Alternative tests to consider"""
    response = ollama.chat(
        model='llama3.2',
        messages=[{'role': 'user', 'content': prompt}]
    )
    # The recommendation comes back as free text, not structured data
    return response['message']['content']

# Example scenario: two groups of 50 with different true means
df = pd.DataFrame({
    'group': ['A'] * 50 + ['B'] * 50,
    'value': np.concatenate([np.random.normal(100, 15, 50), np.random.normal(110, 15, 50)])
})
recommendation = recommend_test(df, "Compare mean values between two groups")
print(recommendation)
Common test recommendations include:
- t-test: Compare means of two groups when data is approximately normal
- Mann-Whitney U: Compare two groups without normality assumption
- ANOVA: Compare means across three or more groups
- Chi-square: Compare proportions or test independence between categorical variables
- Correlation test: Test association between two numeric variables
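Each test in the list above has a counterpart in `scipy.stats`. The sketch below maps the names to those functions; the helper `run_named_test` and its string keys are hypothetical names introduced for illustration, and Pearson correlation is assumed as the correlation test.

```python
import numpy as np
from scipy import stats

def run_named_test(name, *samples):
    """Run one of the listed tests and return (statistic, p_value).

    Hypothetical helper: the name keys are illustrative, not a standard API.
    """
    if name == "t-test":
        stat, pval = stats.ttest_ind(samples[0], samples[1])
    elif name == "mann-whitney":
        stat, pval = stats.mannwhitneyu(samples[0], samples[1], alternative="two-sided")
    elif name == "anova":
        # Accepts any number of groups; intended for three or more
        stat, pval = stats.f_oneway(*samples)
    elif name == "chi-square":
        # Expects a single contingency table of observed counts
        res = stats.chi2_contingency(samples[0])
        stat, pval = res[0], res[1]
    elif name == "correlation":
        # Pearson correlation is assumed here; spearmanr is a rank-based alternative
        stat, pval = stats.pearsonr(samples[0], samples[1])
    else:
        raise ValueError(f"Unknown test: {name}")
    return float(stat), float(pval)

# Usage with a 2x2 table of observed counts
table = np.array([[30, 20], [20, 30]])
chi2, p = run_named_test("chi-square", table)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

For a 2x2 table, `chi2_contingency` applies Yates' continuity correction by default, so the statistic is smaller than the uncorrected chi-square value.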
Implementing Tests with Interpretation
AI can guide test implementation and interpret results in context.
def conduct_test_with_interpretation(df, test_type, **params):
    """Conduct hypothesis test and get AI interpretation."""
    # Note: df is accepted for context only; the samples arrive via params
    # Conduct test based on type
    if test_type == 'ttest':
        from scipy.stats import ttest_ind
        # Student's t-test (equal variances assumed); use equal_var=False for Welch's test
        stat, pval = ttest_ind(params['group1'], params['group2'])
    elif test_type == 'mannwhitney':
        from scipy.stats import mannwhitneyu
        stat, pval = mannwhitneyu(params['group1'], params['group2'])
    elif test_type == 'chisquare':
        from scipy.stats import chi2_contingency
        stat, pval, dof, expected = chi2_contingency(params['contingency'])
    else:
        raise ValueError(f"Unknown test type: {test_type}")
    # Get interpretation
    # The model sees only these summary numbers, not the data, so it can
    # describe the test's assumptions but cannot confirm they hold
    prompt = f"""Test: {test_type}
Test statistic: {stat:.4f}
p-value: {pval:.6f}
Sample size: {params.get('n', 'not specified')}
Explain:
1. What the test statistic means
2. How to interpret the p-value
3. Practical significance of the result
4. What assumptions the test makes and how to verify them"""
    interpretation = ollama.chat(
        model='llama3.2',
        messages=[{'role': 'user', 'content': prompt}]
    )['message']['content']
    return {
        'statistic': stat,
        'p_value': pval,
        'significant': bool(pval < 0.05),  # fixed 5% significance level
        'interpretation': interpretation
    }

# Example: comparing two groups
group_a = np.random.normal(100, 15, 50)
group_b = np.random.normal(110, 15, 50)
result = conduct_test_with_interpretation(
    df,
    'ttest',
    group1=group_a,
    group2=group_b,
    n=100
)
print(f"p-value: {result['p_value']:.4f}")
print(f"Significant: {result['significant']}")
print(f"Interpretation: {result['interpretation']}")
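The interpretation prompt asks about practical significance, which a p-value alone does not measure. One common companion statistic for a two-group comparison is Cohen's d, the difference in means divided by the pooled standard deviation. The function below is a minimal sketch; the name `cohens_d` is introduced here for illustration and is not part of the code above.

```python
import numpy as np

def cohens_d(group1, group2):
    """Effect size for two independent groups using the pooled standard deviation."""
    g1 = np.asarray(group1, dtype=float)
    g2 = np.asarray(group2, dtype=float)
    n1, n2 = len(g1), len(g2)
    # Pooled variance weights each sample variance by its degrees of freedom
    pooled_var = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
    return (g1.mean() - g2.mean()) / np.sqrt(pooled_var)

# Small worked example: means differ by 1, pooled standard deviation is 1
d = cohens_d([1, 2, 3], [2, 3, 4])
print(f"Cohen's d: {d:.2f}")
```

Passing the effect size to the model alongside the p-value gives it something concrete to reason about when it discusses practical significance.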
Verifying Assumptions
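Each hypothesis test assumes specific data characteristics, such as approximate normality or equal variances across groups. When those assumptions are violated, the reported p-value can be misleading. AI can guide assumption checking and suggest alternative tests.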
def verify_test_assumptions(data: np.ndarray, test_type: str) -> dict:
    """Check if assumptions are met for specified test."""
    checks = {}
    if test_type in ['ttest', 'anova']:
        # Check normality
        from scipy.stats import shapiro
        if len(data) < 5000:
            stat, pval = shapiro(data)
            checks['normality'] = {
                'passed': bool(pval > 0.05),  # no evidence against normality at the 5% level
                'statistic': float(stat),
                'p_value': float(pval)
            }
        else:
            # Shapiro-Wilk p-values may be inaccurate for samples this large
            checks['normality'] = {
                'passed': None,
                'note': 'Sample too large for Shapiro-Wilk; inspect a Q-Q plot instead'
            }
        # Check variance equality
        # (would need multiple groups for actual Levene's test)
    if test_type == 'correlation':
        # Check for linear relationship
        # (requires scatter plot assessment)
        checks['linearity'] = "Visual inspection recommended"
    # Get AI feedback on assumption violations
    prompt = f"""Assumption checks for {test_type}:
{checks}
What are the consequences if assumptions are violated?
What alternative tests should be used if assumptions fail?"""
    feedback = ollama.chat(
        model='llama3.2',
        messages=[{'role': 'user', 'content': prompt}]
    )['message']['content']
    return {
        'checks': checks,
        'feedback': feedback
    }

# Check assumptions before conducting test
assumptions = verify_test_assumptions(group_a, 'ttest')
print(f"Passed: {assumptions['checks']['normality']['passed']}")
print(f"Feedback: {assumptions['feedback']}")
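The variance-equality check is left as a comment in the function above because Levene's test needs every group at once, not a single array. A separate helper could cover that case. The sketch below assumes a hypothetical function name, `check_equal_variance`, and reuses the 0.05 threshold from the normality check.

```python
import numpy as np
from scipy import stats

def check_equal_variance(*groups, alpha=0.05):
    """Levene's test for equal variances across two or more groups."""
    # scipy's default centers each group on its median, which is more
    # robust to non-normal data than centering on the mean
    stat, pval = stats.levene(*groups)
    return {
        'passed': bool(pval > alpha),  # True means no evidence of unequal variances
        'statistic': float(stat),
        'p_value': float(pval)
    }

# Usage with simulated groups that share a mean but differ widely in spread
rng = np.random.default_rng(0)
narrow = rng.normal(0, 1, 200)
wide = rng.normal(0, 10, 200)
variance_check = check_equal_variance(narrow, wide)
print(f"Equal variances plausible: {variance_check['passed']}")
```

If this check fails for a two-group comparison, Welch's t-test (`ttest_ind` with `equal_var=False`) is a common alternative to the standard t-test.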
Design three different hypothesis tests on a single dataset. Use AI recommendations for test selection and assumption checking. Interpret results in terms relevant to the analytical question being addressed.