Automated Visualization — Data Analysis with Local AI (Chapter 6)

Choosing an appropriate visualization is a common challenge. The characteristics of a dataset make some chart types a natural fit and others misleading. Automated visualization uses AI to match data to chart types based on established design principles.

The selection process involves several factors: data type (categorical, continuous, temporal), variable count, analytical goal (comparison, distribution, relationship, composition), and audience expertise. AI can evaluate these factors and recommend charts that effectively communicate the intended message.

Chart Type Selection

Different chart types serve different purposes. Understanding when each applies helps evaluate AI recommendations and construct manual alternatives when needed.

Bar charts compare quantities across categories. They work well when category labels are short and meaningful. Stacked bars show composition; grouped bars enable comparison across multiple dimensions.

Line charts display trends over time or ordered sequences. They emphasize continuity and change rate. Multiple lines compare parallel trends.

Scatter plots reveal relationships between two continuous variables. They expose correlations, clusters, and outliers. Color or size encoding adds dimensions.

Histograms show distributions of single variables. They reveal shape (normal, skewed, bimodal), central tendency, and spread.

Box plots summarize distributions by showing the median, quartiles, and potential outliers. They enable comparison across categories.

Pie charts show composition as proportions of a whole. They work poorly when segments are similar in size or when more than 5-6 categories exist.
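
The guidelines above can be sketched as a small rule-based baseline, useful as a sanity check on whatever the model recommends. The function name, the kind and goal strings, and the cutoff of 6 categories are illustrative assumptions, not part of any library.

```python
from typing import Optional

# Assumed cutoff, taken from the "5-6 categories" guideline for pie charts.
MAX_PIE_CATEGORIES = 6


def suggest_chart(x_kind: str, y_kind: Optional[str], goal: str,
                  n_categories: int = 0) -> str:
    """Return a chart type for the given column kinds and analytical goal.

    Kinds: 'categorical', 'continuous', 'temporal'.
    Goals: 'comparison', 'distribution', 'relationship', 'composition'.
    """
    # A continuous measure over time reads best as a line.
    if x_kind == "temporal" and y_kind == "continuous":
        return "line"
    # Two continuous variables: look for correlation, clusters, outliers.
    if goal == "relationship" and x_kind == "continuous" and y_kind == "continuous":
        return "scatter"
    # One variable: histogram; the same variable split by category: box plot.
    if goal == "distribution":
        return "box" if x_kind == "categorical" else "histogram"
    # Pie only when the number of segments is known and small.
    if goal == "composition":
        return "pie" if 0 < n_categories <= MAX_PIE_CATEGORIES else "stacked bar"
    # Default: compare quantities across categories.
    return "bar"
```

When the baseline and the model disagree, that disagreement is a cue to look more closely at the recommendation, not proof that either one is wrong.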

import ollama
import pandas as pd

def recommend_visualization(df: pd.DataFrame, goal: str) -> str:
    """Return the model's free-text chart recommendation for the data and goal."""
    
    # Analyze data characteristics
    numeric_cols = df.select_dtypes(include='number').columns.tolist()
    # Include pandas' dedicated 'category' dtype alongside plain object columns
    categorical_cols = df.select_dtypes(include=['object', 'category']).columns.tolist()
    date_cols = df.select_dtypes(include='datetime').columns.tolist()
    
    context = f"""Data characteristics:
    - Rows: {len(df)}
    - Numeric columns: {numeric_cols}
    - Categorical columns: {categorical_cols}
    - Date columns: {date_cols}
    
    Analysis goal: {goal}
    
    Recommend the best chart type and specific implementation details."""
    
    response = ollama.chat(
        model='llama3.2',
        messages=[{'role': 'user', 'content': context}]
    )
    
    return response['message']['content']

# Example usage
df = pd.DataFrame({
    'month': pd.date_range('2024-01', periods=12, freq='ME'),
    'revenue': [100, 120, 115, 130, 140, 155, 150, 165, 180, 175, 190, 210],
    'region': ['North'] * 6 + ['South'] * 6
})

recommendation = recommend_visualization(df, "Show revenue trend over time")
print(recommendation)
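
Free-text recommendations are awkward to act on programmatically. One option is to ask the model for a JSON object and parse it defensively. The sketch below assumes the prompt has been extended to request keys such as chart_type, x, and y. Those key names and the parse_recommendation helper are hypothetical.

```python
import json
import re


def parse_recommendation(text: str) -> dict:
    """Extract a JSON object from a model reply, falling back to the raw text."""
    # Models often wrap JSON in extra prose, so take the outermost braces.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            pass  # Malformed JSON: fall through to the raw-text fallback.
    return {"chart_type": None, "raw": text}
```

The fallback keeps the original reply, so a caller can still show it to the user when parsing fails.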

Generating Chart Code

AI can generate matplotlib or seaborn code to produce recommended visualizations. This combines selection with implementation.

def generate_chart_code(
    df: pd.DataFrame, 
    chart_type: str, 
    x: str, 
    y: str = None,
    **kwargs
) -> str:
    """Generate code for specified chart type."""
    
    prompt = f"""Generate Python code using matplotlib/seaborn to create a {chart_type}.
    
    Data: {df.head(3).to_dict()}
    X axis: {x}
    Y axis: {y}
    Additional options: {kwargs}
    
    Return ONLY the Python code, no markdown or explanations. Code should be complete and runnable."""
    
    response = ollama.chat(
        model='llama3.2',
        messages=[{'role': 'user', 'content': prompt}]
    )
    
    return response['message']['content']

# Generate chart code and print it for review
chart_code = generate_chart_code(
    df, 
    "line chart",
    x="month",
    y="revenue"
)
print(chart_code)

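Executing the generated code produces the visualization. Review the code before running it, since it executes with the same permissions as any other script. When the code fails, include the error message in a revised prompt and regenerate.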
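
Before running a reply, it helps to strip any markdown fences the model added despite the prompt and to check the code statically. The following is a minimal sketch using only the standard library. The helper names and the import allowlist are assumptions chosen for illustration, and a static check of this kind is not a sandbox.

```python
import ast
import re

FENCE = "`" * 3  # Built at runtime to avoid a literal fence inside this listing.


def extract_code(reply: str) -> str:
    """Return the first fenced block's contents, or the whole reply if unfenced."""
    pattern = FENCE + r"(?:python)?\s*\n(.*?)" + FENCE
    match = re.search(pattern, reply, flags=re.DOTALL)
    return (match.group(1) if match else reply).strip()


def check_generated_code(
    code: str,
    allowed_imports=("matplotlib", "seaborn", "pandas", "numpy"),
) -> list:
    """List problems found by parsing the code, without executing it."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        return [f"Syntax error: {exc.msg} (line {exc.lineno})"]

    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Compare only the top-level package against the allowlist.
            if name.split(".")[0] not in allowed_imports:
                problems.append(f"Unexpected import: {name}")
    return problems
```

An empty list means only that the code parses and imports nothing outside the allowlist. It says nothing about whether the chart is correct.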

Common Visualization Failures

AI-generated visualizations sometimes fail in predictable ways. Recognizing these patterns enables quick correction.

Inappropriate axis scaling hides important variation or creates false impressions. A y-axis range far wider than the data flattens real variation, while a line chart whose y-axis starts at 99 instead of 0 exaggerates minor changes. Always check axis ranges against the actual data range.

Too many categories in a pie chart create unreadable segments. Pie charts with more than 5-6 categories should be converted to bar charts. AI sometimes suggests pie charts regardless of category count.

Missing legends or labels leave viewers unable to interpret charts. Generated code sometimes skips essential labeling. Verify that all necessary elements appear before sharing.

Inverted color schemes (red for positive, green for negative) confuse audiences expecting conventional mappings. Specify color schemes explicitly when generating code.
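
One way to avoid inverted colors is to compute the mapping yourself and pass it to the plotting call instead of leaving the choice to generated code. The sign_colors helper below is a hypothetical illustration. The default color names are ordinary matplotlib color strings.

```python
def sign_colors(values, positive="green", negative="red"):
    """Map each value to the conventional color for its sign (zero counts as positive)."""
    return [positive if v >= 0 else negative for v in values]
```

The resulting list can be passed as the color argument of a bar chart, for example plt.bar(labels, values, color=sign_colors(values)). For audiences with color vision deficiencies, a blue and orange pair may be a safer choice than green and red.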

def validate_visualization(figure) -> list:
    """Check visualization for common issues."""
    
    issues = []
    
    for ax in figure.axes:
        # Check axis labels
        if not ax.get_xlabel():
            issues.append("Missing x-axis label")
        if not ax.get_ylabel():
            issues.append("Missing y-axis label")
        
        # Check legend
        if ax.legend_ is None and len(ax.lines) > 1:
            issues.append("Multiple lines without legend")
        
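        # Check axis range against the data plotted on this axes
        ylim = ax.get_ylim()
        yrange = abs(ylim[1] - ylim[0])
        ydata_range = ax.dataLim.height  # not positive when nothing is plotted
        if ydata_range > 0:
            if yrange / ydata_range > 100:
                issues.append("Y-axis range far exceeds data range; variation may be hidden")
            # Heuristic: truncated baseline with variation under 5% of it
            elif min(ylim) > 0 and ydata_range / min(ylim) < 0.05:
                issues.append("Y-axis does not start at zero; changes may look exaggerated")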
    
    return issues