14. Pandas DataFrames
Creating DataFrames
import pandas as pd
# From a dictionary
df = pd.DataFrame({
"model": ["gpt-4", "gpt-3.5-turbo", "claude-3", "llama-2"],
"context_window": [8192, 16385, 200000, 4096],
"cost_per_1k": [0.03, 0.002, 0.015, 0.0]
})
print(df)
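A dictionary of columns is only one way to build a DataFrame. As a minimal sketch, the same kind of table can be built row by row from a list of dictionaries. The name df_from_records is illustrative, and only two of the rows above are reused to keep it short.

```python
import pandas as pd

# Each dict is one row; its keys become the column names.
records = [
    {"model": "gpt-4", "context_window": 8192, "cost_per_1k": 0.03},
    {"model": "llama-2", "context_window": 4096, "cost_per_1k": 0.0},
]
df_from_records = pd.DataFrame(records)

print(df_from_records.shape)          # (2, 3)
print(list(df_from_records.columns))  # ['model', 'context_window', 'cost_per_1k']
```

The row-oriented form tends to be convenient when the data arrives as JSON-like records, while the column-oriented dictionary is shorter for small hand-written tables.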
Basic Operations
print(df.head(2)) # First 2 rows
print(df.shape) # (4, 3)
print(df.columns) # Column names
print(df.dtypes) # Data types
# Descriptive statistics (numeric columns only by default)
print(df.describe())
Selecting Columns
# Single column (returns Series)
models = df["model"]
# Multiple columns (returns DataFrame)
subset = df[["model", "cost_per_1k"]]
# Column operations apply element-wise: cost per 1M tokens = cost per 1K tokens * 1000
df["cost_per_1m"] = df["cost_per_1k"] * 1000
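The difference between single and double brackets is easy to miss, so here is a small sketch that checks the returned types directly. The two-row frame and the names single and double are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "model": ["gpt-4", "llama-2"],
    "cost_per_1k": [0.03, 0.0],
})

single = df["model"]     # single brackets: a one-dimensional Series
double = df[["model"]]   # double brackets: a DataFrame with one column

print(type(single).__name__)       # Series
print(type(double).__name__)       # DataFrame
print(single.shape, double.shape)  # (2,) (2, 1)
```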
Selecting Rows
# By position
print(df.iloc[0]) # First row as Series
# By condition
expensive = df[df["cost_per_1k"] > 0.01]
print(expensive)
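Beyond position and a single condition, rows can also be selected by label with .loc and by several conditions at once. The sketch below rebuilds the chapter's table so it runs on its own; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "model": ["gpt-4", "gpt-3.5-turbo", "claude-3", "llama-2"],
    "context_window": [8192, 16385, 200000, 4096],
    "cost_per_1k": [0.03, 0.002, 0.015, 0.0],
})

# .loc selects by label; with the default index the labels are 0..3.
row = df.loc[2]
print(row["model"])  # claude-3

# Combine conditions with & (and) or | (or); wrap each condition in parentheses.
mask = (df["cost_per_1k"] > 0.01) & (df["context_window"] > 10_000)
large_and_expensive = df[mask]
print(large_and_expensive["model"].tolist())  # ['claude-3']

# .loc can filter rows and pick columns in one step.
cheap_models = df.loc[df["cost_per_1k"] <= 0.01, ["model", "cost_per_1k"]]
print(cheap_models["model"].tolist())  # ['gpt-3.5-turbo', 'llama-2']
```

The parentheses matter because & binds more tightly than the comparison operators in Python, so leaving them out typically raises an error rather than giving the intended filter.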
Adding and Removing Columns
# Add calculated column
df["affordable"] = df["cost_per_1k"] <= 0.01
# Remove column
df.drop("affordable", axis=1, inplace=True)
# Rename columns
df.rename(columns={"cost_per_1k": "cost"}, inplace=True)
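The calls above modify df in place. As an alternative sketch, the same steps can return new DataFrames and leave the original untouched, which makes it easier to re-run a cell without side effects. The names result and trimmed are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "model": ["gpt-4", "llama-2"],
    "cost_per_1k": [0.03, 0.0],
})

# assign() and rename() each return a new DataFrame, so they can be chained.
result = (
    df.assign(affordable=lambda d: d["cost_per_1k"] <= 0.01)
      .rename(columns={"cost_per_1k": "cost"})
)

# drop(columns=...) is equivalent to drop(..., axis=1) and also returns a copy.
trimmed = result.drop(columns="affordable")

print(list(result.columns))   # ['model', 'cost', 'affordable']
print(list(trimmed.columns))  # ['model', 'cost']
print(list(df.columns))       # ['model', 'cost_per_1k'] (original unchanged)
```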
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Exercise
Create a DataFrame of AI models with columns for name, provider, context window, and cost per 1K tokens. Filter it to the models whose context window is larger than 10,000 tokens, then add a column showing the cost per million tokens.
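One possible solution sketch follows. It reuses the context windows and costs from the start of the chapter; the provider column is an assumption added for this exercise, and the names models and large_context are illustrative.

```python
import pandas as pd

models = pd.DataFrame({
    "name": ["gpt-4", "gpt-3.5-turbo", "claude-3", "llama-2"],
    # Provider names are assumed here; they do not appear in the chapter's data.
    "provider": ["OpenAI", "OpenAI", "Anthropic", "Meta"],
    "context_window": [8192, 16385, 200000, 4096],
    "cost_per_1k": [0.03, 0.002, 0.015, 0.0],
})

# Keep models with more than 10,000 tokens of context, then derive cost per 1M tokens.
large_context = (
    models[models["context_window"] > 10_000]
    .assign(cost_per_1m=lambda d: d["cost_per_1k"] * 1000)
)

print(large_context[["name", "provider", "cost_per_1m"]])
```

Using assign() on the filtered result avoids writing into a slice of the original frame, which is where pandas would otherwise tend to emit a SettingWithCopyWarning.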