Pandas DataFrames — Python for AI — Zero to Useful (Chapter 14)

Creating DataFrames

import pandas as pd

# From a dictionary
df = pd.DataFrame({
    "model": ["gpt-4", "gpt-3.5-turbo", "claude-3", "llama-2"],
    "context_window": [8192, 16385, 200000, 4096],
    "cost_per_1k": [0.03, 0.002, 0.015, 0.0]
})

print(df)
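
The dictionary-of-lists form above is column-oriented. When data arrives one record at a time (for example, parsed JSON), a list of dictionaries is another common input. The following is a minimal sketch; the `records` list and the `df_rows` name are illustrative and not part of the chapter's running example.

```python
import pandas as pd

# Hypothetical row-oriented records, reusing two rows from the example above.
records = [
    {"model": "gpt-4", "context_window": 8192, "cost_per_1k": 0.03},
    {"model": "llama-2", "context_window": 4096, "cost_per_1k": 0.0},
]

# Each dictionary becomes one row; the keys become column names.
df_rows = pd.DataFrame(records)
print(df_rows)
```

Both forms produce the same kind of object, so the choice mostly depends on the shape of the data you start with.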

Basic Operations

print(df.head(2))    # First 2 rows
print(df.shape)      # (rows, columns): (4, 3)
print(df.columns)    # Column names
print(df.dtypes)     # Data types

# Descriptive statistics
print(df.describe())
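
By default, `describe()` summarizes only the numeric columns when a DataFrame has any, so the text column `model` is left out of the output above. The sketch below rebuilds the example DataFrame so it runs on its own, then passes `include="all"` to cover every column. The variable names `numeric_summary` and `full_summary` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "model": ["gpt-4", "gpt-3.5-turbo", "claude-3", "llama-2"],
    "context_window": [8192, 16385, 200000, 4096],
    "cost_per_1k": [0.03, 0.002, 0.015, 0.0],
})

numeric_summary = df.describe()             # numeric columns only
full_summary = df.describe(include="all")   # adds count/unique/top/freq for text

print(list(numeric_summary.columns))
print(list(full_summary.columns))
```

In the full summary, statistics that do not apply to a column (such as the mean of `model`) appear as `NaN`.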

Selecting Columns

# Single column (returns Series)
models = df["model"]

# Multiple columns (returns DataFrame)
subset = df[["model", "cost_per_1k"]]

# Column operations: derive the cost per 1M tokens (1M = 1,000 x 1k)
df["cost_per_1m"] = df["cost_per_1k"] * 1000
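
The difference between single and double brackets is easy to miss, so here is a small, self-contained check of what each one returns. The two-row DataFrame and the names `as_series` and `as_frame` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "model": ["gpt-4", "llama-2"],
    "cost_per_1k": [0.03, 0.0],
})

as_series = df["model"]     # single brackets: one-dimensional Series
as_frame = df[["model"]]    # double brackets: one-column DataFrame

print(type(as_series).__name__, as_series.shape)   # Series (2,)
print(type(as_frame).__name__, as_frame.shape)     # DataFrame (2, 1)
```

This matters when a later step expects a two-dimensional table: passing a list of one column name keeps the result a DataFrame.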

Selecting Rows

# By position
print(df.iloc[0])          # First row as Series

# By condition
expensive = df[df["cost_per_1k"] > 0.01]
print(expensive)
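
Conditions can also be combined. The sketch below assumes you want rows that satisfy two conditions at once and only some of the columns. It uses `&` with parentheses around each condition, and `.loc` to pick rows and columns together. The threshold of 10,000 tokens and the names `mask` and `large_and_expensive` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "model": ["gpt-4", "gpt-3.5-turbo", "claude-3", "llama-2"],
    "context_window": [8192, 16385, 200000, 4096],
    "cost_per_1k": [0.03, 0.002, 0.015, 0.0],
})

# Wrap each condition in parentheses; use & (and), | (or), ~ (not)
# rather than the Python keywords and/or/not.
mask = (df["cost_per_1k"] > 0.01) & (df["context_window"] > 10000)

# .loc takes a row selector and a column selector.
large_and_expensive = df.loc[mask, ["model", "context_window"]]
print(large_and_expensive)
```

With the example data, only the `claude-3` row passes both conditions.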

Adding and Removing Columns

# Add calculated column
df["affordable"] = df["cost_per_1k"] <= 0.01

# Remove column
df.drop("affordable", axis=1, inplace=True)

# Rename columns
df.rename(columns={"cost_per_1k": "cost"}, inplace=True)
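
An alternative to modifying `df` in place is to let each method return a new DataFrame and assign the result to a name. The sketch below shows that style under the assumption that you want to keep the original untouched; `assign`, `drop(columns=...)`, and `rename` are standard pandas methods, while the names `renamed` and `trimmed` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "model": ["gpt-4", "gpt-3.5-turbo", "claude-3", "llama-2"],
    "context_window": [8192, 16385, 200000, 4096],
    "cost_per_1k": [0.03, 0.002, 0.015, 0.0],
})

# assign() adds a column and rename() relabels one; both return new objects.
renamed = (
    df.assign(affordable=df["cost_per_1k"] <= 0.01)
      .rename(columns={"cost_per_1k": "cost"})
)

# drop(columns=...) is equivalent to drop(..., axis=1).
trimmed = renamed.drop(columns="affordable")

print(list(df.columns))        # original is unchanged
print(list(renamed.columns))
print(list(trimmed.columns))
```

Keeping intermediate results under separate names makes it easier to rerun a single step without rebuilding the DataFrame from scratch.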

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the pandas version, the Python runtime, the data path (if any), and the observed output. If a result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
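
One possible way to capture those details is sketched below. It is only an illustration: the `env` dictionary and its keys are hypothetical names, and the tiny DataFrame stands in for whichever example you choose to run.

```python
import platform

import pandas as pd

# Collect the environment details worth recording next to the exercise.
env = {
    "python": platform.python_version(),
    "pandas": pd.__version__,
    "platform": platform.platform(),
}

# Stand-in for the smallest example from the chapter.
df = pd.DataFrame({"model": ["gpt-4", "llama-2"], "cost_per_1k": [0.03, 0.0]})
env["observed_shape"] = df.shape

for key, value in env.items():
    print(f"{key}: {value}")
```

Pasting this output beside your notes makes it clear which versions produced the result you observed.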