Pandas Series — Python for AI — Zero to Useful (Chapter 13)

What Is Pandas

Pandas provides DataFrame and Series objects for structured data. A Series is a one-dimensional labeled array, equivalent to a single column; a DataFrame is a two-dimensional table whose columns are Series. Where NumPy handles numerical arrays, Pandas handles labeled tabular data.
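
To make the column/table relationship concrete, here is a minimal sketch. The DataFrame contents and the names df and cost_column are made up for illustration and do not come from the chapter.

```python
import pandas as pd

# Hypothetical two-column table, used only to illustrate the relationship
df = pd.DataFrame({
    "model": ["gpt-4", "claude-3"],
    "cost": [0.03, 0.015],
})

# Selecting a single column from a DataFrame returns a Series
cost_column = df["cost"]

print(type(df).__name__)           # DataFrame
print(type(cost_column).__name__)  # Series
print(cost_column.index.tolist())  # [0, 1] (the Series shares the table's row labels)
```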

Creating Series

import pandas as pd

# From a list
temperatures = pd.Series([0.0, 0.3, 0.5, 0.7, 0.9, 1.0])

# With custom index
models = pd.Series(
    ["gpt-4", "gpt-3.5-turbo", "claude-3"],
    index=["openai_1", "openai_2", "anthropic"]
)

# From a dictionary
costs = pd.Series({
    "gpt-4": 0.03,
    "gpt-3.5-turbo": 0.002,
    "claude-3": 0.015
})
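
After creating a Series it can help to inspect what Pandas actually built. The sketch below reuses the temperatures and costs values from above and looks at the index, dtype, and length; it assumes a reasonably recent Pandas and Python, where dictionary insertion order is preserved.

```python
import pandas as pd

temperatures = pd.Series([0.0, 0.3, 0.5, 0.7, 0.9, 1.0])
costs = pd.Series({"gpt-4": 0.03, "gpt-3.5-turbo": 0.002, "claude-3": 0.015})

# A Series built from a list gets a default integer index starting at 0
print(temperatures.index.tolist())  # [0, 1, 2, 3, 4, 5]

# A Series built from a dictionary uses the keys as its index
print(costs.index.tolist())         # ['gpt-4', 'gpt-3.5-turbo', 'claude-3']

print(costs.dtype)                  # float64
print(len(costs))                   # 3
```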

Series Operations

costs = pd.Series({"gpt-4": 0.03, "gpt-3.5-turbo": 0.002, "claude-3": 0.015})

print(costs * 1000)         # Element-wise scaling, e.g. per-1K-token prices to per-1M
print(costs > 0.01)         # Boolean mask: True where the cost exceeds 0.01
print(costs[costs > 0.01])  # Keep only the rows where the mask is True
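
The same masking pattern extends to combined conditions. The sketch below is one way to write it; the thresholds and the names mask, expensive, and mid_range are illustrative choices. Note that each condition needs its own parentheses and that & is used in place of Python's and.

```python
import pandas as pd

costs = pd.Series({"gpt-4": 0.03, "gpt-3.5-turbo": 0.002, "claude-3": 0.015})

# A comparison returns a boolean Series with the same index as costs
mask = costs > 0.01
print(mask.tolist())              # [True, False, True]

# Indexing with the mask keeps only the True rows
expensive = costs[mask]
print(expensive.index.tolist())   # ['gpt-4', 'claude-3']

# Combine conditions with & (and) or | (or); parentheses are required
mid_range = costs[(costs > 0.001) & (costs < 0.02)]
print(mid_range.index.tolist())   # ['gpt-3.5-turbo', 'claude-3']
```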

Indexing

models = pd.Series(["gpt-4", "claude-3", "llama-2"], index=["a", "b", "c"])

print(models["a"])           # gpt-4
print(models.iloc[0])        # gpt-4 (positional)
print(models.loc["a"])       # gpt-4 (label-based)
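
The difference between .loc and .iloc matters most for slices and for integer labels. The following sketch illustrates both; the numbered Series is a hypothetical example added here, not part of the chapter's data.

```python
import pandas as pd

models = pd.Series(["gpt-4", "claude-3", "llama-2"], index=["a", "b", "c"])

# Label slices include the end label; positional slices exclude the end position
print(models.loc["a":"b"].tolist())   # ['gpt-4', 'claude-3']
print(models.iloc[0:1].tolist())      # ['gpt-4']

# A list of labels selects several entries at once
print(models.loc[["a", "c"]].tolist())  # ['gpt-4', 'llama-2']

# Hypothetical Series with integer labels: .loc uses the label, .iloc the position
numbered = pd.Series(["x", "y"], index=[10, 20])
print(numbered.loc[10])   # x
print(numbered.iloc[0])   # x
```

Using .loc or .iloc explicitly, instead of plain square brackets, avoids ambiguity when the index itself contains integers.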

Handling Missing Data

data = pd.Series([1, 2, None, 4, 5])  # None is stored as NaN, so the dtype is float64
print(data.isna())          # Boolean Series: False, False, True, False, False
print(data.fillna(0))       # Replace the missing value with 0
print(data.dropna())        # Drop the missing entry
print(data.mean())          # Skips NaN by default: 3.0
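
A short sketch of what happens under the hood with the same data: the None becomes NaN, missing values can be counted, and the mean can be used as a fill value. Filling with the mean is only one possible strategy, chosen here for illustration.

```python
import pandas as pd

data = pd.Series([1, 2, None, 4, 5])

# None forces the integers to be stored as floats, with NaN marking the gap
print(data.dtype)                    # float64

# Summing the boolean mask counts the missing entries
print(data.isna().sum())             # 1

# Fill the gap with the mean of the remaining values (one strategy among several)
filled = data.fillna(data.mean())
print(filled.tolist())               # [1.0, 2.0, 3.0, 4.0, 5.0]

# dropna keeps the original index labels of the surviving rows
print(data.dropna().index.tolist())  # [0, 1, 3, 4]

# With skipna=False the missing value propagates into the result
print(data.mean(skipna=False))       # nan
```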

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
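
One possible way to capture the details the checkpoint asks for is sketched below. Which fields to record is a suggestion, and the example Series simply reuses the temperatures list from this chapter.

```python
import platform

import pandas as pd

# Record the environment so the result can be reproduced later
print("python:", platform.python_version())
print("pandas:", pd.__version__)
print("platform:", platform.platform())

# Smallest example from the chapter, with its observed output
temperatures = pd.Series([0.0, 0.3, 0.5, 0.7, 0.9, 1.0])
print(temperatures.tolist())  # [0.0, 0.3, 0.5, 0.7, 0.9, 1.0]
```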