13. Pandas Series
What Is Pandas
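Pandas provides the Series and DataFrame objects for structured data. A Series is a one-dimensional labeled array, equivalent to a single column; a DataFrame is a two-dimensional table whose columns are Series. Where NumPy handles numerical arrays, Pandas handles labeled tabular data.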
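To make the relationship between the two objects concrete, here is a minimal sketch that builds a small table and pulls out one column. The column names and values are illustrative placeholders, not data from this chapter.

```python
import pandas as pd

# Illustrative data: the names and numbers are made up for this sketch.
table = pd.DataFrame({
    "model": ["model_a", "model_b", "model_c"],
    "latency_ms": [120.0, 85.0, 240.0],
})

# Selecting one column of a DataFrame returns a Series.
latency = table["latency_ms"]

print(type(table).__name__)    # DataFrame
print(type(latency).__name__)  # Series
print(latency.index.tolist())  # [0, 1, 2]
```

The Series keeps the row labels of the table it came from, which is why the default integer index shows up on the column.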
Creating Series
import pandas as pd
# From a list
temperatures = pd.Series([0.0, 0.3, 0.5, 0.7, 0.9, 1.0])
# With custom index
models = pd.Series(
    ["gpt-4", "gpt-3.5-turbo", "claude-3"],
    index=["openai_1", "openai_2", "anthropic"]
)
# From a dictionary
costs = pd.Series({
    "gpt-4": 0.03,
    "gpt-3.5-turbo": 0.002,
    "claude-3": 0.015
})
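However a Series is created, it ends up with the same parts: an index of labels, an array of values, and a dtype. The short sketch below inspects them on the dictionary-based Series from above; it is an illustration, not an exhaustive tour of the attributes.

```python
import pandas as pd

costs = pd.Series({"gpt-4": 0.03, "gpt-3.5-turbo": 0.002, "claude-3": 0.015})

print(costs.index.tolist())  # ['gpt-4', 'gpt-3.5-turbo', 'claude-3']
print(costs.to_numpy())      # the underlying NumPy array of floats
print(costs.dtype)           # float64
print(len(costs))            # 3
```

When a Series is built from a dictionary, the keys become the index labels and keep their insertion order.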
Series Operations
costs = pd.Series({"gpt-4": 0.03, "gpt-3.5-turbo": 0.002, "claude-3": 0.015})
print(costs * 1000) # Scale every value by 1000 (per-1K to per-1M if costs are per 1K tokens)
print(costs > 0.01) # Boolean mask
print(costs[costs > 0.01]) # Filter expensive models
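Two further behaviours are worth seeing alongside the operations above: boolean masks can be combined, and arithmetic between two Series matches values by label rather than by position. The sketch below assumes a hypothetical `tokens_k` Series of usage figures that is not part of the original example.

```python
import pandas as pd

costs = pd.Series({"gpt-4": 0.03, "gpt-3.5-turbo": 0.002, "claude-3": 0.015})

# Combine masks with & (and) or | (or); each condition needs its own parentheses.
mid_range = costs[(costs > 0.001) & (costs < 0.02)]
print(mid_range.index.tolist())  # ['gpt-3.5-turbo', 'claude-3']

# Arithmetic between two Series aligns on labels, not positions.
# tokens_k holds hypothetical usage figures for illustration only.
tokens_k = pd.Series({"claude-3": 10, "gpt-4": 2})
spend = costs * tokens_k
print(spend["gpt-4"])                 # 0.06
print(spend.isna()["gpt-3.5-turbo"])  # True: tokens_k has no matching label
```

Labels present in only one of the two Series produce a missing value in the result instead of raising an error, so it is worth checking for missing entries after label-aligned arithmetic.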
Indexing
models = pd.Series(["gpt-4", "claude-3", "llama-2"], index=["a", "b", "c"])
print(models["a"]) # gpt-4
print(models.iloc[0]) # gpt-4 (positional)
print(models.loc["a"]) # gpt-4 (label-based)
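With string labels, the three forms above return the same element, so the difference between them is easy to miss. It becomes visible when the labels are integers that do not match the positions. The following sketch uses made-up scores to show that case.

```python
import pandas as pd

# Integer labels that do not match positions (illustrative values).
scores = pd.Series([0.91, 0.87, 0.78], index=[10, 20, 30])

print(scores.loc[10])   # 0.91, looked up by label 10
print(scores.iloc[0])   # 0.91, looked up by position 0
print(scores.iloc[-1])  # 0.78, the last position

# Slices differ too: .loc includes the end label, .iloc excludes the end position.
print(scores.loc[10:20].tolist())  # [0.91, 0.87]
print(scores.iloc[0:1].tolist())   # [0.91]
```

Using `.loc` and `.iloc` explicitly, rather than plain square brackets, keeps the intent clear when labels are integers.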
Handling Missing Data
data = pd.Series([1, 2, None, 4, 5])
print(data.isna()) # Boolean Series: False, False, True, False, False
print(data.fillna(0)) # Replace the missing value (stored as NaN) with 0
print(data.dropna()) # Drop the missing entry
print(data.mean()) # 3.0, because missing values are skipped by default
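It helps to see what happens to `None` inside a numeric Series. The sketch below reuses the same `data` and shows that the missing entry is stored as NaN, which changes the dtype to float, and that aggregations skip it only by default.

```python
import pandas as pd

data = pd.Series([1, 2, None, 4, 5])

print(data.dtype)               # float64: None was converted to NaN
print(data.isna().sum())        # 1 missing value
print(data.count())             # 4 non-missing values
print(data.mean(skipna=False))  # nan: the missing value propagates
```

Passing `skipna=False` is a quick way to confirm whether a result silently depends on missing values being ignored.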
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Exercise
Create a Series of model response times in milliseconds. Calculate the average, find the fastest model, and replace any missing values with the median.