10. NumPy Arrays for ML
Why NumPy
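NumPy's array model underpins the Python ML ecosystem. PyTorch and TensorFlow tensors are not NumPy arrays. They follow NumPy's conventions for shapes, dtypes, indexing, and broadcasting, add GPU support and automatic differentiation, and convert to and from NumPy arrays (for example, PyTorch's `torch.from_numpy` and `Tensor.numpy()`). Understanding NumPy makes those frameworks much easier to learn.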
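A NumPy array stores elements of a single dtype (homogeneous data), by default in one contiguous block of memory. This lets NumPy apply an operation to the whole array in compiled loops instead of interpreting one Python statement per element. These vectorized operations are often 10-100x faster than equivalent Python loops, though the exact speedup depends on the operation and the array size.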
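To see what "vectorized" means in practice, the sketch below computes the same sum of squares twice: once with a Python loop and once with a single NumPy expression. It times both with `time.perf_counter`. The array size is an arbitrary choice, and the printed timings will vary by machine, so treat them as illustrative only.

```python
import time

import numpy as np

n = 1_000_000  # arbitrary size chosen for the illustration
values = np.arange(n, dtype=np.float64)

# Python loop: one interpreted iteration per element
start = time.perf_counter()
loop_total = 0.0
for v in values:
    loop_total += v * v
loop_seconds = time.perf_counter() - start

# Vectorized: the multiply and the sum run inside NumPy's compiled loops
start = time.perf_counter()
vec_total = float(np.sum(values * values))
vec_seconds = time.perf_counter() - start

# Both compute the same quantity, up to floating-point rounding
print(np.isclose(loop_total, vec_total))  # True
print(f"loop: {loop_seconds:.4f} s, vectorized: {vec_seconds:.4f} s")
```

The results agree only up to rounding because `np.sum` adds elements in a different order than the sequential loop.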
Creating Arrays
import numpy as np
# From a list
arr = np.array([1, 2, 3, 4, 5])
# Ranges
arr = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
# Zeros, ones
zeros = np.zeros(5) # [0., 0., 0., 0., 0.] (float64 by default)
ones = np.ones((3, 4)) # 3x4 matrix of 1.0; the shape is passed as a tuple
# Random
random_arr = np.random.rand(5) # Uniform [0, 1)
random_normal = np.random.randn(100) # Standard normal
random_ints = np.random.randint(0, 10, 5) # 5 random ints [0, 10)
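The functions above use NumPy's legacy global random state. Newer code usually creates a `Generator` with `np.random.default_rng`. Passing a seed to it makes an experiment produce the same numbers on every run. A minimal sketch, with the seed value 42 chosen arbitrarily:

```python
import numpy as np

# A seeded Generator: the same seed yields the same sequence on every run
rng = np.random.default_rng(42)

uniform = rng.random(5)              # uniform floats in [0, 1)
normal = rng.standard_normal(100)    # samples from N(0, 1)
ints = rng.integers(0, 10, size=5)   # integers in [0, 10); high is exclusive

# A fresh Generator with the same seed reproduces the first draw exactly
rng_again = np.random.default_rng(42)
print(np.array_equal(uniform, rng_again.random(5)))  # True
```

Passing one `rng` object around, instead of relying on global state, also makes it clearer which parts of a pipeline consume random numbers.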
Array Attributes
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape) # (2, 3)
print(arr.ndim) # 2
print(arr.size) # 6
print(arr.dtype) # int64 on most platforms (int32 on Windows with NumPy < 2.0)
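In ML code the dtype matters because weights and inputs are commonly stored as 32-bit floats to save memory. The sketch below uses a made-up `batch` array (hypothetical, 2 samples by 3 features). It shows how to request a dtype at creation time and how to convert an existing array with `astype`. It also shows how `size` and `nbytes` follow from `shape` and the dtype:

```python
import math

import numpy as np

# Hypothetical batch: 2 samples x 3 features, created directly as float32
batch = np.zeros((2, 3), dtype=np.float32)
print(batch.dtype)  # float32

# astype returns a new array with the requested dtype
ints = np.array([[1, 2, 3], [4, 5, 6]])
as_float = ints.astype(np.float32)
print(as_float.dtype)  # float32

# size is the product of the dimensions in shape
print(as_float.size == math.prod(as_float.shape))  # True

# nbytes = size * itemsize: 6 elements x 4 bytes each
print(as_float.nbytes)  # 24
```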
Indexing
arr = np.array([10, 20, 30, 40, 50])
print(arr[0]) # 10
print(arr[-1]) # 50
print(arr[1:4]) # [20, 30, 40]
# 2D indexing
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix[0, 0]) # 1 (first row, first column)
print(matrix[1, :]) # [4, 5, 6] (entire second row)
print(matrix[:, 2]) # [3, 6, 9] (entire third column)
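Basic slices such as `matrix[1, :]` return views that share memory with the original array rather than copies. That saves memory, but writing through a slice also changes the source array. The sketch below demonstrates this and uses `.copy()` when an independent array is needed:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

row = matrix[1, :]          # a view of the second row
row[0] = 100                # writes into matrix's memory
print(matrix[1, 0])         # 100

col = matrix[:, 2].copy()   # an independent copy of the third column
col[0] = -1                 # matrix is unchanged
print(matrix[0, 2])         # 3

print(np.shares_memory(row, matrix))  # True
print(np.shares_memory(col, matrix))  # False
```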
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
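One way to capture the package version and runtime details the checkpoint asks for is to print them from Python itself. A minimal sketch using the standard library and NumPy's `__version__` attribute. The dictionary keys are arbitrary names chosen for this illustration:

```python
import platform
import sys

import numpy as np

# Details worth recording next to an exercise's observed output
env = {
    "python": sys.version.split()[0],
    "numpy": np.__version__,
    "platform": platform.platform(),
    "machine": platform.machine(),
}
for key, value in env.items():
    print(f"{key}: {value}")
```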
Create a 2D NumPy array representing embeddings for 3 words, each with 4 dimensions. Print the shape, access the second row, and extract the last dimension from all rows.