20. Inference with Fine-Tuned Models
Chapter 20 of 24 · 20 min
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
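One way to capture those details is a small helper that gathers them into a single record. This is only a sketch: the function name `describe_run`, the field names, the package name `torch`, and the path `data/sample.jsonl` are illustrative assumptions, not part of the chapter's code.

```python
import json
import platform
import sys
from importlib import metadata


def describe_run(package_names, data_path, observed_output, notes=""):
    """Collect the details the checkpoint asks for into one dict.

    All names here are hypothetical; adapt them to your own workspace.
    """
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap instead of failing, so the note stays honest.
            versions[name] = "not installed"
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
        "data_path": str(data_path),
        "observed_output": observed_output,
        "notes": notes,  # e.g. model size, CPU/GPU backend, memory limits
    }


if __name__ == "__main__":
    record = describe_run(
        ["torch"],              # assumed package; list whatever you actually use
        "data/sample.jsonl",    # assumed path
        "<paste observed output here>",
        notes="CPU only",
    )
    print(json.dumps(record, indent=2))
```

Saving the printed JSON next to the exercise keeps the constraints (backend, memory, model size) attached to the result they explain.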
EXERCISE: Implement Streaming Generation
def stream_generate(engine, prompt, max_tokens=256):
    """Yield tokens as they are generated."""
    for token in engine.model.generate(prompt, max_tokens=max_tokens):
        yield token

# Usage
for token in stream_generate(engine, "Write a story"):
    print(token, end="", flush=True)
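To try the exercise without loading a real model, the generator can be run against a stand-in engine. The sketch below assumes that `engine.model.generate` returns an iterator of string tokens, which is what the exercise code implies; `FakeModel` and `FakeEngine` are hypothetical test doubles, not classes from this chapter or from any library.

```python
import time
from typing import Iterator, List


class FakeModel:
    """Hypothetical stand-in for engine.model; yields canned tokens."""

    def __init__(self, tokens: List[str], delay: float = 0.0):
        self.tokens = tokens
        self.delay = delay  # simulated per-token latency in seconds

    def generate(self, prompt: str, max_tokens: int = 256) -> Iterator[str]:
        # Emit at most max_tokens tokens, one at a time.
        for token in self.tokens[:max_tokens]:
            time.sleep(self.delay)
            yield token


class FakeEngine:
    """Hypothetical wrapper exposing a .model attribute like the exercise expects."""

    def __init__(self, model: FakeModel):
        self.model = model


def stream_generate(engine, prompt, max_tokens=256):
    """Yield tokens as they are generated."""
    for token in engine.model.generate(prompt, max_tokens=max_tokens):
        yield token


engine = FakeEngine(FakeModel(["Once", " upon", " a", " time"]))
pieces = []
for token in stream_generate(engine, "Write a story", max_tokens=3):
    print(token, end="", flush=True)  # flush so each token appears immediately
    pieces.append(token)
print()
```

Because `stream_generate` is a generator, nothing is produced until the caller iterates, so the first token can be displayed before the rest exist. Raising `delay` in `FakeModel` makes that behavior visible in a terminal.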