Feed-Forward Network
Definition pending
We've cataloged "Feed-Forward Network" but haven't written a full definition yet. Definitions are hand-curated rather than auto-generated, so it takes time to cover every term.
Practical example
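FFN (Feed-Forward Network) layers are often described as the "memory" of a transformer. After attention mixes information across tokens, the FFN processes each token position independently: it projects the token's vector up to a wider hidden dimension, applies a nonlinearity, and projects it back down. The classic design uses two linear layers with ReLU or GELU between them; many modern models use a gated variant such as SwiGLU, which adds a third projection that acts as a gate. Interpretability research suggests that much of a model's factual knowledge is encoded in FFN weights rather than in attention, although the split is not absolute.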
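To make the per-token computation concrete, here is a minimal sketch in plain Python. It assumes bias-free layers and tiny list-based matrices, and the names `ffn`, `swiglu_ffn`, and `apply_per_token` are illustrative rather than taken from any library. It shows both the classic two-layer form and a SwiGLU-style gated form.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gelu(v):
    # tanh approximation of GELU
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))

def silu(v):
    # SiLU (a.k.a. Swish): v * sigmoid(v)
    return v / (1.0 + math.exp(-v))

def ffn(x, W_in, W_out):
    """Classic FFN: up-project, apply nonlinearity, down-project (biases omitted)."""
    hidden = [gelu(h) for h in matvec(W_in, x)]
    return matvec(W_out, hidden)

def swiglu_ffn(x, W_gate, W_up, W_down):
    """Gated FFN in the SwiGLU style: three projections instead of two."""
    gate = [silu(g) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # elementwise gating
    return matvec(W_down, hidden)

def apply_per_token(layer, tokens, *weights):
    """Apply the same layer to every token vector independently."""
    return [layer(x, *weights) for x in tokens]

# Toy weights (assumed values): model dim 2, hidden dim 3.
W_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_out = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
outputs = apply_per_token(ffn, [[1.0, 2.0], [3.0, 4.0]], W_in, W_out)
```

Because `apply_per_token` never lets one token's vector touch another's, changing the second token leaves the first token's output unchanged. That is the sense in which the FFN is "position-wise": all cross-token mixing happens in attention.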
Workflow example
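When a model confidently asserts a wrong fact, the FFN layers are a likely suspect: the stored factual association itself may be wrong. To debug: (1) identify the token at which the error was generated; (2) inspect the attention weights at that position, and if the model attends to the relevant context but still produces the wrong output, the FFN is the more probable culprit; (3) choose a fix, either fine-tuning the FFN layers specifically or using RAG to override the stored association with retrieved facts. Attention weights are only a rough diagnostic, so treat step (2) as a heuristic rather than proof. If the knowledge lives in the FFN, tweaking attention is unlikely to help.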
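Steps (2) and (3) can be sketched as two small helpers. Everything here is an assumption for illustration: the 0.5 attention-mass threshold, the function names, and the substrings used to recognise FFN parameters (`mlp`, `ffn`, `feed_forward`), which vary between model implementations.

```python
def attention_mass_on_context(attn_row, context_positions):
    """Fraction of one query token's attention that lands on the relevant context."""
    total = sum(attn_row)
    if total == 0:
        return 0.0
    return sum(attn_row[i] for i in context_positions) / total

def triage(attn_row, context_positions, output_correct, threshold=0.5):
    """Rough heuristic for step (2); the threshold is an arbitrary assumption."""
    if output_correct:
        return "ok"
    if attention_mass_on_context(attn_row, context_positions) >= threshold:
        # The model looked at the right tokens but still answered wrongly.
        return "suspect_ffn"
    return "suspect_attention"

def select_ffn_params(param_names, markers=("mlp", "ffn", "feed_forward")):
    """Pick the parameter names to unfreeze for FFN-only fine-tuning (step 3).

    The markers are hypothetical; check the naming scheme of the actual model.
    """
    return [name for name in param_names if any(m in name for m in markers)]

# Hypothetical parameter names, for illustration only.
names = [
    "layers.0.attn.q_proj.weight",
    "layers.0.mlp.up_proj.weight",
    "layers.0.mlp.down_proj.weight",
    "embed.weight",
]
to_train = select_ffn_params(names)
verdict = triage([0.1, 0.7, 0.2], context_positions=[1], output_correct=False)
```

In a real training setup, the names returned by `select_ffn_params` would be the only parameters left trainable, with everything else frozen.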