Feature Engineering

Feature engineering is the process of transforming raw data into input variables (features) that improve model performance. In local AI, this usually means preparing text, images, or structured data before feeding it to a model. For LLMs, it can involve crafting prompts, choosing a tokenization strategy, or selecting an embedding model. Operators encounter it when deciding how to chunk documents for RAG, how to normalize numerical columns for tabular models, or how to design prompt templates that guide model output. The goal is to make the relevant patterns more accessible to the model so that it spends less capacity on irrelevant noise.
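As a small illustration of normalizing a numerical column, the sketch below rescales values to z-scores (zero mean, unit variance) using only the Python standard library. The function name standardize and the sample column are hypothetical, and z-scoring is only one of several common normalization schemes.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a numeric column to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant column carries no signal; return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical column: prompt lengths of logged requests, in tokens.
prompt_lengths = [128, 256, 512, 1024, 2048]
scaled = standardize(prompt_lengths)
print([round(z, 2) for z in scaled])
```

Scaling of this kind mostly helps models that are sensitive to feature magnitude, such as linear models and neural networks; tree-based tabular models are usually less affected.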

Feature engineering is a critical step in the machine learning pipeline, especially when working with smaller local models that lack the capacity to learn everything from raw data alone. For tabular data, it includes creating interaction terms, binning continuous variables, and encoding categorical variables. For text data, it involves tokenization (e.g., byte-pair encoding) and, for classical text models, stop-word removal and n-gram generation. In the context of local LLMs, feature engineering often takes the form of prompt engineering: structuring the input to elicit the desired response. Operators running models like Llama 3.1 8B on a 16GB GPU might find that carefully engineered prompts (e.g., with few-shot examples or explicit instructions) yield better results than raw queries. Similarly, in RAG workflows, the chunking strategy (size, overlap) and the choice of embedding model are forms of feature engineering that directly affect retrieval quality. While deep learning can automate some feature extraction, local models benefit from human-guided feature engineering to compensate for limited parameters and VRAM.
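The few-shot prompting idea above can be sketched as a small template builder. This is a minimal example under stated assumptions: the function name build_few_shot_prompt, the Input/Output layout, and the sentiment task are all hypothetical, and the best layout depends on the model and its chat template.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and the new query into one prompt."""
    parts = [instruction.strip(), ""]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}")
        parts.append(f"Output: {example_output}")
        parts.append("")  # blank line between examples
    # End with an open "Output:" so the model continues with the answer.
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)


# Hypothetical sentiment-labeling task; the examples are illustrative only.
prompt = build_few_shot_prompt(
    instruction="Classify the sentiment of each input as positive or negative.",
    examples=[
        ("The model loads in seconds.", "positive"),
        ("Inference keeps running out of memory.", "negative"),
    ],
    query="Quantization cut my VRAM use in half.",
)
print(prompt)
```

Keeping the examples in a list makes it easy to swap them in and out and compare which set gives the most reliable output from a given model.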

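An operator building a RAG system with Llama 3.1 8B on an RTX 4090 (24GB VRAM) might engineer features by chunking a 100-page PDF into 512-token segments with a 128-token overlap. They then embed each chunk using a small embedding model (e.g., all-MiniLM-L6-v2) and store the vectors in ChromaDB. The embedding model's input limit matters here: all-MiniLM-L6-v2 truncates input beyond 256 word pieces by default, so 512-token chunks call for either an embedding model with a longer input limit or smaller chunks. The choice of chunk size and overlap directly affects retrieval accuracy and resource usage: smaller chunks increase retrieval granularity but produce more embeddings to compute and store, while larger chunks take up more of the context window (and therefore more VRAM) once they are inserted into the prompt.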
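The 512-token chunks with 128-token overlap described above amount to a sliding window with a stride of 384 tokens. The sketch below shows that arithmetic on a stand-in token list. The function name chunk_tokens and the 50,000-token document length are assumptions for illustration; a real pipeline would get token IDs from the model's tokenizer.

```python
def chunk_tokens(tokens, chunk_size=512, overlap=128):
    """Split a token sequence into fixed-size windows that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Stand-in for a tokenized PDF: integers in place of real token IDs.
# The 50,000-token length is an assumed figure, not a measurement.
document_tokens = list(range(50_000))
chunks = chunk_tokens(document_tokens)
print(len(chunks), "chunks; last chunk has", len(chunks[-1]), "tokens")
```

Because each window advances by chunk_size minus overlap, raising the overlap increases the number of chunks (and embeddings) for the same document.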

In a typical RAG workflow using Ollama and LangChain, feature engineering occurs when configuring the text splitter. For example, an operator might run ollama pull llama3.1:8b to fetch the model and then split documents in Python with RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50). Note that this splitter measures chunk_size in characters by default, not tokens. The operator must choose these parameters based on document structure and the model's context window. If chunks are too large, relevant details can get buried in surrounding text and the model may miss them; if they are too small, context coherence suffers. This decision is a direct feature engineering step that affects both retrieval quality and inference speed.
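One way to relate chunk size to the context window is a rough budget calculation. The sketch below estimates how many retrieved chunks fit in the prompt. It is a back-of-the-envelope helper, not part of Ollama or LangChain: the function name max_chunks_in_context is hypothetical, the figure of 4 characters per token is only a rule of thumb for English text, and the 8,192-token window and 1,500 reserved tokens are assumed values.

```python
def max_chunks_in_context(context_window, chunk_chars, reserved_tokens, chars_per_token=4):
    """Estimate how many retrieved chunks fit in the prompt alongside reserved tokens."""
    # Ceiling division, so the per-chunk token estimate errs on the high side.
    tokens_per_chunk = -(-chunk_chars // chars_per_token)
    available = context_window - reserved_tokens
    return max(available // tokens_per_chunk, 0)


# Assumed figures for illustration: an 8,192-token context configured in the
# runtime, with 1,500 tokens reserved for instructions, question, and answer.
k = max_chunks_in_context(context_window=8192, chunk_chars=500, reserved_tokens=1500)
print("chunks that fit:", k)
```

For a precise budget, count tokens with the model's own tokenizer instead of relying on the characters-per-token estimate.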

Reviewed by Fredoline Eruo. See our editorial policy.
