02. Named Entity Recognition

Chapter 2 of 18 · 15 min

Named Entity Recognition (NER) identifies and classifies specific elements within text, such as organization names, dates, monetary values, and geographic locations. Local LLMs approach NER as a text generation task: the prompt specifies the entity types and the output format, and the model generates the structured result.

Prompting strategies for NER typically take the form of instruction-completion pairs. A well-structured prompt specifies entity types, output format preferences, and handling rules for ambiguous cases. Entity types might include PERSON, ORG, DATE, GPE, and FACILITY, though specific schemas vary by application domain.

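import ollama

# Keeping the template at module level avoids sending the function's
# indentation to the model as part of the prompt.
PROMPT_TEMPLATE = """Extract all named entities from the text below.
Return entities as a structured list with type and value.

Entity types: PERSON, ORG, GPE, DATE, MONEY, FACILITY

Text: {text}

Entities:"""

def extract_entities(text, model="llama3"):
    """Ask a local model to list the named entities found in `text`."""
    prompt = PROMPT_TEMPLATE.format(text=text)

    response = ollama.generate(
        model=model,
        prompt=prompt,
        # A low temperature keeps the output more consistent between runs.
        options={'temperature': 0.1}
    )
    # The generated text is returned under the 'response' key.
    return response['response']

text = "Apple Inc. announced on January 15, 2024 that its Seattle office will expand to 500 employees."
entities = extract_entities(text)
print(entities)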
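
The function above returns free-form text, which is awkward to consume downstream. The following is a minimal sketch of one way to make the result machine-readable, assuming the prompt is changed to request a JSON array of objects with "type" and "value" keys. That output format and the helper name `parse_entities` are assumptions for illustration and are not part of the original example.

```python
import json

ALLOWED_TYPES = {"PERSON", "ORG", "GPE", "DATE", "MONEY", "FACILITY"}

def parse_entities(raw, allowed=ALLOWED_TYPES):
    """Parse a model reply expected to contain a JSON array of
    {"type": ..., "value": ...} objects (an assumed format)."""
    # Models often wrap JSON in extra prose, so cut out the outermost [...] span.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        items = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return []
    entities = []
    for item in items:
        if not isinstance(item, dict):
            continue
        etype = str(item.get("type") or "").upper()
        value = str(item.get("value") or "").strip()
        # Drop labels outside the schema and entries with no value.
        if etype in allowed and value:
            entities.append({"type": etype, "value": value})
    return entities

reply = 'Here are the entities:\n[{"type": "ORG", "value": "Apple Inc."}, {"type": "COLOR", "value": "red"}]'
print(parse_entities(reply))  # [{'type': 'ORG', 'value': 'Apple Inc.'}]
```

Returning an empty list on malformed output is one design choice; a stricter pipeline might raise an error or retry the request instead.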

Handling complex cases requires explicit instructions. Nested entities, where one entity contains another, appear frequently in news articles: "The White House announced policy changes." Here, "The White House" is an ORG because it refers to the administration, while the inner span "House", or the full phrase in other contexts, might be classified as a building (FACILITY). The prompt template must state how to resolve such cases, for example by returning only the outermost entity.
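
Prompt rules can also be backed by a post-processing step. The sketch below shows one possible resolution strategy, keeping only the outermost span when one entity lies inside another. It assumes each entity is a dict carrying `start` and `end` character offsets, which the extraction example in this chapter does not produce; the function name `keep_outermost` is hypothetical.

```python
def keep_outermost(entities):
    """Drop any entity whose span lies fully inside an already kept span.

    Entities with identical spans are treated as duplicates, and only the
    first one in input order is kept.
    """
    # Sort by start ascending, then by length descending so containers come first.
    ordered = sorted(entities, key=lambda e: (e["start"], -(e["end"] - e["start"])))
    kept = []
    for ent in ordered:
        contained = any(
            k["start"] <= ent["start"] and ent["end"] <= k["end"]
            for k in kept
        )
        if not contained:
            kept.append(ent)
    return kept

sentence = "The White House announced policy changes."
spans = [
    {"text": "House", "type": "FACILITY", "start": 10, "end": 15},
    {"text": "The White House", "type": "ORG", "start": 0, "end": 15},
]
print([e["text"] for e in keep_outermost(spans)])  # ['The White House']
```

Some applications need the inner entities as well; in that case the same containment check can be used to attach them to their parent rather than discard them.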

Overlapping entity types present additional challenges. The phrase "French technology companies" contains both a nationality (GPE-derived) and an organization category. The prompt must state which type takes priority when more than one could apply to the same span.
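
A priority rule can be written down explicitly and applied after extraction. This is a minimal sketch, assuming the candidate labels for a single span have already been collected; the ordering in `TYPE_PRIORITY` and the name `resolve_type` are illustrative assumptions, and a real schema would choose its own order.

```python
# Hypothetical ordering: earlier labels win when several apply to one span.
TYPE_PRIORITY = ["ORG", "PERSON", "GPE", "FACILITY", "DATE", "MONEY"]

def resolve_type(candidates, priority=TYPE_PRIORITY):
    """Return the highest-priority label among the candidates for one span,
    or None if no candidate appears in the priority list."""
    rank = {label: i for i, label in enumerate(priority)}
    known = [c for c in candidates if c in rank]
    if not known:
        return None
    return min(known, key=rank.__getitem__)

# "French technology companies" could be read as GPE-derived or as ORG.
print(resolve_type(["GPE", "ORG"]))  # ORG
```

Stating the same order in the prompt and in code keeps the model's instructions and the post-processing consistent.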

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
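
The record described above can be collected with the standard library. This is a small sketch under the assumption that the `ollama` package is the only dependency worth tracking; the function name `environment_record` is hypothetical. Model size, CPU/GPU backend, and available memory are not detected here and still need to be noted by hand.

```python
import json
import platform
from importlib import metadata

def environment_record(packages=("ollama",)):
    """Collect runtime details worth saving next to an exercise result."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }

print(json.dumps(environment_record(), indent=2))
```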

EXERCISE

Design a custom entity schema for legal contract analysis. Implement extraction with handling for nested and overlapping entities. Evaluate precision and recall on an annotated test set.
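
As a starting point for the evaluation step, the sketch below computes exact-match precision and recall over (type, value) pairs. It assumes both predictions and gold annotations are available as such pairs; the labels PARTY and EFFECTIVE_DATE and the sample values are made up for illustration and are not a recommended schema.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over (type, value) pairs."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)
    # Guard against empty sets to avoid division by zero.
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    return precision, recall

# Made-up annotations for a single contract.
gold = [
    ("PARTY", "Acme Corp"),
    ("PARTY", "Globex LLC"),
    ("EFFECTIVE_DATE", "January 15, 2024"),
    ("MONEY", "$5,000"),
]
predicted = [
    ("PARTY", "Acme Corp"),
    ("PARTY", "Seattle"),
]
print(precision_recall(predicted, gold))  # (0.5, 0.25)
```

Exact matching is strict: a prediction with the right span but the wrong type counts as an error, so partial-match scoring may be worth adding for nested entities.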