Advanced Summarization — Advanced NLP with Local Models (Chapter 8)

Abstractive summarization generates new sentences that convey a document's essential content, whereas extractive summarization selects sentences from the source and copies them verbatim. Local LLMs can produce abstractive summaries by modeling what the document means and restating that information in different wording.

Prompt engineering for summarization involves balancing specificity against generality. Overly specific prompts produce summaries that miss the broader context; overly general prompts produce summaries that drop critical details. Iterating on the instructions and scoring each variant against evaluation data helps identify effective phrasing for a given domain.
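One way to make that iteration concrete is to score each candidate prompt on a small set of documents that have reference summaries. The sketch below assumes such an evaluation set exists and uses a simplified unigram-overlap F1 in the spirit of ROUGE-1 (no stemming, not the official scorer). `rouge1_f1` and `pick_best_prompt` are illustrative names, not part of any library, and `generate` is any prompt-to-text callable.

```python
import re
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 (simplified ROUGE-1: lowercase words, no stemming)."""
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))
    overlap = sum((cand & ref).values())  # clipped counts of shared words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def pick_best_prompt(templates, eval_set, generate):
    """Score prompt templates on (document, reference_summary) pairs.

    Each template is a format string with a {document} slot; generate is a
    callable prompt -> text. Returns (best_template, {template: mean F1}).
    """
    scores = {}
    for template in templates:
        f1s = [rouge1_f1(generate(template.format(document=doc)), ref)
               for doc, ref in eval_set]
        scores[template] = sum(f1s) / len(f1s)
    best = max(scores, key=scores.get)
    return best, scores
```

Against a local model, `generate` could be `lambda p: ollama.generate(model="llama3", prompt=p)["response"]`. Overlap scores reward similar wording rather than faithfulness, so they complement manual review instead of replacing it.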

```python
import textwrap

import ollama


def abstractive_summarize(document, model="llama3", max_tokens=200):
    # dedent the template before inserting the document so the code's
    # indentation does not leak into the prompt
    prompt = textwrap.dedent("""\
        Read the following document and produce a concise abstractive summary.
        The summary should:
        - Capture the main ideas and key findings
        - Use your own wording where it remains accurate
        - Exclude peripheral details
        - Stay factually accurate and add nothing the document does not state

        Document:
        {document}

        Summary:""").format(document=document)

    response = ollama.generate(
        model=model,
        prompt=prompt,
        options={
            'temperature': 0.3,        # low temperature for more stable output
            'num_predict': max_tokens  # hard cap on generated tokens (truncates)
        }
    )
    return response['response']


def extractive_summarize(document, model="llama3", sentence_count=5):
    prompt = textwrap.dedent("""\
        Extract the {n} most important sentences from the document.
        Preserve the original wording exactly. Output one sentence per line.

        Document:
        {document}

        Important sentences:""").format(n=sentence_count, document=document)

    # temperature 0 favors verbatim copying, but the model may still paraphrase
    response = ollama.generate(model=model, prompt=prompt,
                               options={'temperature': 0})
    return response['response']


# Hybrid approach: extract key sentences, then write an abstract anchored on them
def hybrid_summary(document, model="llama3"):
    extract = extractive_summarize(document, model=model, sentence_count=3)
    abstract = abstractive_summarize(
        f"Key points:\n{extract}\n\nFull document:\n{document}",
        model=model
    )
    return {
        'key_sentences': extract,
        'abstract': abstract
    }
```
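An LLM asked to preserve the original wording can still paraphrase or merge sentences, so it is worth checking extractive output against the source before treating it as verbatim. A minimal sketch, assuming the model returns one sentence per line (optionally numbered or bulleted); `verify_extractive` is a hypothetical helper, and the match is a plain substring test after collapsing whitespace.

```python
import re


def verify_extractive(document: str, extracted: str) -> dict:
    """Split model-returned lines into verbatim copies and altered lines.

    Leading list markers such as '1.', '2)', '-' or '*' (followed by a space)
    are stripped before comparing against the source.
    """
    normalized_doc = " ".join(document.split())  # collapse whitespace/newlines
    verbatim, altered = [], []
    for line in extracted.splitlines():
        sentence = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s+", "", line).strip()
        if not sentence:
            continue
        normalized = " ".join(sentence.split())
        (verbatim if normalized in normalized_doc else altered).append(sentence)
    return {"verbatim": verbatim, "altered": altered}
```

Lines that land in `altered` can be dropped, or replaced by the closest source sentence, before the extract is used.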

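Controllability in summarization addresses user requirements beyond generic abstraction, such as length limits, topic emphasis, and tone. Stating these requirements as explicit instructions, for example in a system prompt, lets them change per request without retraining the model. A token cap such as Ollama's num_predict option is not a substitute: it stops generation when the limit is reached, which can cut a summary off mid-sentence, so length targets belong in the instructions themselves.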
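A small sketch of turning user-facing controls into system-prompt instructions. It relies on the `system` parameter of `ollama.generate` in the ollama Python package; `build_control_prompt` and `controlled_summarize` are illustrative names, and the wording of each rule is an assumption to tune per model.

```python
def build_control_prompt(max_words=None, focus=None, tone=None):
    """Translate optional controls into explicit system-prompt rules."""
    rules = ["You summarize documents faithfully and never add facts."]
    if max_words is not None:
        rules.append(f"Keep the summary under {max_words} words.")
    if focus:
        rules.append(f"Emphasize information about: {focus}.")
    if tone:
        rules.append(f"Write in a {tone} tone.")
    return "\n".join(rules)


def controlled_summarize(document, model="llama3", **controls):
    import ollama  # imported here so build_control_prompt works without it

    response = ollama.generate(
        model=model,
        prompt=f"Summarize the following document.\n\n{document}\n\nSummary:",
        system=build_control_prompt(**controls),  # per-request controls
        options={"temperature": 0.3},
    )
    return response["response"]
```

For example, `controlled_summarize(text, max_words=80, focus="pricing changes", tone="neutral")` changes the summary's shape without touching the model; models follow word limits only approximately.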

Hallucination monitoring remains critical for abstractive summarization: models sometimes generate plausible-sounding statements that the source document does not support. Faithfulness evaluation, for example checking whether each summary sentence is entailed by the source text, can flag such fabricated content before a summary is distributed.
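Entailment checking can be sketched by splitting the summary into sentences and asking a judge whether the source supports each one. The judge is pluggable. `ollama_entails` is an assumption that uses the local LLM itself as the judge in place of a dedicated NLI model, which is easier to set up but less reliable. The sentence splitter is deliberately naive, and all function names are illustrative.

```python
import re
from typing import Callable


def split_sentences(text: str) -> list[str]:
    # naive splitter: break after ., ! or ? followed by whitespace
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def flag_unsupported(source: str, summary: str,
                     entails: Callable[[str, str], bool]) -> list[str]:
    """Return the summary sentences that entails(source, sentence) rejects."""
    return [s for s in split_sentences(summary) if not entails(source, s)]


def ollama_entails(source: str, claim: str, model: str = "llama3") -> bool:
    """LLM-as-judge stand-in for an NLI model (an assumption, not true NLI)."""
    import ollama  # deferred so flag_unsupported works with any judge

    prompt = (f"Premise:\n{source}\n\nHypothesis:\n{claim}\n\n"
              "Is the hypothesis fully supported by the premise? "
              "Answer with exactly one word: YES or NO.")
    response = ollama.generate(model=model, prompt=prompt,
                               options={"temperature": 0})
    return response["response"].strip().upper().startswith("YES")
```

Calling `flag_unsupported(document, summary, ollama_entails)` costs one model call per summary sentence; a non-empty result is a signal for review, not proof of fabrication.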

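Streaming summarization processes long documents incrementally. Instead of placing the whole document in a single prompt, the application splits it into sections and processes them in order, carrying a running summary from one call to the next; the model itself keeps no state between calls. Because each call sees only one section plus the running summary, prompts stay within the model's context window and peak memory stays bounded even for very long documents.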
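The segment-by-segment loop can be sketched as follows. Assumptions: segments are built from blank-line-separated paragraphs under a character budget (a rough proxy for tokens), and `summarize` is any prompt-to-text callable. `segment` and `streaming_summary` are illustrative names.

```python
from typing import Callable


def segment(document: str, max_chars: int = 4000) -> list[str]:
    """Group blank-line-separated paragraphs into segments of <= max_chars.

    A single paragraph longer than max_chars becomes its own oversized segment.
    """
    segments, current = [], ""
    for para in (p.strip() for p in document.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            segments.append(current)  # budget exceeded: close this segment
            current = para
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


def streaming_summary(document: str, summarize: Callable[[str], str],
                      max_chars: int = 4000) -> str:
    """Fold segments into a running summary, one model call per segment."""
    running = ""
    for seg in segment(document, max_chars):
        prompt = (f"Summary so far:\n{running or '(none)'}\n\n"
                  f"Next section:\n{seg}\n\n"
                  "Update the summary to cover both. Return only the summary.")
        running = summarize(prompt)  # the application, not the model, keeps state
    return running
```

With a local model, `summarize` could be `lambda p: ollama.generate(model="llama3", prompt=p)["response"]`. Details that span section boundaries can be lost, so larger segments trade memory for coherence.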