© 2026 runlocalai.co · Independently operated
HOW-TO · RAG

How to Build Web Search with Citation Support

Advanced · 30 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

A working web search tool (the examples use a Tavily client), an LLM to format the cited answer, and Python 3.10+.

What this does

Citation support records where each fact the agent retrieves from the web came from, then renders those sources as numbered references in the response so users can verify the claims themselves.
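The whole flow fits in a few lines once the network pieces are replaced by stand-ins: number the results, put the numbers in the prompt, then map the numbers the model cites back to URLs. This is a minimal sketch, assuming hypothetical `fake_search` and `fake_llm` functions in place of a real search API and model; the titles and example.com URLs are placeholders.

```python
import re


def fake_search(query: str) -> list[dict]:
    # Hypothetical stand-in for a real web search response.
    return [
        {"title": "Ollama docs", "url": "https://example.com/ollama",
         "content": "Ollama runs models locally."},
        {"title": "RAG overview", "url": "https://example.com/rag",
         "content": "RAG grounds answers in retrieved text."},
    ]


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a model call; it ignores the prompt
    # and returns a canned answer that cites sources 1 and 2.
    return "Ollama runs models locally [1], and RAG grounds answers in sources [2]."


def answer_with_references(query: str) -> str:
    # 1. Number each result so the model can cite it as [n].
    sources = [dict(r, id=i) for i, r in enumerate(fake_search(query), start=1)]
    # 2. Show the numbered sources to the model.
    context = "\n\n".join(
        f"[{s['id']}] {s['title']} ({s['url']}):\n{s['content']}" for s in sources
    )
    answer = fake_llm(f"Sources:\n{context}\n\nQuestion: {query}")
    # 3. Map each cited number back to its source, skipping unknown numbers.
    by_id = {s["id"]: s for s in sources}
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    refs = [f"[{n}] {by_id[n]['title']}: {by_id[n]['url']}" for n in cited if n in by_id]
    return answer + "\n\nReferences\n" + "\n".join(refs)


print(answer_with_references("How does local RAG work?"))
```

The numbering is the only contract between the prompt and the parser; everything else in the steps below refines one of these three stages.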

Steps

  • Return structured results with URLs. Include source metadata in search results.
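import os

from tavily import TavilyClient

# Assumes the tavily-python package and a TAVILY_API_KEY environment variable.
tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

def search_with_sources(query: str, max_results: int = 5) -> list[dict]:
    """Search and return results with source tracking."""
    response = tavily_client.search(query=query, max_results=max_results)
    sources = []
    # Number sources from 1 so IDs match the [1], [2] citation markers.
    for i, r in enumerate(response["results"], start=1):
        sources.append({
            "id": i,
            "title": r["title"],
            "url": r["url"],
            "content": r["content"][:500],  # truncate to keep the prompt short
        })
    return sources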
  • Format the prompt to request citations. Instruct the LLM to use numbered references.
def build_citation_prompt(query: str, sources: list[dict]) -> str:
    # Label each source with the same [n] marker the model is asked to cite.
    context = "\n\n".join(
        f"[{s['id']}] {s['title']} ({s['url']}):\n{s['content']}"
        for s in sources
    )
    return f"""Answer the question using only the sources below. Cite sources as [1], [2], etc.
Cite only the numbered sources listed here; do not invent new numbers.

Sources:
{context}

Question: {query}

Answer with citations:"""
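  • Parse citations from the LLM response. Extract the unique reference numbers in numeric order.
import re

def extract_citations(response: str) -> list[dict]:
    """Return each distinct [n] marker in the response, sorted numerically."""
    # Convert to int before sorting so [10] comes after [2], not before it.
    numbers = sorted({int(n) for n in re.findall(r'\[(\d+)\]', response)})
    return [{"citation_number": n} for n in numbers]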
  • Build the tool with citation metadata. Expose a tool that returns both answer and sources.
from langchain_core.tools import tool

# Assumes `llm` is a LangChain chat model (for example ChatOllama) defined elsewhere.
@tool
def search_with_citations(query: str) -> str:
    """Search the web and return answer with citations."""
    sources = search_with_sources(query)
    prompt = build_citation_prompt(query, sources)

    answer = llm.invoke(prompt).content

    # Index sources by ID; skip citation numbers the LLM invented
    # instead of crashing on them.
    by_id = {s["id"]: s for s in sources}
    footer = "\n\n### References\n"
    for ref in extract_citations(answer):
        src = by_id.get(ref["citation_number"])
        if src is None:
            continue
        footer += f"[{src['id']}] {src['title']}: {src['url']}\n"

    return answer + footer
  • Render citations in a UI-friendly format. Return a structured dict.
def search_structured(query: str) -> dict:
    sources = search_with_sources(query)
    prompt = build_citation_prompt(query, sources)
    answer = llm.invoke(prompt).content

    return {
        "answer": answer,
        "citations": [
            {"number": s["id"], "title": s["title"], "url": s["url"]}
            for s in sources
            if f"[{s['id']}]" in answer
        ]
    }
  • Verify citation integrity. Ensure every citation has a matching source.
def validate_citations(answer: str, sources: list[dict]) -> list[str]:
    """Return an error message for each cited number with no matching source."""
    cited = {int(r) for r in re.findall(r'\[(\d+)\]', answer)}
    available = {s["id"] for s in sources}
    # Sort so the messages come out in a stable, readable order.
    return [f"Citation [{m}] has no matching source" for m in sorted(cited - available)]
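The single-number pattern `\[(\d+)\]` used in these steps does not match grouped markers such as `[1, 3]`, which models sometimes produce. Below is a minimal sketch of a parser that also accepts comma-separated groups and splits the cited numbers into valid and unknown IDs; `extract_citation_numbers` and `check_citations` are hypothetical helper names, not part of the guide.

```python
import re

# Matches a bracket holding one or more comma-separated numbers, e.g. [2] or [1, 3].
GROUP_RE = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def extract_citation_numbers(text: str) -> list[int]:
    """Return the sorted, de-duplicated citation numbers in text."""
    numbers: set[int] = set()
    for group in GROUP_RE.findall(text):
        # int() tolerates the spaces left around each number by split(",").
        numbers.update(int(n) for n in group.split(","))
    return sorted(numbers)


def check_citations(text: str, sources: list[dict]) -> tuple[list[int], list[int]]:
    """Split cited numbers into (valid, unknown) against the source IDs."""
    available = {s["id"] for s in sources}
    cited = extract_citation_numbers(text)
    valid = [n for n in cited if n in available]
    unknown = [n for n in cited if n not in available]
    return valid, unknown


sources = [{"id": 1}, {"id": 2}, {"id": 3}]
print(check_citations("Fact A [1, 3]. Fact B [2][5]. Fact C [10].", sources))
# → ([1, 2, 3], [5, 10])
```

Returning the unknown numbers separately, rather than only error strings, lets the caller decide whether to drop them, re-prompt the model, or surface a warning.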

Verification

python -c "
import re
text = 'According to [1] and [3], AI is evolving.'
refs = re.findall(r'\[(\d+)\]', text)
print(refs)
# Expected: ['1', '3']
"

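The regex check above confirms only the pattern. A slightly stronger offline check exercises parsing and validation together on a made-up answer. This sketch defines its own copies of a citation parser and validator, modeled on the parsing and integrity steps, so it runs without a search key or model.

```python
import re


def extract_citations(response: str) -> list[dict]:
    # Distinct [n] markers, sorted numerically so [10] comes after [2].
    numbers = sorted({int(n) for n in re.findall(r"\[(\d+)\]", response)})
    return [{"citation_number": n} for n in numbers]


def validate_citations(answer: str, sources: list[dict]) -> list[str]:
    # Report every cited number that has no matching source ID.
    cited = {int(r) for r in re.findall(r"\[(\d+)\]", answer)}
    available = {s["id"] for s in sources}
    return [f"Citation [{m}] has no matching source" for m in sorted(cited - available)]


answer = "Local models need VRAM [2]. Quantization helps [10]. See also [1]."
sources = [{"id": 1}, {"id": 2}, {"id": 3}]

print(extract_citations(answer))
# → [{'citation_number': 1}, {'citation_number': 2}, {'citation_number': 10}]
print(validate_citations(answer, sources))
# → ['Citation [10] has no matching source']
```

The sample deliberately includes [10] so the check catches both lexicographic sorting and an out-of-range citation in one run.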
Common failures

  • Citation number mismatch. The LLM cites [5] when only three sources were provided. Tell the model to cite only the numbered sources it was given, and check the output with validate_citations.
  • Hallucinated sources. The LLM includes citations that are not in the provided source list. Validate citations before displaying the answer.
  • Broken URLs. Search results can point to dead links. Add an availability check to the source metadata, or catch 404 responses when fetching.
  • Version mismatch. The installed package or runtime differs from the one this guide targets; check the version first, then rerun the smallest verification command.
  • Local environment drift. Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing any guide steps.
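One way to handle the first two failures before display is to drop markers that point at no source and renumber the remaining ones consecutively in order of first use. This is a minimal sketch; `renumber_citations` is a hypothetical helper, and the sample sources use placeholder example.com URLs.

```python
import re

# Captures any whitespace before a [n] marker so a dropped marker
# does not leave a stray space behind (note: \s also matches newlines).
CITATION_RE = re.compile(r"(\s*)\[(\d+)\]")


def renumber_citations(answer: str, sources: list[dict]) -> tuple[str, list[dict]]:
    """Drop unknown citation markers and renumber valid ones 1..n by first use."""
    by_id = {s["id"]: s for s in sources}
    new_ids: dict[int, int] = {}  # original source ID -> new consecutive number

    def replace(match: re.Match) -> str:
        space, old = match.group(1), int(match.group(2))
        if old not in by_id:
            return ""  # hallucinated citation: remove the marker entirely
        if old not in new_ids:
            new_ids[old] = len(new_ids) + 1
        return f"{space}[{new_ids[old]}]"

    cleaned = CITATION_RE.sub(replace, answer)
    # Copy each cited source with its new number, in citation order.
    references = [dict(by_id[old], id=new) for old, new in new_ids.items()]
    return cleaned, references


sources = [
    {"id": 1, "url": "https://example.com/a"},
    {"id": 2, "url": "https://example.com/b"},
    {"id": 3, "url": "https://example.com/c"},
]
text, refs = renumber_citations("B is true [3]. A too [1]. C [7].", sources)
print(text)  # → B is true [1]. A too [2]. C.
print(refs)
```

Renumbering keeps the reference list contiguous. If the UI shows the raw search results next to the answer, keeping the original IDs (returning the valid marker unchanged) is the better choice.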

Related guides

  • How to Integrate Web Search Tool in Agents
  • How to Build RetrievalQA Chain with Sources