What this does

Citation support tracks the source of each fact the agent retrieves from the web and formats those sources as numbered references in the response, so users can verify the information.

Steps

Return structured results with URLs. Include source metadata in search results.

def search_with_sources(query: str, max_results: int = 5) -> list[dict]:
    """Search and return results with source tracking."""
    # Assumes tavily_client is an already configured Tavily client instance.
    response = tavily_client.search(query=query, max_results=max_results)
    sources = []
    for i, r in enumerate(response["results"]):
        sources.append({
            "id": i + 1,
            "title": r["title"],
            "url": r["url"],
            "content": r["content"][:500]
        })
    return sources
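
To make the shape of the returned list concrete, here is a minimal sketch that applies the same numbering and truncation to a hard-coded response. The number_results helper and the fake_response dict are hypothetical stand-ins so the example runs without an API key; the field names mirror the ones used in the function above.

```python
def number_results(results: list[dict], snippet_chars: int = 500) -> list[dict]:
    """Assign 1-based ids to raw search results (hypothetical helper)."""
    return [
        {
            "id": i,
            "title": r["title"],
            "url": r["url"],
            "content": r["content"][:snippet_chars],  # keep the prompt small
        }
        for i, r in enumerate(results, start=1)
    ]

# Hard-coded stand-in for a search response; not real search output.
fake_response = {
    "results": [
        {"title": "Example A", "url": "https://example.com/a", "content": "Alpha " * 200},
        {"title": "Example B", "url": "https://example.com/b", "content": "Beta"},
    ]
}

sources = number_results(fake_response["results"])
for s in sources:
    print(s["id"], s["url"], len(s["content"]))
```

The ids start at 1 so that they line up with the [1], [2] markers the LLM is asked to produce in the next step.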

Format the prompt to request citations. Instruct the LLM to use numbered references.

def build_citation_prompt(query: str, sources: list[dict]) -> str:
    context = "\n\n".join(
        f"[Source {s['id']}] {s['title']} ({s['url']}):\n{s['content']}"
        for s in sources
    )
    return f"""Answer the question using the sources below. Cite sources as [1], [2], etc.

Sources:
{context}

Question: {query}

Answer with citations:"""
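
The prompt can also name the valid citation markers explicitly, which may reduce cases where the model cites a number that was never provided. The sketch below is a hypothetical variant (build_strict_citation_prompt is not part of the steps above) and uses made-up sources.

```python
def build_strict_citation_prompt(query: str, sources: list[dict]) -> str:
    """Hypothetical prompt variant that lists the allowed citation markers."""
    context = "\n\n".join(
        f"[Source {s['id']}] {s['title']} ({s['url']}):\n{s['content']}"
        for s in sources
    )
    allowed = ", ".join(f"[{s['id']}]" for s in sources)
    return (
        "Answer the question using only the sources below.\n"
        f"Cite sources inline using these markers and no others: {allowed}.\n"
        "If the sources do not answer the question, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}\n\n"
        "Answer with citations:"
    )

# Made-up sources for illustration only.
sample_sources = [
    {"id": 1, "title": "Example A", "url": "https://example.com/a", "content": "Alpha text."},
    {"id": 2, "title": "Example B", "url": "https://example.com/b", "content": "Beta text."},
]
prompt = build_strict_citation_prompt("What is alpha?", sample_sources)
print(prompt)
```

Listing the markers does not guarantee compliance, so the parsed citations should still be validated before display.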

Parse citations from the LLM response. Extract and validate references.

import re

def extract_citations(response: str) -> list[dict]:
    # Convert to int before sorting so that [10] sorts after [2], not before it.
    refs = {int(r) for r in re.findall(r'\[(\d+)\]', response)}
    return [{"citation_number": n} for n in sorted(refs)]
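
Models sometimes group citations as [1, 3] or [5-7] rather than [1][3], and the single-number pattern above does not match those forms. Below is a minimal sketch of a more tolerant parser, assuming those two grouped forms are the only variants that need handling; extract_citation_numbers is a hypothetical name.

```python
import re

# One item is a number or a range such as 5-7.
_ITEM = r"\d+(?:\s*-\s*\d+)?"
# A citation is one or more comma-separated items inside square brackets.
_CITATION_GROUP = re.compile(rf"\[({_ITEM}(?:\s*,\s*{_ITEM})*)\]")

def extract_citation_numbers(text: str) -> list[int]:
    """Return the distinct citation numbers in text, in numeric order."""
    numbers: set[int] = set()
    for group in _CITATION_GROUP.findall(text):
        for part in group.split(","):
            if "-" in part:
                start, end = (int(p) for p in part.split("-", 1))
                numbers.update(range(start, end + 1))  # expand 5-7 to 5, 6, 7
            else:
                numbers.add(int(part))
    return sorted(numbers)

print(extract_citation_numbers("See [10] and [2], also [1, 3] and [5-7]."))
# Prints: [1, 2, 3, 5, 6, 7, 10]
```

Ranges are expanded exactly as written, so the result should still be checked against the ids of the sources that were actually provided.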

Build the tool with citation metadata. Expose a tool that returns both answer and sources.

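# Assumes a tool decorator (for example, from langchain_core.tools) and a chat model named llm are defined elsewhere.
@tool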
def search_with_citations(query: str) -> str:
    """Search the web and return answer with citations."""
    sources = search_with_sources(query)
    prompt = build_citation_prompt(query, sources)

    response = llm.invoke(prompt)
    answer = response.content

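    refs = extract_citations(answer)
    sources_by_id = {s["id"]: s for s in sources}
    footer = "\n\n### References\n"
    for ref in refs:
        src = sources_by_id.get(ref["citation_number"])
        if src is None:
            continue  # the LLM cited a number that matches no provided source
        footer += f"[{src['id']}] {src['title']}: {src['url']}\n"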

    return answer + footer
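
The footer logic is easier to test when it lives in a pure function that needs neither the search client nor the LLM. A sketch under that assumption follows; build_references_footer is a hypothetical helper, and it skips cited numbers that have no matching source.

```python
import re

def build_references_footer(answer: str, sources: list[dict]) -> str:
    """Build a references section listing only the sources the answer cites."""
    by_id = {s["id"]: s for s in sources}
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    lines = [
        f"[{n}] {by_id[n]['title']}: {by_id[n]['url']}"
        for n in cited
        if n in by_id  # skip citations that match no provided source
    ]
    if not lines:
        return ""  # no valid citations, so no references heading
    return "\n\n### References\n" + "\n".join(lines) + "\n"

# Made-up data for illustration only.
sample_sources = [
    {"id": 1, "title": "Example A", "url": "https://example.com/a"},
    {"id": 2, "title": "Example B", "url": "https://example.com/b"},
]
footer = build_references_footer("Alpha is first [1]. Unsupported claim [7].", sample_sources)
print(footer)
```

Source 2 is omitted because the answer never cites it, and [7] is omitted because no such source exists.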

Render citations in a UI-friendly format. Return a structured dict.

def search_structured(query: str) -> dict:
    sources = search_with_sources(query)
    prompt = build_citation_prompt(query, sources)
    answer = llm.invoke(prompt).content

    return {
        "answer": answer,
        "citations": [
            {"number": s["id"], "title": s["title"], "url": s["url"]}
            for s in sources
            if f"[{s['id']}]" in answer
        ]
    }
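
One way a front end might consume this dict is to turn each inline marker into a link to its source. The sketch below renders Markdown; render_markdown and the sample payload are hypothetical, and other UIs would need a different output format.

```python
import re

def render_markdown(result: dict) -> str:
    """Replace inline [n] markers with Markdown links to the cited source."""
    urls = {c["number"]: c["url"] for c in result["citations"]}

    def link(match: re.Match) -> str:
        number = int(match.group(1))
        url = urls.get(number)
        # Leave markers with no known source untouched.
        return f"[[{number}]]({url})" if url else match.group(0)

    return re.sub(r"\[(\d+)\]", link, result["answer"])

# Made-up payload in the shape returned by the structured search function.
sample_result = {
    "answer": "Alpha is first [1]. Beta follows [2]. Unknown [9].",
    "citations": [
        {"number": 1, "title": "Example A", "url": "https://example.com/a"},
        {"number": 2, "title": "Example B", "url": "https://example.com/b"},
    ],
}
rendered = render_markdown(sample_result)
print(rendered)
```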

Verify citation integrity. Ensure every citation has a matching source.

def validate_citations(answer: str, sources: list[dict]) -> list[str]:
    cited = set(int(r) for r in re.findall(r'\[(\d+)\]', answer))
    available = set(s["id"] for s in sources)
    # Sort so the messages come back in a stable, readable order.
    missing = sorted(cited - available)
    return [f"Citation [{m}] has no matching source" for m in missing]
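
Once invalid citations are detected, one option is to remove the offending markers before showing the answer. The sketch below does that; strip_unknown_citations is a hypothetical helper, and the data is made up.

```python
import re

def strip_unknown_citations(answer: str, sources: list[dict]) -> str:
    """Remove [n] markers whose number is not among the provided source ids."""
    available = {s["id"] for s in sources}

    def keep_or_drop(match: re.Match) -> str:
        # group(0) includes any whitespace before the marker, so dropping it
        # does not leave a stray space behind.
        return match.group(0) if int(match.group(1)) in available else ""

    return re.sub(r"\s*\[(\d+)\]", keep_or_drop, answer)

cleaned = strip_unknown_citations(
    "Claim one [1]. Claim two [5].",
    [{"id": 1, "title": "Example A", "url": "https://example.com/a"}],
)
print(cleaned)
# Prints: Claim one [1]. Claim two.
```

Removing a marker leaves the claim itself in place without support, so regenerating the answer or flagging it to the user may be the safer choice in some applications.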

Verification

python -c "
import re
text = 'According to [1] and [3], AI is evolving.'
refs = re.findall(r'\[(\d+)\]', text)
print(refs)
# Expected: ['1', '3']
"

Common failures

Citation number mismatch. The LLM cites [5] but only 3 sources were provided. Instruct the LLM to only cite sources that were given.
Hallucinated sources. The LLM includes citations not in the provided source list. Validate citations before displaying.
Broken URLs. Sources may contain dead links. Add an availability check to the source metadata, or catch 404 responses when fetching.
Version mismatch. The installed package or runtime differs from the command shown. Check the version first, then rerun the smallest verification command.
Local environment drift. Another service, virtual environment, model, or path is being used. Print the active binary path and configuration before changing the guide steps.
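
For the broken URL case above, an availability flag could be attached to each source before the prompt is built. This is a minimal sketch using only the standard library; is_url_reachable and mark_availability are hypothetical helpers, and the HEAD-based check is an assumption that will report False for servers that reject HEAD requests.

```python
import http.client
import urllib.request
from typing import Callable

def is_url_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to url completes with a non-error status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and HTTPError (including 404) are OSError subclasses;
        # ValueError covers malformed URLs.
        return False

def mark_availability(
    sources: list[dict],
    check: Callable[[str], bool] = is_url_reachable,
) -> list[dict]:
    """Return copies of the sources with an added 'available' flag."""
    return [{**s, "available": check(s["url"])} for s in sources]

# Demonstration with a fake checker so the example makes no network calls.
sample_sources = [
    {"id": 1, "url": "https://example.com/a"},
    {"id": 2, "url": "https://example.com/dead"},
]
marked = mark_availability(sample_sources, check=lambda u: not u.endswith("/dead"))
print([(s["id"], s["available"]) for s in marked])
```

Passing the checker as a parameter keeps the function testable offline; in production the default performs one request per source, which adds latency to every search.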

Related guides

How to Integrate Web Search Tool in Agents
How to Build RetrievalQA Chain with Sources