HOW-TO · RAG
How to Build Web Search with Citation Support
Target environment
Ubuntu 24.04 · Ollama 0.4.x
Prerequisites
Web search tool working, LLM for citation formatting, Python 3.10+
What this does
Citation support records which web source each retrieved fact came from and renders those sources as numbered references in the response, so users can verify the information.
Steps
- Return structured results with URLs. Include source metadata in search results.
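```python
import os

from tavily import TavilyClient

# Assumes the tavily-python package and a TAVILY_API_KEY environment variable.
tavily_client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

def search_with_sources(query: str, max_results: int = 5) -> list[dict]:
    """Search and return results with source tracking."""
    response = tavily_client.search(query=query, max_results=max_results)
    sources = []
    # Number sources from 1 so the IDs match the [1], [2], ... citation markers.
    for i, r in enumerate(response["results"], start=1):
        sources.append({
            "id": i,
            "title": r["title"],
            "url": r["url"],
            "content": r["content"][:500],  # truncate to keep the prompt short
        })
    return sources
```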
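If you separate the parsing from the network call, you can test this step offline. The sketch below uses a hypothetical `to_sources` helper. It assumes each raw result is a dict with `title`, `url`, and `content` keys, which are the same keys the step above reads from the search response.

```python
def to_sources(results: list[dict], max_chars: int = 500) -> list[dict]:
    """Convert raw search results into numbered source records (hypothetical helper)."""
    return [
        {
            "id": i,  # 1-based, matching the [1], [2], ... citation markers
            "title": r["title"],
            "url": r["url"],
            "content": r["content"][:max_chars],  # truncate long page text
        }
        for i, r in enumerate(results, start=1)
    ]


# Usage with hand-written results instead of a live search call.
raw = [
    {"title": "Doc A", "url": "https://example.com/a", "content": "x" * 800},
    {"title": "Doc B", "url": "https://example.com/b", "content": "short text"},
]
sources = to_sources(raw)
print(sources[0]["id"], len(sources[0]["content"]))  # 1 500
```

With this split, `search_with_sources` only has to call the search client and pass `response["results"]` to `to_sources`.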
- Format the prompt to request citations. Instruct the LLM to use numbered references.
```python
def build_citation_prompt(query: str, sources: list[dict]) -> str:
    # Label each source with the same [n] marker the model is asked to cite.
    context = "\n\n".join(
        f"[{s['id']}] {s['title']} ({s['url']}):\n{s['content']}"
        for s in sources
    )
    return f"""Answer the question using only the sources below. Cite sources as [1], [2], etc.
Only cite the numbered sources listed; do not invent new numbers.

Sources:
{context}

Question: {query}

Answer with citations:"""
```
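Long pages can push the prompt past the model's context window. One option, which is not part of the original guide, is to drop trailing sources once a character budget is used up. The `fit_sources` name and the 4000-character default are assumptions; tune the budget for your model.

```python
def fit_sources(sources: list[dict], max_chars: int = 4000) -> list[dict]:
    """Keep sources in order until their combined content would exceed max_chars."""
    kept, used = [], 0
    for s in sources:
        if used + len(s["content"]) > max_chars:
            break  # stop here so the kept IDs stay contiguous: 1, 2, ..., k
        kept.append(s)
        used += len(s["content"])
    return kept


sources = [
    {"id": 1, "title": "A", "url": "https://example.com/a", "content": "a" * 300},
    {"id": 2, "title": "B", "url": "https://example.com/b", "content": "b" * 300},
    {"id": 3, "title": "C", "url": "https://example.com/c", "content": "c" * 300},
]
print([s["id"] for s in fit_sources(sources, max_chars=650)])  # [1, 2]
```

The kept sources retain their original IDs. Pass the trimmed list both to the prompt builder and to the reference-formatting code, so the model cannot cite a source it never saw.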
- Parse citations from the LLM response. Extract the cited reference numbers so they can be matched against the sources.
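```python
import re

def extract_citations(response: str) -> list[dict]:
    """Return each distinct [n] marker in the response, in numeric order."""
    refs = re.findall(r"\[(\d+)\]", response)
    # Convert to int before sorting so [10] sorts after [2].
    return [{"citation_number": n} for n in sorted({int(r) for r in refs})]
```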
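Models sometimes group references as `[1, 3]`, and the single-number pattern above does not match that form. The extension below, under the hypothetical name `extract_citation_numbers`, also accepts comma-separated lists inside one pair of brackets.

```python
import re

# Matches [1], [12], and comma-separated groups such as [1, 3].
CITATION_GROUP = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def extract_citation_numbers(text: str) -> list[int]:
    """Return the distinct cited numbers in ascending numeric order."""
    numbers = set()
    for group in CITATION_GROUP.findall(text):
        # int() tolerates the spaces left around each comma-separated part.
        numbers.update(int(part) for part in group.split(","))
    return sorted(numbers)


print(extract_citation_numbers("See [1, 3] and [10]. Also [2]."))  # [1, 2, 3, 10]
```

This version still ignores ranges such as `[2-4]`. Add support for them only if your model actually emits them.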
- Build the tool with citation metadata. Expose a tool that returns both answer and sources.
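```python
from langchain_core.tools import tool
from langchain_ollama import ChatOllama

# Assumption: any LangChain chat model works here; the model name is an example.
llm = ChatOllama(model="llama3.1")

@tool
def search_with_citations(query: str) -> str:
    """Search the web and return answer with citations."""
    sources = search_with_sources(query)
    prompt = build_citation_prompt(query, sources)
    answer = llm.invoke(prompt).content
    refs = extract_citations(answer)
    footer = "\n\n### References\n"
    for ref in refs:
        # Skip numbers with no matching source instead of raising StopIteration.
        src = next((s for s in sources if s["id"] == ref["citation_number"]), None)
        if src is not None:
            footer += f"[{src['id']}] {src['title']}: {src['url']}\n"
    return answer + footer
```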
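You can test the tool's logic without a running model or search API by passing the sources and the model in as arguments. This is a sketch. `answer_with_references` and `StubLLM` are hypothetical, and the stub only mimics the `.invoke(...).content` shape the step above relies on.

```python
import re
from dataclasses import dataclass


@dataclass
class StubReply:
    content: str


class StubLLM:
    """Stand-in for a chat model: returns a fixed reply for any prompt."""

    def __init__(self, reply: str):
        self.reply = reply

    def invoke(self, prompt: str) -> StubReply:
        return StubReply(self.reply)


def answer_with_references(query: str, sources: list[dict], llm) -> str:
    context = "\n\n".join(f"[{s['id']}] {s['title']}:\n{s['content']}" for s in sources)
    answer = llm.invoke(f"Sources:\n{context}\n\nQuestion: {query}").content
    by_id = {s["id"]: s for s in sources}
    cited = sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
    # Only list numbers that map to a real source; unknown ones are dropped.
    lines = [f"[{n}] {by_id[n]['title']}: {by_id[n]['url']}" for n in cited if n in by_id]
    return answer + "\n\n### References\n" + "\n".join(lines)


sources = [
    {"id": 1, "title": "Doc A", "url": "https://example.com/a", "content": "..."},
    {"id": 2, "title": "Doc B", "url": "https://example.com/b", "content": "..."},
]
print(answer_with_references("q", sources, StubLLM("Fact one [1], fact two [4].")))
```

Injecting `llm` as a parameter is a design choice that makes the logic testable. The `@tool` version can call the same helper with the real model.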
- Render citations in a UI-friendly format. Return a structured dict.
```python
def search_structured(query: str) -> dict:
    sources = search_with_sources(query)
    prompt = build_citation_prompt(query, sources)
    answer = llm.invoke(prompt).content
    return {
        "answer": answer,
        "citations": [
            {"number": s["id"], "title": s["title"], "url": s["url"]}
            for s in sources
            if f"[{s['id']}]" in answer
        ],
    }
```
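The membership check above returns citations in source order. If the UI should list them in the order the answer first mentions them, you can walk the regex matches instead. The function name `citations_in_order` is hypothetical.

```python
import re


def citations_in_order(answer: str, sources: list[dict]) -> list[dict]:
    """List cited sources in order of first appearance, skipping unknown numbers."""
    by_id = {s["id"]: s for s in sources}
    seen: list[int] = []
    for match in re.finditer(r"\[(\d+)\]", answer):
        n = int(match.group(1))
        if n in by_id and n not in seen:
            seen.append(n)
    return [
        {"number": n, "title": by_id[n]["title"], "url": by_id[n]["url"]}
        for n in seen
    ]


sources = [
    {"id": 1, "title": "Doc A", "url": "https://example.com/a"},
    {"id": 2, "title": "Doc B", "url": "https://example.com/b"},
]
print([c["number"] for c in citations_in_order("B says [2]; A says [1]; again [2].", sources)])  # [2, 1]
```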
- Verify citation integrity. Ensure every citation has a matching source.
```python
import re

def validate_citations(answer: str, sources: list[dict]) -> list[str]:
    """Return one error message per cited number that has no matching source."""
    cited = {int(r) for r in re.findall(r"\[(\d+)\]", answer)}
    available = {s["id"] for s in sources}
    missing = cited - available
    # Sort so the messages come out in a stable, readable order.
    return [f"Citation [{m}] has no matching source" for m in sorted(missing)]
```
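Validation reports the problem. Before display, you may also want to remove the unsupported markers. The sketch below, using the hypothetical `strip_invalid_citations`, deletes any `[n]` that has no matching source, along with the whitespace before it.

```python
import re


def strip_invalid_citations(answer: str, sources: list[dict]) -> str:
    """Remove [n] markers whose number has no matching source ID."""
    valid = {s["id"] for s in sources}

    def keep_or_drop(match: re.Match) -> str:
        # Keep the marker (and its leading whitespace) only if the source exists.
        return match.group(0) if int(match.group(1)) in valid else ""

    return re.sub(r"\s*\[(\d+)\]", keep_or_drop, answer)


print(strip_invalid_citations("AI is evolving [1] quickly [7].", [{"id": 1}]))
# AI is evolving [1] quickly.
```

Stripping markers is cheaper than regenerating the answer. However, it hides the fact that the model overreached, so consider logging the validation messages as well.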
Verification
```bash
python -c "
import re
text = 'According to [1] and [3], AI is evolving.'
refs = re.findall(r'\[(\d+)\]', text)
print(refs)
# Expected: ['1', '3']
"
```
Common failures
- Citation number mismatch. The LLM cites [5] but only 3 sources were provided. Instruct the LLM to only cite sources that were given.
- Hallucinated sources. The LLM includes citations not in the provided source list. Validate citations before displaying.
- Broken URLs. Sources may contain dead links. Add an `available` flag to the source metadata, or catch 404s when checking each link.
- Version mismatch. The installed package or runtime differs from the version this guide assumes; check the version first, then rerun the smallest verification command.
- Local environment drift. Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
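For the broken-URL case, the following sketch uses only the standard library. It adds an `available` flag to each source based on a HEAD request. `check_url` and `mark_availability` are hypothetical names. Some servers reject HEAD requests or rate-limit checks, so treat `False` as a prompt to review the link, not proof that the page is gone.

```python
import http.client
import urllib.request


def check_url(url: str, timeout: float = 5.0) -> bool:
    """Return True if a HEAD request to url succeeds with a non-error status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError and HTTPError (e.g. 404) subclass OSError; ValueError covers malformed URLs.
        return False


def mark_availability(sources: list[dict]) -> list[dict]:
    """Return copies of the sources with an added "available" flag."""
    return [{**s, "available": check_url(s["url"])} for s in sources]
```

Each check adds one request per source. Consider running the checks in the background or caching the results.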
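For version mismatch and environment drift, it helps to confirm which interpreter, Ollama binary, and package versions are active before you change anything. The distribution names below (`tavily-python`, `langchain-core`, `langchain-ollama`) are assumptions based on the libraries this guide appears to use; adjust them to match your setup. `environment_report` is a hypothetical helper.

```python
import shutil
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages: list[str]) -> dict:
    """Collect the interpreter path, ollama binary path, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.executable,
        "python_version": sys.version.split()[0],
        "ollama_binary": shutil.which("ollama"),  # None if not on PATH
        "packages": versions,
    }


if __name__ == "__main__":
    report = environment_report(["tavily-python", "langchain-core", "langchain-ollama"])
    for key, value in report.items():
        print(f"{key}: {value}")
```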
Related guides
- How to Integrate Web Search Tool in Agents
- How to Build RetrievalQA Chain with Sources