HOW-TO · RAG

How to Implement Cohere Reranking API

Intermediate · 15 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

Cohere API key, Python requests library

What this does

The Cohere Rerank API accepts a natural language query and a list of candidate document texts, then returns the candidates ranked by relevance, each with a relevance score. It is a hosted alternative to running cross-encoder models locally, so you get high-accuracy reranking without GPU infrastructure. The call itself is simple: send your query and up to 1000 candidate documents per request, and receive scored results in return.

Steps

  1. Set your API key as an environment variable: export COHERE_API_KEY="your-key-here" to keep it out of source code.
  2. Construct a JSON payload containing the query string, the list of candidate documents (each as a plain text string), and the model name (e.g., rerank-multilingual-v3.0).
  3. POST the payload to https://api.cohere.ai/v1/rerank with the header Authorization: Bearer <your-key> and Content-Type: application/json.
  4. Parse the JSON response, which contains a results array of objects, each with index and relevance_score fields (plus a document field when the request sets return_documents to true).
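  5. Reorder your local candidate list using the response: results arrive sorted by descending relevance, and each index refers to a position in the documents list you sent. Alternatively, use relevance_score directly if you want to blend it with other signals.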
  6. Return the top-K results from the reranked list to your LLM as the final context.
  7. Handle rate limit responses (HTTP 429) with exponential backoff and retry logic.
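
The steps above can be sketched in Python roughly as follows. This is a minimal illustration rather than a reference client: the helper names (build_payload, post_with_backoff, apply_ranking, rerank), the retry count, and the base delay are assumptions made for this example, and the optional top_n request field is used to ask the API for only the best results. The post argument is injectable so the flow can be exercised without network access.

```python
import os
import time

API_URL = "https://api.cohere.ai/v1/rerank"  # endpoint from step 3


def build_payload(query, documents, model="rerank-multilingual-v3.0", top_n=None):
    """Step 2: assemble the JSON body for the rerank request."""
    payload = {"model": model, "query": query, "documents": list(documents)}
    if top_n is not None:
        payload["top_n"] = top_n  # ask the API to return only the best top_n results
    return payload


def post_with_backoff(post, url, max_retries=5, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Step 7: retry on HTTP 429, doubling the wait after each attempt."""
    for attempt in range(max_retries + 1):
        response = post(url, **kwargs)
        if response.status_code != 429 or attempt == max_retries:
            response.raise_for_status()  # surfaces 4xx/5xx, including a final 429
            return response
        sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ... by default


def apply_ranking(documents, results, top_k):
    """Steps 5-6: map each result's index back to the local list, best first."""
    ordered = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ordered[:top_k]]


def rerank(query, documents, top_k=3, post=None, timeout=60):
    """Steps 1-6 end to end; returns a list of (document, score) pairs."""
    if post is None:
        import requests  # third-party dependency listed in the prerequisites
        post = requests.post
    headers = {
        "Authorization": f"Bearer {os.environ['COHERE_API_KEY']}",  # step 1
        "Content-Type": "application/json",
    }
    response = post_with_backoff(
        post,
        API_URL,
        headers=headers,
        json=build_payload(query, documents, top_n=top_k),
        timeout=timeout,
    )
    return apply_ranking(documents, response.json()["results"], top_k)
```

Calling rerank("your question", candidates, top_k=3) with COHERE_API_KEY set returns the three best candidates paired with their scores, ready to pass to the LLM as context. In production you would probably also add random jitter to the backoff delay so that concurrent clients do not retry in lockstep.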

Verification

Send a test query with five candidate documents and verify the response includes a results array with relevance scores between 0 and 1. Confirm that the highest-scored document is semantically most relevant to the query.

Expected output: Response status 200. Results: [{"index": 2, "relevance_score": 0.971}, {"index": 0, "relevance_score": 0.842}, {"index": 4, "relevance_score": 0.719}]. Document at index 2 reordered to first position with score 0.971.
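
The structural part of this check can be automated with a small helper like the one below. It is only a sketch: the function name check_rerank_response is hypothetical, and it assumes the response body has already been parsed into a dict and that results are expected in descending score order. Whether the top document is semantically the best match still needs a human look or a labelled test set.

```python
def check_rerank_response(body, num_documents):
    """Return a list of problems found in a parsed rerank response (empty if none)."""
    results = body.get("results")
    if not isinstance(results, list) or not results:
        return ["missing or empty 'results' array"]

    problems = []
    for position, item in enumerate(results):
        index = item.get("index")
        score = item.get("relevance_score")
        # Each index must point at one of the documents that were sent.
        if not isinstance(index, int) or not 0 <= index < num_documents:
            problems.append(f"result {position}: index {index!r} out of range")
        # Relevance scores are expected to fall between 0 and 1.
        if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
            problems.append(f"result {position}: score {score!r} outside [0, 1]")

    scores = [item.get("relevance_score") for item in results]
    if all(isinstance(s, (int, float)) for s in scores):
        if scores != sorted(scores, reverse=True):
            problems.append("results are not sorted by descending relevance_score")
    return problems


sample = {
    "results": [
        {"index": 2, "relevance_score": 0.971},
        {"index": 0, "relevance_score": 0.842},
        {"index": 4, "relevance_score": 0.719},
    ]
}
print(check_rerank_response(sample, num_documents=5))
```

An empty list means the response has the expected shape for a five-document test request.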

Common failures

  1. API key not authorized for rerank endpoint: Cohere requires an active API key with reranking permissions. New accounts may have free-tier limits that restrict the rerank endpoint. Check the Cohere dashboard for endpoint availability and quota status.
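  2. Payload document count exceeds limit: The API supports up to 1000 documents per request. If your initial retrieval returns more candidates, split them into batches of at most 1000 and rerank each batch with the same query and model. Because every score is computed for a single query-document pair, scores from different batches can be compared directly: merge the batches by sorting on relevance_score. Reciprocal Rank Fusion is better suited to combining several rankings of the same documents; applied to disjoint batches it would interleave each batch's top results regardless of their actual scores. Remember that each index in a response is relative to its own batch, so add the batch offset before mapping back to your full candidate list.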
  3. Network timeout on large requests: Sending 1000 documents with long text per document can exceed default HTTP timeouts. Increase the request timeout to 60 seconds or higher, and consider summarizing candidate documents before sending them to reduce payload size.
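
For the batch-size failure, the splitting and merging could look something like the sketch below. It merges by sorting on relevance_score (not by rank fusion), on the assumption that scores produced by the same model for the same query are comparable across batches. The names chunked, merge_batches, and rerank_in_batches are hypothetical, and rerank_batch stands in for whatever function sends one batch to the API and returns its results array.

```python
def chunked(items, size=1000):
    """Yield (offset, chunk) pairs so batch-local indexes can be mapped back."""
    for start in range(0, len(items), size):
        yield start, items[start:start + size]


def merge_batches(batch_results, top_k):
    """Combine per-batch results into one global ranking.

    batch_results is an iterable of (offset, results), where each result's
    index is relative to its own batch.
    """
    merged = []
    for offset, results in batch_results:
        for r in results:
            merged.append(
                {"index": offset + r["index"], "relevance_score": r["relevance_score"]}
            )
    merged.sort(key=lambda r: r["relevance_score"], reverse=True)
    return merged[:top_k]


def rerank_in_batches(query, documents, rerank_batch, top_k=10, size=1000):
    """Rerank more documents than fit in one request.

    rerank_batch(query, chunk) is a caller-supplied function (an assumption of
    this sketch) that returns the API's results list for a single batch.
    """
    batch_results = [
        (offset, rerank_batch(query, chunk)) for offset, chunk in chunked(documents, size)
    ]
    return merge_batches(batch_results, top_k)
```

The returned index values refer to positions in the full documents list, so they can be used the same way as the indexes from a single-request response.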

Related guides

  • setup-cross-encoder-reranking
  • evaluate-reranking-quality-ndcg