How to Implement Cohere Reranking API
Prerequisites: a Cohere API key and the Python requests library.
What this does
The Cohere Rerank API accepts a natural-language query and a list of candidate document texts, then returns the candidates ranked by relevance score. It provides a hosted alternative to running cross-encoder models locally, delivering high-accuracy reranking without GPU infrastructure. The API call is straightforward: send your query and up to 1000 candidate documents per request, and receive scored results in return.
Steps
- Set your API key as an environment variable, `export COHERE_API_KEY="your-key-here"`, to keep it out of source code.
- Construct a JSON payload containing the query string, the list of candidate documents (each as a plain text string), and the model name (e.g., `rerank-multilingual-v3.0`).
- POST the payload to `https://api.cohere.ai/v1/rerank` with the headers `Authorization: Bearer <your-key>` and `Content-Type: application/json`.
- Parse the JSON response, which contains a `results` array of objects, each with `index` and `relevance_score` fields and, when `return_documents` is enabled in the request, a `document` field.
- Sort your local candidate list according to the `index` ordering from the response (or use the `relevance_score` directly to mix with other signals).
- Return the top-K results from the reranked list to your LLM as the final context.
- Handle rate limit responses (HTTP 429) with exponential backoff and retry logic.
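The steps above can be sketched roughly as follows. This is a minimal illustration rather than a reference client: the function names, the retry count, the base delay, the 60-second timeout, and the injectable `post` parameter (added only so the logic can be exercised without network access) are all assumptions, not part of the Cohere API.

```python
import time

API_URL = "https://api.cohere.ai/v1/rerank"


def build_payload(query, documents, model="rerank-multilingual-v3.0"):
    """Assemble the JSON body: query, plain-text documents, and model name."""
    return {"query": query, "documents": list(documents), "model": model}


def post_with_backoff(payload, api_key, post=None, max_retries=5,
                      base_delay=1.0, timeout=60):
    """POST the payload, retrying HTTP 429 responses with exponential backoff."""
    if post is None:
        import requests  # deferred so the pure helpers work without the dependency
        post = requests.post
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    for attempt in range(max_retries + 1):
        response = post(API_URL, headers=headers, json=payload, timeout=timeout)
        if response.status_code != 429:
            response.raise_for_status()  # surface other HTTP errors immediately
            return response.json()
        if attempt < max_retries:
            # Wait base_delay, 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("rate limited: retries exhausted")


def top_k_documents(documents, results, k):
    """Map response `index` values back to local documents, best score first."""
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked[:k]]
```

A typical call would pass `build_payload(query, docs)` and the value of `os.environ["COHERE_API_KEY"]` to `post_with_backoff`, then hand the returned `results` list to `top_k_documents` to pick the context for the LLM.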
Verification
Send a test query with five candidate documents and verify the response includes a results array with relevance scores between 0 and 1. Confirm that the highest-scored document is semantically most relevant to the query.
Expected output: Response status 200. Results: [{"index": 2, "relevance_score": 0.971}, {"index": 0, "relevance_score": 0.842}, {"index": 4, "relevance_score": 0.719}]. Document at index 2 reordered to first position with score 0.971.
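The structural part of this check can be automated. The helper below is a hypothetical sketch (the name `validate_rerank_response` and its return format are assumptions): it inspects an already-parsed response body and reports any result whose score falls outside 0 to 1 or whose index does not point at one of the submitted documents. Whether the top document is actually the most relevant still needs a human look.

```python
def validate_rerank_response(body, num_documents):
    """Return a list of problems found in a parsed rerank response (empty if none)."""
    results = body.get("results")
    if not isinstance(results, list) or not results:
        return ["missing or empty 'results' array"]
    problems = []
    for position, item in enumerate(results):
        score = item.get("relevance_score")
        index = item.get("index")
        # Scores are expected to lie in the closed interval [0, 1].
        if not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
            problems.append(f"result {position}: relevance_score {score!r} not in [0, 1]")
        # Each index must refer to one of the documents that were sent.
        if not isinstance(index, int) or not 0 <= index < num_documents:
            problems.append(f"result {position}: index {index!r} out of range")
    return problems
```

Run against the expected output shown above with `num_documents=5`, the helper should return an empty list.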
Common failures
- API key not authorized for rerank endpoint: Cohere requires an active API key with reranking permissions. New accounts may have free-tier limits that restrict the rerank endpoint. Check the Cohere dashboard for endpoint availability and quota status.
- Payload document count exceeds limit: The API supports up to 1000 documents per request. If your initial retrieval returns more candidates, split them into batches of at most 1000, map each batch's `index` values back to positions in the full candidate list, and merge the per-batch results into one ranked list, for example with Reciprocal Rank Fusion or by sorting on `relevance_score`.
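One way to batch is sketched below. Note that it merges by `relevance_score` rather than by Reciprocal Rank Fusion, on the assumption that scores for the same query are comparable across batches; substitute a different merge if that does not hold for your data. The `rerank_fn` argument is a hypothetical callable that takes a query and a list of documents and returns the response's `results` list for that batch.

```python
def batched(items, size):
    """Yield (offset, chunk) pairs covering items in chunks of at most `size`."""
    for start in range(0, len(items), size):
        yield start, items[start:start + size]


def rerank_in_batches(query, documents, rerank_fn, batch_size=1000):
    """Rerank each batch separately, then merge into one list sorted by score."""
    merged = []
    for offset, batch in batched(documents, batch_size):
        for item in rerank_fn(query, batch):
            merged.append({
                # Response indices are relative to the batch; shift them so
                # they point into the full `documents` list.
                "index": offset + item["index"],
                "relevance_score": item["relevance_score"],
            })
    merged.sort(key=lambda r: r["relevance_score"], reverse=True)
    return merged
```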
- Network timeout on large requests: Sending 1000 documents with long text per document can exceed default HTTP timeouts. Increase the request timeout to 60 seconds or higher, and consider summarizing candidate documents before sending them to reduce payload size.
Related guides
- setup-cross-encoder-reranking
- evaluate-reranking-quality-ndcg