---
title: "Decision: Anthropic Haiku for hub-recall reranker"
url: https://mdfy.app/sB3a5eOG
updated: 2026-05-14T18:15:49.480Z
source: "mdfy.app"
---
# Decision: Anthropic Haiku for hub-recall reranker

> Logged 2026-04-11.

## What we ship

Hybrid retrieval (BM25 + pgvector union, top-30) → Anthropic Haiku 4.5 reranks → top-k returned to caller.
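
For orientation, a minimal sketch of that path. `bm25_search` and `vector_search` are hypothetical helpers standing in for our own store queries, `claude-haiku-4-5` is an assumed model id, and the listwise rerank prompt is illustrative, not the production one:

```python
# Minimal sketch of the recall path. bm25_search() and vector_search()
# are hypothetical helpers over our own store; each returns dicts with
# "id" and "text" keys.
import json

import anthropic

client = anthropic.Anthropic()

def recall(query: str, k: int = 5) -> list[dict]:
    # Union the lexical and vector hits, dedupe by id, cap at 30.
    candidates = {d["id"]: d for d in bm25_search(query, limit=30)}
    for d in vector_search(query, limit=30):
        candidates.setdefault(d["id"], d)
    docs = list(candidates.values())[:30]

    # One listwise call: Haiku orders the candidates against the query.
    numbered = "\n".join(f"[{i}] {d['text'][:500]}" for i, d in enumerate(docs))
    msg = client.messages.create(
        model="claude-haiku-4-5",  # assumed model id
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": (
                f"Query: {query}\n\nDocuments:\n{numbered}\n\n"
                f"Return a JSON array of the {k} indices most relevant "
                "to the query, best first. JSON only."
            ),
        }],
    )
    order = json.loads(msg.content[0].text)
    return [docs[i] for i in order[:k]]
```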

## Why Haiku, not Voyage rerank-2 / Cohere rerank-v3 / Mixedbread

Voyage rerank-2 is the obvious technical choice (it's *the* reranker from a shop that does nothing but retrieval models). I ran the eval anyway:

| Reranker | nDCG@5 (our eval set) | p95 latency | Price per 1M tokens |
|---|---:|---:|---:|
| Haiku 4.5 | 0.83 | 320ms | $1 in / $5 out |
| Voyage rerank-2 | 0.85 | 110ms | $0.50 |
| Cohere rerank-v3 | 0.84 | 180ms | $1.00 |

Voyage and Cohere are slightly more accurate and faster. So why Haiku?

- **Single-vendor story.** We already use Anthropic for capture summarisation and graph extraction. Adding a second LLM provider for *just* reranking is operationally heavier than the marginal quality gain.
- **The eval gap is inside noise.** Our eval set has 60 queries. The 0.02 nDCG gap between Haiku and Voyage falls inside the bootstrap CI (a paired-bootstrap check is sketched after this list). We can't prove the difference is real at this scale.
- **Latency budget has room.** Our recall budget is 800ms p95 end-to-end. The reranker is 320ms of that, leaving roughly 480ms of headroom. We're not up against a wall.
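
To make the noise claim concrete, this is roughly the check: a paired bootstrap over per-query scores. `haiku` and `voyage` stand for the per-query nDCG@5 arrays from the eval harness (names assumed):

```python
# Paired bootstrap CI for the mean nDCG@5 gap between two rerankers,
# given one score per eval query (for us, n = 60 queries).
import numpy as np

def bootstrap_ci(haiku: np.ndarray, voyage: np.ndarray,
                 n_resamples: int = 10_000, alpha: float = 0.05) -> tuple[float, float]:
    rng = np.random.default_rng(0)
    n = len(haiku)
    diffs = np.empty(n_resamples)
    for i in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # resample queries with replacement
        diffs[i] = voyage[idx].mean() - haiku[idx].mean()
    # Percentile interval for the mean gap.
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

If the 95% interval straddles zero, as it does for us at n = 60, the 0.02 gap doesn't clear the noise floor.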

## When I'd revisit

- If Voyage releases rerank-2.5 with a meaningful jump on the long-doc benchmark. Voyage has explicitly said they're working on it.
- If we ever need to serve recall to a free-tier user at scale. Voyage's $0.50/1M would compound enough to matter.

Until then, single-vendor wins.
