RAG Reranking

June 6, 2026 · 8 min read

Reranking is the stage after retrieval and before final LLM context selection. Retrieval finds candidate chunks. Reranking reorders those candidates so the most useful evidence moves closer to the top. It does not replace retrieval. It fixes one common retrieval problem: the right chunk may be retrieved, but not ranked high enough to be used.

Short Answer

Reranking improves the order of retrieved chunks.

A basic RAG retrieval flow may look like this:

User question
→ Retrieval
→ Top 5 chunks
→ LLM answer

A reranking-based flow usually looks like this:

User question
→ Retrieval
→ Top 30 or Top 50 candidate chunks
→ Reranking
→ Best 5 chunks
→ LLM answer

Retrieval is usually optimized for fast candidate search. Reranking is usually optimized for more careful relevance judgment.

Stage | Main Job
--- | ---
Retrieval | Quickly find possible evidence
Reranking | Reorder candidates by usefulness
Context selection | Keep only the best chunks for the LLM
LLM generation | Convert selected evidence into an answer

Reranking is useful when retrieval returns the correct chunk somewhere in the candidate list, but the correct chunk is not near the top.
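The reranking-based flow can be sketched as one function with pluggable stages. This is a minimal sketch, not a reference implementation: `build_context`, `toy_retrieve`, `toy_score`, and the numbers in `HYPOTHETICAL_SCORES` are illustrative assumptions, and a real system would plug in its own retriever and reranker.

```python
from typing import Callable, Dict, List

Chunk = Dict[str, str]  # assumed shape: {"id": ..., "text": ...}


def build_context(
    question: str,
    retrieve: Callable[[str, int], List[Chunk]],
    score: Callable[[str, str], float],
    candidate_k: int = 50,
    final_k: int = 5,
) -> List[Chunk]:
    """Retrieve a wide candidate pool, rerank it, and keep the best few chunks."""
    candidates = retrieve(question, candidate_k)  # fast, recall-oriented step
    scored = [(score(question, c["text"]), c) for c in candidates]
    # The sort is stable, so tied candidates keep their retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:final_k]]


def toy_retrieve(question: str, k: int) -> List[Chunk]:
    # Hypothetical retrieval order: the correct chunk sits at rank 3.
    pool = [
        {"id": "general-refund", "text": "General Refund Rule"},
        {"id": "support-contact", "text": "Support Contact"},
        {"id": "subscription-cancellation", "text": "Subscription Cancellation"},
    ]
    return pool[:k]


# Made-up reranker scores, used only to show the reordering.
HYPOTHETICAL_SCORES = {
    "General Refund Rule": 0.41,
    "Support Contact": 0.12,
    "Subscription Cancellation": 0.93,
}


def toy_score(question: str, text: str) -> float:
    return HYPOTHETICAL_SCORES[text]


context = build_context(
    "Can I cancel my monthly subscription and get a refund for this month?",
    toy_retrieve,
    toy_score,
    candidate_k=3,
    final_k=2,
)
print([c["id"] for c in context])
# ['subscription-cancellation', 'general-refund']
```

Keeping `candidate_k` and `final_k` as separate parameters makes it easy to widen the candidate pool without sending more chunks to the LLM.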

Why Reranking Exists

Retrieval is usually a broad search step.

Dense retrieval, keyword retrieval, and hybrid retrieval are designed to find candidates quickly. But fast retrieval scores are not always precise enough for final answer selection.

For example, the user asks:

Can I cancel my monthly subscription and get a refund for this month?

Retrieval may return these chunks:

Rank | Retrieved Chunk | Problem
--- | --- | ---
1 | General Refund Rule | Related to refund, but not subscription-specific
2 | Support Contact | Mentions billing support, but not the answer
3 | Subscription Cancellation | Correct answer
4 | Digital Course Activation | Related to refund, but different case
5 | Enterprise Customers | Different customer type

The correct chunk is at rank 3. If the system sends only the top 2 chunks to the LLM, the model never sees it, and the answer may be wrong or incomplete.

Reranking tries to move Subscription Cancellation above weaker candidates.

Retrieval vs Reranking

Retrieval and reranking are related, but they solve different problems.

Comparison | Retrieval | Reranking
--- | --- | ---
Search space | Whole index or filtered index | Retrieved candidate list
Main goal | Recall possible evidence | Improve ordering precision
Typical input | Query and indexed chunks | Query and candidate chunks
Typical output | Top-k candidates | Reordered top-k candidates
Cost | Lower per candidate | Higher per candidate
Speed | Faster | Slower
Failure mode | Correct chunk not found | Correct chunk found but ranked poorly

Retrieval should maximize the chance that the correct evidence enters the candidate pool.

Reranking should maximize the chance that the best evidence appears near the top of the final context.

The two stages should be evaluated separately. If retrieval never returns the correct chunk, reranking cannot fix the system.
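One way to evaluate the retrieval stage on its own is recall at k: does the correct chunk enter the candidate pool at all? The sketch below assumes each question has a labeled set of correct chunk IDs. The function name and the chunk IDs are illustrative.

```python
from typing import Sequence


def recall_at_k(retrieved_ids: Sequence[str], relevant_ids: Sequence[str], k: int) -> float:
    """Fraction of the relevant chunks that appear in the first k retrieved IDs."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    return len(relevant & top_k) / len(relevant)


retrieved = [
    "general-refund",
    "support-contact",
    "subscription-cancellation",
    "course-activation",
    "enterprise-customers",
]
relevant = ["subscription-cancellation"]

print(recall_at_k(retrieved, relevant, k=2))  # 0.0: the cutoff drops the correct chunk
print(recall_at_k(retrieved, relevant, k=5))  # 1.0: retrieval found it, ordering is the issue
```

High recall at the candidate size combined with low recall at the final context size points to an ordering problem, which is the case reranking is meant to address.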

Common Reranking Methods

Reranking can be implemented in several ways.

Cross-Encoder Reranking

A model reads the query and each candidate chunk together, then scores relevance. This is usually more accurate than comparing separate embeddings, but it is slower.

LLM-Based Reranking

An LLM judges which chunks are most useful for answering the question. This is flexible, but cost, latency, and output stability must be controlled.

Score Fusion

Combines dense score, keyword score, metadata score, and other signals into one ranking score. It is cheaper than model-based reranking but needs tuning.

Rule-Based Boosting

Boosts or penalizes chunks based on metadata, source priority, freshness, document type, or exact term match. It is simple and explainable.

A production system may combine multiple methods.

For example, it may use hybrid retrieval first, then rule-based filtering, then cross-encoder reranking on the remaining candidates.

Cross-Encoder Reranking

Cross-encoder reranking compares the query and candidate chunk together.

Unlike embedding retrieval, where the query and chunk are embedded separately, a cross-encoder reads both at the same time.

Simplified input:

Query:
Can I cancel my monthly subscription and get a refund for this month?

Candidate:
Monthly subscriptions can be cancelled at any time. The cancellation will stop the next billing cycle, but it does not refund the current active month.

The reranker then gives a relevance score.

When It Works

Cross-encoder reranking is useful when:

  • top retrieval results are noisy
  • the correct answer appears in top 20 or top 50 but not top 5
  • the question needs careful matching
  • chunks are short enough to score efficiently
  • accuracy matters more than minimum latency

Main Weakness

Cross-encoder reranking is slower than normal vector retrieval.

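It must score every query-candidate pair with a full model pass, so latency and cost grow roughly in proportion to the number of candidates. Reranking 100 candidates on every request costs about twice as much as reranking 50.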

A common pattern is to retrieve more candidates first, then rerank a limited number:

Retrieve top 50
→ Rerank top 50
→ Keep top 5
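The retrieve, rerank, keep pattern can be sketched with a pair scorer that takes a batch of (query, candidate) pairs and returns one score per pair, which is the general shape a cross-encoder exposes. Here `toy_predict` is a word-overlap stand-in, not a real model, and `rerank_pairs`, `max_candidates`, and `keep` are assumed names.

```python
from typing import Callable, List, Sequence, Tuple

PairScorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank_pairs(
    query: str,
    candidates: List[str],
    predict: PairScorer,
    max_candidates: int = 50,
    keep: int = 5,
) -> List[Tuple[str, float]]:
    """Score (query, candidate) pairs together and keep the highest-scoring texts."""
    pool = candidates[:max_candidates]  # cap the work done per request
    pairs = [(query, text) for text in pool]
    scores = predict(pairs)
    if len(scores) != len(pairs):
        raise ValueError("predict must return one score per pair")
    ranked = sorted(zip(pool, scores), key=lambda item: item[1], reverse=True)
    return ranked[:keep]


def toy_predict(pairs: List[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer: counts query words that also occur in the candidate."""
    scores = []
    for query, text in pairs:
        q_words = set(query.lower().strip("?").split())
        t_words = set(text.lower().replace(".", " ").replace(",", " ").split())
        scores.append(float(len(q_words & t_words)))
    return scores


query = "Can I cancel my monthly subscription and get a refund for this month?"
candidates = [
    "Customers can request a refund within 14 days after purchase if they have "
    "completed less than 20% of the course content.",
    "For billing issues, customers should contact billing-support@learnpro.example.",
    "Monthly subscriptions can be cancelled at any time. The cancellation will stop "
    "the next billing cycle, but it does not refund the current active month.",
]

top = rerank_pairs(query, candidates, toy_predict, max_candidates=50, keep=2)
for text, score in top:
    print(score, text[:40])
```

With this toy scorer the subscription chunk moves from third place to first. A trained cross-encoder would replace `toy_predict` while the capping and sorting logic stays the same.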

LLM-Based Reranking

LLM-based reranking asks an LLM to judge candidate chunks.

The model may receive the user question and a list of chunks, then return the most relevant chunk IDs.

Example instruction:

Select the chunks that directly help answer the user question.
Prefer chunks with exact policy conditions over general related text.
Return only the selected chunk IDs.

When It Works

LLM-based reranking is useful when relevance depends on reasoning, not just similarity.

It can help when:

  • chunks are complex
  • exact answerability matters
  • retrieved results contain subtle differences
  • business rules need interpretation
  • cross-encoder models are not available

Main Weakness

LLM-based reranking can be expensive and less deterministic.

It needs a strict output format, a small candidate set, and its own evaluation. Otherwise, the reranker itself becomes another unstable LLM stage.

Use LLM-based reranking carefully. It should improve evidence selection, not become a second answer generator.
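A strict output format is easier to enforce when the parsing side rejects anything unexpected. The sketch below builds a prompt from the example instruction and validates the reply. It assumes the model is asked for a JSON array of chunk IDs, which is one possible format rather than a requirement. The LLM call itself is left out, and both function names are illustrative.

```python
import json
from typing import Dict, List


def build_rerank_prompt(question: str, chunks: Dict[str, str], max_selected: int = 3) -> str:
    """Assemble the reranking instruction, the question, and the labeled chunks."""
    lines = [
        "Select the chunks that directly help answer the user question.",
        "Prefer chunks with exact policy conditions over general related text.",
        f"Return only a JSON array of at most {max_selected} chunk IDs, best first.",
        "",
        f"Question: {question}",
        "",
    ]
    for chunk_id, text in chunks.items():
        lines.append(f"[{chunk_id}] {text}")
    return "\n".join(lines)


def parse_selection(raw_output: str, valid_ids: List[str], max_selected: int = 3) -> List[str]:
    """Accept only a JSON array of known, unique chunk IDs; otherwise return []."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return []  # free-form text is rejected, not guessed at
    if not isinstance(parsed, list):
        return []
    selected: List[str] = []
    for item in parsed:
        if isinstance(item, str) and item in valid_ids and item not in selected:
            selected.append(item)
    return selected[:max_selected]


valid = ["c1", "c2", "c3"]
print(parse_selection('["c3", "c1", "c9", "c3"]', valid))  # ['c3', 'c1']
print(parse_selection("The best chunk is c3.", valid))     # []
```

When the parser returns an empty list, the caller can fall back to the original retrieval order, so a malformed reply degrades to "no reranking" instead of breaking the request.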

Score Fusion and Rule-Based Reranking

Not every reranker needs to be a model.

A practical system can improve ranking by combining multiple signals:

Signal | Example
--- | ---
Dense similarity | Vector search score
Keyword score | BM25 or full-text score
Metadata match | Product, domain, language, permission
Freshness | Latest version or updated_at
Source priority | Official docs above old tickets
Exact term match | Error code, API name, policy term

A simple scoring idea may look like this:

final_score =
  dense_score
  + keyword_score
  + metadata_boost
  + freshness_boost
  + source_priority_boost

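These signals sit on different scales (for example, a cosine similarity is bounded, while a BM25 score has no fixed upper limit), so a plain sum usually needs normalization or per-signal weights. The result is not mathematically rigorous, but it is inspectable: each term shows why a chunk moved up or down.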
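A minimal sketch of the scoring idea follows. It assumes min-max normalization for the dense and keyword scores and hand-picked weights. The field names, weights, and candidate scores are all illustrative values, not tuned recommendations.

```python
from typing import Dict, List


def min_max(values: List[float]) -> List[float]:
    """Scale values into [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def fuse(candidates: List[dict], weights: Dict[str, float]) -> List[dict]:
    """Combine normalized scores and boosts into one inspectable final score."""
    dense = min_max([c["dense_score"] for c in candidates])
    keyword = min_max([c["keyword_score"] for c in candidates])
    fused = []
    for c, d, k in zip(candidates, dense, keyword):
        parts = {
            "dense": weights.get("dense", 1.0) * d,
            "keyword": weights.get("keyword", 1.0) * k,
            "metadata": weights.get("metadata", 0.0) * c.get("metadata_boost", 0.0),
            "freshness": weights.get("freshness", 0.0) * c.get("freshness_boost", 0.0),
            "source": weights.get("source", 0.0) * c.get("source_priority_boost", 0.0),
        }
        # Keeping the parts makes it possible to explain each ranking decision.
        fused.append({"id": c["id"], "final_score": sum(parts.values()), "parts": parts})
    return sorted(fused, key=lambda row: row["final_score"], reverse=True)


candidates = [
    {"id": "general-refund", "dense_score": 0.82, "keyword_score": 4.0},
    {"id": "support-contact", "dense_score": 0.78, "keyword_score": 2.0},
    {"id": "subscription-cancellation", "dense_score": 0.80, "keyword_score": 9.0,
     "metadata_boost": 1.0},
]
ranking = fuse(candidates, {"dense": 1.0, "keyword": 1.0, "metadata": 0.5})
print([row["id"] for row in ranking])
# ['subscription-cancellation', 'general-refund', 'support-contact']
```

Returning the per-signal `parts` alongside the final score keeps the ranking explainable, which is the main reason to prefer this approach over a model when predictability matters.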

When It Works

Score fusion and rule-based reranking are useful when:

  • the system needs predictable behavior
  • metadata quality is good
  • exact terms matter
  • source priority is important
  • model-based reranking is too slow or expensive

Main Weakness

Manual scoring rules need tuning.

If the metadata boost is too strong, a weak chunk from the correct product may outrank a strong chunk from a better source. If freshness boost is too strong, a new but irrelevant document may rank too high.

Rules should be evaluated against real questions.

When Reranking Helps

Reranking helps only when retrieval already has enough recall.

The correct chunk must appear in the candidate list.

Retrieval Result | Can Reranking Help? | Reason
--- | --- | ---
Correct chunk is not in top 50 | No | Reranker cannot score what it never receives
Correct chunk is in top 50 but not top 5 | Yes | Reranker can move it up
Top results are semantically related but not answerable | Yes | Reranker can prefer answerable chunks
Metadata filter removed the correct chunk | No | Filtering failure happens before reranking
Chunks are badly cut | Limited | Reranker cannot recreate missing context

This is why retrieval evaluation should check recall before reranking.

A good diagnostic question is:

Does the correct evidence appear anywhere in the retrieved candidate pool?

If yes, reranking may help. If no, fix indexing, chunking, metadata filtering, or retrieval first.
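The diagnostic question can be turned into a small check that says which stage to look at first. This is a sketch under the assumption that you know the ID of the correct chunk for a test question. The function name and the returned labels are made up for illustration.

```python
from typing import List, Set


def diagnose(correct_id: str, indexed_ids: Set[str], candidate_ids: List[str],
             final_k: int = 5) -> str:
    """Classify where the correct chunk is lost on its way to the LLM."""
    if correct_id not in indexed_ids:
        return "not_indexed"      # fix indexing or chunking first
    if correct_id not in candidate_ids:
        return "not_retrieved"    # fix retrieval or metadata filtering first
    rank = candidate_ids.index(correct_id) + 1
    if rank > final_k:
        return "ranked_too_low"   # the case reranking can help with
    return "ok"                   # already inside the final context


indexed = {"a", "b", "c", "d"}
print(diagnose("z", indexed, ["a", "b"]))                 # not_indexed
print(diagnose("d", indexed, ["a", "b", "c"]))            # not_retrieved
print(diagnose("c", indexed, ["a", "b", "c"], final_k=2)) # ranked_too_low
print(diagnose("a", indexed, ["a", "b", "c"], final_k=2)) # ok
```

Only the `ranked_too_low` outcome is a reranking problem. The other failures happen before the reranker sees any input.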

Select Reranking Based on Data Type

Different data types benefit from different reranking signals.

Data Type | Useful Reranking Direction | Why
--- | --- | ---
Policy documents | Cross-encoder + metadata freshness | Exact conditions and current version matter
API documentation | Keyword score + exact endpoint boost | Function names and paths need exact match
Error code docs | Exact code boost first | Exact code should dominate semantic similarity
Support tickets | Dense score + recency + product match | Similar wording and current product context matter
Legal documents | Cross-encoder + parent context | Subtle wording and full context matter
Product manuals | Metadata product/version boost | Wrong product version can produce wrong answer
Tables and specs | Table-aware row relevance | Correct row and column labels matter
Meeting notes | Dense score + date filter | Meaning and timeline often matter

The reranking strategy should match the retrieval risk.

If the risk is a wrong version, use metadata freshness. If the risk is a wrong API endpoint, use an exact-term boost. If the risk is subtle answerability, use cross-encoder or LLM-based reranking.
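As one illustration of matching the signal to the risk, an exact-code boost for error code documentation might look like the sketch below. The `ERROR_CODE` pattern, the example codes and texts, and the boost size are all hypothetical and would need to match your own code format and score scale.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for codes such as "ERR_4012" or "E1234".
ERROR_CODE = re.compile(r"\b[A-Z]{1,4}_?\d{3,5}\b")


def exact_code_boost(query: str, candidates: List[Tuple[str, float]],
                     boost: float = 1.0) -> List[Tuple[str, float]]:
    """Add a fixed boost to candidates that mention a code found in the query."""
    query_codes = set(ERROR_CODE.findall(query))
    rescored = []
    for text, score in candidates:
        shared = query_codes & set(ERROR_CODE.findall(text))
        rescored.append((text, score + boost if shared else score))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


candidates = [
    ("ERR_4021 indicates a timeout while contacting the payment gateway.", 0.71),
    ("ERR_4012 means the API key has expired.", 0.64),
]
boosted = exact_code_boost("What does ERR_4012 mean?", candidates)
print(boosted[0][0])  # ERR_4012 means the API key has expired.
```

The near-miss code `ERR_4021` scores higher on similarity in this made-up data, which is the situation where an exact match should dominate. A boost this large only makes sense when the similarity scores stay below it.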

Reusable Example: Reranking the Refund Policy

This section reuses the refund policy example from the previous logs in this series.

User question:

Can I cancel my monthly subscription and get a refund for this month?

Initial retrieval result:

[
  {
    "rank": 1,
    "section": "General Refund Rule",
    "text": "Customers can request a refund within 14 days after purchase if they have completed less than 20% of the course content."
  },
  {
    "rank": 2,
    "section": "Support Contact",
    "text": "For billing issues, customers should contact billing-support@learnpro.example."
  },
  {
    "rank": 3,
    "section": "Subscription Cancellation",
    "text": "Monthly subscriptions can be cancelled at any time. The cancellation will stop the next billing cycle, but it does not refund the current active month."
  }
]

The first chunk is related to refund, but it answers single-course refund rules. The user asked about monthly subscription cancellation.

A reranker should produce this order:

[
  {
    "rank": 1,
    "section": "Subscription Cancellation",
    "reason": "Directly answers monthly subscription cancellation and current month refund."
  },
  {
    "rank": 2,
    "section": "General Refund Rule",
    "reason": "Related to refund but applies to course purchase conditions."
  },
  {
    "rank": 3,
    "section": "Support Contact",
    "reason": "May be useful fallback but does not answer the policy question."
  }
]

After reranking, the LLM receives the most answerable evidence first.

Practical Reranking Pipeline

A practical reranking pipeline can be built gradually.

Stage | Setup
--- | ---
Prototype | Dense or hybrid retrieval without reranking
First improvement | Retrieve top 30 or 50, rerank them, then keep top 5
Metadata-heavy system | Add rule-based boosts for product, version, and permission
Technical docs | Add keyword and exact-term boosts
Policy or legal docs | Add cross-encoder reranking
Complex reasoning cases | Consider LLM-based reranking with strict output

Do not add reranking before you can inspect retrieval results.

First confirm:

  • the correct chunk exists in the index
  • the correct chunk appears in a larger top-k candidate set
  • the current top results are less answerable
  • reranking moves the correct chunk upward

Reranking should be measured by evidence quality, not by whether the final answer sounds better.

Common Mistakes

Reranking mistakes often happen because teams add it before diagnosing retrieval failure.

Mistake | Result
--- | ---
Reranking top 5 only | Correct chunk may never enter the reranker
Using reranking to fix missing chunks | It cannot fix failed recall
Sending too many candidates to LLM reranker | High cost and unstable results
Ignoring metadata signals | Wrong product, version, or source may rank high
Over-boosting rules | Weak chunks may beat truly relevant chunks
No before/after evaluation | Cannot prove reranking improved evidence quality
Treating reranker as answer generator | Reranking stage becomes hard to control

The most useful evaluation is simple:

Before reranking: where is the correct chunk ranked?
After reranking: where is the correct chunk ranked?

If the correct chunk consistently moves upward, reranking is helping.
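Across a set of test questions, the before and after comparison can be summarized with mean reciprocal rank (MRR), where a correct chunk at rank 1 scores 1.0, rank 2 scores 0.5, and a missing chunk scores 0. The sketch assumes one known correct chunk per question. The question IDs and rankings are made-up sample data.

```python
from typing import Dict, List, Optional


def rank_of(correct_id: str, ranked_ids: List[str]) -> Optional[int]:
    """1-based rank of the correct chunk, or None if it is absent."""
    return ranked_ids.index(correct_id) + 1 if correct_id in ranked_ids else None


def mean_reciprocal_rank(correct: Dict[str, str], rankings: Dict[str, List[str]]) -> float:
    """Average of 1/rank over all questions; a missing chunk contributes 0."""
    total = 0.0
    for question_id, correct_id in correct.items():
        rank = rank_of(correct_id, rankings.get(question_id, []))
        total += 1.0 / rank if rank else 0.0
    return total / len(correct)


correct = {"q1": "c", "q2": "y"}
before = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}
after = {"q1": ["c", "a", "b"], "q2": ["y", "x"]}

mrr_before = mean_reciprocal_rank(correct, before)
mrr_after = mean_reciprocal_rank(correct, after)
print(round(mrr_before, 3), mrr_after)  # 0.417 1.0
```

Looking at per-question ranks next to the average is still worthwhile, because an improved mean can hide individual questions where the reranker pushed the correct chunk down.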

The Main Principle

Reranking is not a replacement for retrieval. It corrects the order of candidates that retrieval has already found.

Retrieval should bring enough possible evidence into the candidate pool. Reranking should make the most answerable evidence rise to the top. Context selection should send only the strongest chunks to the LLM.

The practical rule is simple: add reranking when the correct chunks are retrieved but not prioritized. If the correct chunks are not retrieved at all, fix indexing, chunking, metadata, or retrieval first.

In this series

Step By Step Build Your RAG

Part 7 of 9. Move between logs in the same learning sequence.