RAG Reranking

Reranking is the stage after retrieval and before final LLM context selection. Retrieval finds candidate chunks. Reranking reorders those candidates so the most useful evidence moves closer to the top. It does not replace retrieval. It fixes one common retrieval problem: the right chunk may be retrieved, but not ranked high enough to be used.

Short Answer

Reranking improves the order of retrieved chunks.

A basic RAG retrieval flow may look like this:

User question
→ Retrieval
→ Top 5 chunks
→ LLM answer

A reranking-based flow usually looks like this:

User question
→ Retrieval
→ Top 30 or Top 50 candidate chunks
→ Reranking
→ Best 5 chunks
→ LLM answer

Retrieval is usually optimized for fast candidate search. Reranking is usually optimized for more careful relevance judgment.

Stage	Main Job
Retrieval	Quickly find possible evidence
Reranking	Reorder candidates by usefulness
Context selection	Keep only the best chunks for the LLM
LLM generation	Convert selected evidence into an answer

Reranking is useful when retrieval returns the correct chunk somewhere in the candidate list, but the correct chunk is not near the top.
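The two-stage flow above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `answer_context`, `toy_retrieve`, and `toy_score` are hypothetical names, the corpus is made up, and the word-overlap scorer only stands in for a real reranker.

```python
from typing import Callable, List, Tuple

Chunk = Tuple[str, str]  # (chunk_id, text); a hypothetical minimal shape


def answer_context(
    query: str,
    retrieve: Callable[[str, int], List[Chunk]],
    score: Callable[[str, str], float],
    candidates_k: int = 50,
    final_k: int = 5,
) -> List[Chunk]:
    """Retrieve a wide pool, rerank it, keep only the best few chunks."""
    pool = retrieve(query, candidates_k)
    # sorted() is stable, so chunks with equal scores keep their retrieval order.
    reranked = sorted(pool, key=lambda c: score(query, c[1]), reverse=True)
    return reranked[:final_k]


def toy_retrieve(query: str, k: int) -> List[Chunk]:
    """Stand-in retriever: returns a fixed, made-up candidate list."""
    corpus = [
        ("refund", "Refunds are available within 14 days of purchase."),
        ("support", "Contact billing support for billing issues."),
        ("sub", "Monthly subscription cancellation does not refund the current month."),
    ]
    return corpus[:k]


def toy_score(query: str, text: str) -> float:
    """Stand-in reranker: counts query words that also appear in the chunk."""
    q_words = set(query.lower().split())
    t_words = set(text.lower().replace(".", "").split())
    return float(len(q_words & t_words))


context = answer_context(
    "cancel monthly subscription refund this month",
    toy_retrieve,
    toy_score,
    candidates_k=50,
    final_k=2,
)
```

The point of the shape is that `candidates_k` and `final_k` are separate knobs. Retrieval is asked for a wide pool, and only the reranked head of that pool reaches the LLM.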

Why Reranking Exists

Retrieval is usually a broad search step.

Dense retrieval, keyword retrieval, and hybrid retrieval are designed to find candidates quickly. But fast retrieval scores are not always precise enough for final answer selection.

For example, the user asks:

Can I cancel my monthly subscription and get a refund for this month?

Retrieval may return these chunks:

Rank	Retrieved Chunk	Problem
1	General Refund Rule	Related to refund, but not subscription-specific
2	Support Contact	Mentions billing support, but not the answer
3	Subscription Cancellation	Correct answer
4	Digital Course Activation	Related to refund, but different case
5	Enterprise Customers	Different customer type

The correct chunk sits at rank 3. If the system sends only the top 2 chunks to the LLM, the answer may be wrong or incomplete.

Reranking tries to move Subscription Cancellation above weaker candidates.

Retrieval vs Reranking

Retrieval and reranking are related, but they solve different problems.

Comparison	Retrieval	Reranking
Search space	Whole index or filtered index	Retrieved candidate list
Main goal	Recall possible evidence	Improve ordering precision
Typical input	Query and indexed chunks	Query and candidate chunks
Typical output	Top-k candidates	Reordered top-k candidates
Cost	Lower per candidate	Higher per candidate
Speed	Faster	Slower
Failure mode	Correct chunk not found	Correct chunk found but ranked poorly

Retrieval should maximize the chance that the correct evidence enters the candidate pool.

Reranking should maximize the chance that the best evidence appears near the top of the final context.

The two stages should be evaluated separately. If retrieval never returns the correct chunk, reranking cannot fix the system.
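One way to evaluate the stages separately is to measure recall on the candidate pool and a rank-sensitive metric on the final order. The sketch below assumes a small labeled set where each question has exactly one correct chunk ID. The function names and the dictionary layout are assumptions made for illustration.

```python
from typing import Dict, List


def recall_at_k(pools: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Retrieval metric: fraction of questions whose correct chunk is in the top-k pool."""
    if not pools:
        return 0.0
    hits = sum(1 for q, ids in pools.items() if gold[q] in ids[:k])
    return hits / len(pools)


def mean_reciprocal_rank(rankings: Dict[str, List[str]], gold: Dict[str, str]) -> float:
    """Ordering metric: average of 1/rank of the correct chunk; a missing chunk adds 0."""
    if not rankings:
        return 0.0
    total = 0.0
    for q, ids in rankings.items():
        if gold[q] in ids:
            total += 1.0 / (ids.index(gold[q]) + 1)
    return total / len(rankings)


# Made-up labeled data: q1's correct chunk is retrieved at rank 3, q2's is never retrieved.
gold = {"q1": "c3", "q2": "c9"}
retrieved = {"q1": ["c1", "c2", "c3"], "q2": ["c4", "c5", "c6"]}
reranked = {"q1": ["c3", "c1", "c2"], "q2": ["c4", "c5", "c6"]}

pool_recall = recall_at_k(retrieved, gold, k=3)
mrr_before = mean_reciprocal_rank(retrieved, gold)
mrr_after = mean_reciprocal_rank(reranked, gold)
```

In this toy data, reranking improves the ordering metric for q1, but the pool recall stays the same. No reranker can help q2, because its correct chunk never enters the pool.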

Common Reranking Methods

Reranking can be implemented in several ways.

Cross-Encoder Reranking

A model reads the query and each candidate chunk together, then scores relevance. This is usually more accurate than comparing separate embeddings, but it is slower.

LLM-Based Reranking

An LLM judges which chunks are most useful for answering the question. This is flexible, but cost, latency, and output stability must be controlled.

Score Fusion

Combines dense score, keyword score, metadata score, and other signals into one ranking score. It is cheaper than model-based reranking but needs tuning.

Rule-Based Boosting

Boosts or penalizes chunks based on metadata, source priority, freshness, document type, or exact term match. It is simple and explainable.

A production system may combine multiple methods.

For example, it may use hybrid retrieval first, then rule-based filtering, then cross-encoder reranking on the remaining candidates.

Cross-Encoder Reranking

Cross-encoder reranking compares the query and candidate chunk together.

Unlike embedding retrieval, where the query and chunk are embedded separately, a cross-encoder reads both at the same time.

Simplified input:

Query:
Can I cancel my monthly subscription and get a refund for this month?

Candidate:
Monthly subscriptions can be cancelled at any time. The cancellation will stop the next billing cycle, but it does not refund the current active month.

The reranker then gives a relevance score.
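The pair-scoring interface can be sketched as follows. The reranker receives a batch of (query, candidate) pairs and returns one score per pair. `toy_pair_scorer` is a deliberately crude stand-in that only counts shared words. A real cross-encoder model would replace it, and which model or library to use is outside what this sketch assumes.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (query, candidate_text)


def rerank_with_pair_scorer(
    query: str,
    candidates: Sequence[str],
    score_pairs: Callable[[List[Pair]], List[float]],
) -> List[Tuple[float, str]]:
    """Score every (query, candidate) pair jointly, then sort by score, best first."""
    pairs = [(query, text) for text in candidates]
    scores = score_pairs(pairs)  # one score per pair, in the same order as `pairs`
    if len(scores) != len(pairs):
        raise ValueError("scorer must return exactly one score per pair")
    return sorted(zip(scores, candidates), key=lambda sc: sc[0], reverse=True)


def toy_pair_scorer(pairs: List[Pair]) -> List[float]:
    """Stand-in for a cross-encoder: fraction of query words found in the candidate."""
    scores = []
    for query, text in pairs:
        q_words = set(query.lower().strip("?").split())
        t_words = set(text.lower().replace(".", "").replace(",", "").split())
        scores.append(len(q_words & t_words) / len(q_words))
    return scores
```

Unlike embedding retrieval, nothing here can be precomputed per chunk. The score depends on the query and the candidate together, which is why the cost grows with the number of candidates.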

When It Works

Cross-encoder reranking is useful when:

top retrieval results are noisy
the correct answer appears in top 20 or top 50 but not top 5
the question needs careful matching
chunks are short enough to score efficiently
accuracy matters more than minimum latency

Main Weakness

Cross-encoder reranking is slower than normal vector retrieval.

It must score each query-candidate pair. If you rerank 100 candidates for every request, latency and cost may increase.

A common pattern is to retrieve more candidates first, then rerank a limited number:

Retrieve top 50
→ Rerank top 50
→ Keep top 5

LLM-Based Reranking

LLM-based reranking asks an LLM to judge candidate chunks.

The model may receive the user question and a list of chunks, then return the most relevant chunk IDs.

Example instruction:

Select the chunks that directly help answer the user question.
Prefer chunks with exact policy conditions over general related text.
Return only the selected chunk IDs.

When It Works

LLM-based reranking is useful when relevance depends on reasoning, not just similarity.

It can help when:

chunks are complex
exact answerability matters
retrieved results contain subtle differences
business rules need interpretation
cross-encoder models are not available

Main Weakness

LLM-based reranking can be expensive and less deterministic.

It needs a strict output format, a small candidate set, and evaluation. Otherwise, the reranker itself becomes another unstable LLM stage.

Use LLM-based reranking carefully. It should improve evidence selection, not become a second answer generator.
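A strict output contract might look like the sketch below. It reuses the example instruction and adds one assumption that is not in the original: the model is asked to answer with a JSON list of chunk IDs. `call_llm` is a hypothetical callable that takes a prompt string and returns the model's text. No real client library is assumed.

```python
import json
from typing import Callable, Dict, List


def build_rerank_prompt(question: str, chunks: Dict[str, str]) -> str:
    """Build the instruction plus an ID-labeled list of candidate chunks."""
    lines = [
        "Select the chunks that directly help answer the user question.",
        "Prefer chunks with exact policy conditions over general related text.",
        'Return only a JSON list of the selected chunk IDs, for example ["c2"].',
        "",
        f"Question: {question}",
        "",
    ]
    lines += [f"[{cid}] {text}" for cid, text in chunks.items()]
    return "\n".join(lines)


def parse_selected_ids(raw: str, valid_ids: List[str]) -> List[str]:
    """Accept only a JSON list of known IDs; anything else yields an empty list."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    selected: List[str] = []
    for item in data:
        if isinstance(item, str) and item in valid_ids and item not in selected:
            selected.append(item)
    return selected


def llm_rerank(
    question: str, chunks: Dict[str, str], call_llm: Callable[[str], str]
) -> List[str]:
    """Return chunk IDs with the LLM's picks first; the rest keep retrieval order."""
    ids = list(chunks)
    raw = call_llm(build_rerank_prompt(question, chunks))
    selected = parse_selected_ids(raw, ids)
    return selected + [cid for cid in ids if cid not in selected]
```

If the model returns free text, unknown IDs, or anything that is not a JSON list, the function falls back to the original retrieval order. The reranker can then only reorder evidence. It cannot inject new text into the pipeline.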

Score Fusion and Rule-Based Reranking

Not every reranker needs to be a model.

A practical system can improve ranking by combining multiple signals:

Signal	Example
Dense similarity	Vector search score
Keyword score	BM25 or full-text score
Metadata match	Product, domain, language, permission
Freshness	Latest version or updated_at
Source priority	Official docs above old tickets
Exact term match	Error code, API name, policy term

A simple scoring idea may look like this:

final_score =
  dense_score
  + keyword_score
  + metadata_boost
  + freshness_boost
  + source_priority_boost

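This is not mathematically rigorous, because the signals often sit on different scales: a BM25 score is unbounded, while cosine similarity stays between -1 and 1. In practice each signal usually needs normalization and a weight. But the formula is inspectable: you can see exactly why a chunk ranked where it did.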
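A weighted version of the scoring idea can be sketched like this. The dense and keyword scores are min-max normalized within the candidate list so they become comparable, and each signal gets a weight. The weights, field names, and sample scores are all hypothetical. Real weights need tuning against labeled questions.

```python
from typing import Dict, List


def min_max(values: List[float]) -> List[float]:
    """Rescale values to [0, 1] within this candidate list."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"dense": 0.5, "keyword": 0.3, "metadata": 0.1, "freshness": 0.05, "source": 0.05}


def fuse(candidates: List[Dict]) -> List[Dict]:
    """Attach a weighted final_score to each candidate and sort best first."""
    dense = min_max([c["dense_score"] for c in candidates])
    keyword = min_max([c["keyword_score"] for c in candidates])
    fused = []
    for c, d, k in zip(candidates, dense, keyword):
        final = (
            WEIGHTS["dense"] * d
            + WEIGHTS["keyword"] * k
            + WEIGHTS["metadata"] * c["metadata_boost"]
            + WEIGHTS["freshness"] * c["freshness_boost"]
            + WEIGHTS["source"] * c["source_priority_boost"]
        )
        fused.append({**c, "final_score": final})
    return sorted(fused, key=lambda c: c["final_score"], reverse=True)


# Made-up scores: boosts are 0.0 or 1.0, dense is cosine-like, keyword is BM25-like.
sample = [
    {"id": "general", "dense_score": 0.82, "keyword_score": 4.0,
     "metadata_boost": 0.0, "freshness_boost": 1.0, "source_priority_boost": 1.0},
    {"id": "support", "dense_score": 0.70, "keyword_score": 2.0,
     "metadata_boost": 0.0, "freshness_boost": 1.0, "source_priority_boost": 1.0},
    {"id": "subscription", "dense_score": 0.80, "keyword_score": 11.0,
     "metadata_boost": 1.0, "freshness_boost": 1.0, "source_priority_boost": 1.0},
]
ranked = fuse(sample)
```

Because every term is stored next to the final score, a surprising ranking can be traced back to the signal that caused it.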

When It Works

Score fusion and rule-based reranking are useful when:

the system needs predictable behavior
metadata quality is good
exact terms matter
source priority is important
model-based reranking is too slow or expensive

Main Weakness

Manual scoring rules need tuning.

If the metadata boost is too strong, a weak chunk from the correct product may outrank a strong chunk from a better source. If freshness boost is too strong, a new but irrelevant document may rank too high.

Rules should be evaluated against real questions.

When Reranking Helps

Reranking helps only when retrieval already has enough recall.

The correct chunk must appear in the candidate list.

Retrieval Result	Can Reranking Help?	Reason
Correct chunk is not in top 50	No	Reranker cannot score what it never receives
Correct chunk is in top 50 but not top 5	Yes	Reranker can move it up
Top results are semantically related but not answerable	Yes	Reranker can prefer answerable chunks
Metadata filter removed the correct chunk	No	Filtering failure happens before reranking
Chunks are badly cut	Limited	Reranker cannot recreate missing context

This is why retrieval evaluation should check recall before reranking.

A good diagnostic question is:

Does the correct evidence appear anywhere in the retrieved candidate pool?

If yes, reranking may help. If no, fix indexing, chunking, metadata filtering, or retrieval first.
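The diagnostic question can be turned into a small triage helper for one labeled question. This is a sketch under the assumption that the correct chunk ID is known for the question. The function name and the returned labels are illustrative.

```python
from typing import List


def diagnose(candidate_ids: List[str], correct_id: str, final_k: int = 5) -> str:
    """Triage one labeled question: recall problem, ordering problem, or no problem."""
    if correct_id not in candidate_ids:
        # Indexing, chunking, metadata filtering, or retrieval dropped the chunk.
        return "fix retrieval first"
    rank = candidate_ids.index(correct_id) + 1
    if rank <= final_k:
        return "already in final context"
    return "reranking may help"
```

Running this over a set of real questions shows how the failures split between recall and ordering, which tells you whether a reranker is worth adding at all.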

Select Reranking Based on Data Type

Different data types benefit from different reranking signals.

Data Type	Useful Reranking Direction	Why
Policy documents	Cross-encoder + metadata freshness	Exact conditions and current version matter
API documentation	Keyword score + exact endpoint boost	Function names and paths need exact match
Error code docs	Exact code boost first	Exact code should dominate semantic similarity
Support tickets	Dense score + recency + product match	Similar wording and current product context matter
Legal documents	Cross-encoder + parent context	Subtle wording and full context matter
Product manuals	Metadata product/version boost	Wrong product version can produce wrong answer
Tables and specs	Table-aware row relevance	Correct row and column labels matter
Meeting notes	Dense score + date filter	Meaning and timeline often matter

The reranking strategy should match the retrieval risk.

If the risk is the wrong version, use metadata freshness. If the risk is the wrong API endpoint, use an exact-term boost. If the risk is subtle answerability, use cross-encoder or LLM-based reranking.
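As one example of matching the signal to the risk, an exact-term boost for error codes might look like the sketch below. The error-code pattern (`E1042`, `ERR-503`) is a hypothetical format, and the boost size is arbitrary. Both would need to match the real documentation.

```python
import re
from typing import List, Tuple

# Hypothetical error-code shape such as "E1042" or "ERR-503"; adjust to the real format.
ERROR_CODE = re.compile(r"\b(?:E|ERR-)\d{3,5}\b", re.IGNORECASE)


def boost_exact_codes(
    query: str, scored: List[Tuple[str, float]], boost: float = 1.0
) -> List[Tuple[str, float]]:
    """Add `boost` to every chunk that contains an error code mentioned in the query."""
    codes = {m.upper() for m in ERROR_CODE.findall(query)}
    result = []
    for text, score in scored:
        found = {m.upper() for m in ERROR_CODE.findall(text)}
        new_score = score + boost if codes & found else score
        result.append((text, new_score))
    return sorted(result, key=lambda ts: ts[1], reverse=True)
```

With a boost this large, an exact code match dominates the semantic score. That is the intended behavior for error-code docs, and the wrong behavior for most other data types.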

Reusable Example: Reranking the Refund Policy

Use the same refund policy example from previous logs.

User question:

Can I cancel my monthly subscription and get a refund for this month?

Initial retrieval result:

[
  {
    "rank": 1,
    "section": "General Refund Rule",
    "text": "Customers can request a refund within 14 days after purchase if they have completed less than 20% of the course content."
  },
  {
    "rank": 2,
    "section": "Support Contact",
    "text": "For billing issues, customers should contact billing-support@learnpro.example."
  },
  {
    "rank": 3,
    "section": "Subscription Cancellation",
    "text": "Monthly subscriptions can be cancelled at any time. The cancellation will stop the next billing cycle, but it does not refund the current active month."
  }
]

The first chunk is related to refunds, but it covers the single-course refund rule. The user asked about cancelling a monthly subscription.

A reranker should produce this order:

[
  {
    "rank": 1,
    "section": "Subscription Cancellation",
    "reason": "Directly answers monthly subscription cancellation and current month refund."
  },
  {
    "rank": 2,
    "section": "General Refund Rule",
    "reason": "Related to refund but applies to course purchase conditions."
  },
  {
    "rank": 3,
    "section": "Support Contact",
    "reason": "May be useful fallback but does not answer the policy question."
  }
]

After reranking, the LLM receives the most answerable evidence first.
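The context-selection step that follows reranking can be sketched as a walk down the reranked list with two limits. The limits and the function name are assumptions, and the word count is only a rough proxy for tokens. A real system would count tokens with the tokenizer of its LLM.

```python
from typing import List, Tuple


def select_context(
    reranked: List[Tuple[str, str]], max_chunks: int = 5, max_words: int = 300
) -> List[str]:
    """Keep chunks in reranked order until the chunk limit or word budget is reached."""
    selected: List[str] = []
    used = 0
    for section, text in reranked:
        if len(selected) >= max_chunks:
            break
        words = len(text.split())
        if used + words > max_words:
            continue  # this chunk does not fit; a later, shorter one still might
        selected.append(section)
        used += words
    return selected


reranked_example = [
    ("Subscription Cancellation",
     "Monthly subscriptions can be cancelled at any time. The cancellation will stop "
     "the next billing cycle, but it does not refund the current active month."),
    ("General Refund Rule",
     "Customers can request a refund within 14 days after purchase if they have "
     "completed less than 20% of the course content."),
    ("Support Contact",
     "For billing issues, customers should contact billing-support@learnpro.example."),
]
final_sections = select_context(reranked_example, max_chunks=2)
```

Because selection only cuts the list, the order produced by the reranker decides what the LLM sees. A chunk that is ranked below the cut is invisible to the answer.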

Practical Reranking Pipeline

A practical reranking pipeline can be built gradually.

Stage	Setup
Prototype	Dense or hybrid retrieval without reranking
First improvement	Retrieve top 30 or 50, rerank them, then keep top 5
Metadata-heavy system	Add rule-based boosts for product, version, and permission
Technical docs	Add keyword and exact-term boosts
Policy or legal docs	Add cross-encoder reranking
Complex reasoning cases	Consider LLM-based reranking with strict output

Do not add reranking before you can inspect retrieval results.

First confirm:

the correct chunk exists in the index
the correct chunk appears in a larger top-k candidate set
the current top results are less answerable
reranking moves the correct chunk upward

Reranking should be measured by evidence quality, not by whether the final answer sounds better.

Common Mistakes

Reranking mistakes often happen because teams add it before diagnosing retrieval failure.

Mistake	Result
Reranking top 5 only	Correct chunk may never enter the reranker
Using reranking to fix missing chunks	It cannot fix failed recall
Sending too many candidates to LLM reranker	High cost and unstable results
Ignoring metadata signals	Wrong product, version, or source may rank high
Over-boosting rules	Weak chunks may beat truly relevant chunks
No before/after evaluation	Cannot prove reranking improved evidence quality
Treating reranker as answer generator	Reranking stage becomes hard to control

The most useful evaluation is simple:

Before reranking: where is the correct chunk ranked?
After reranking: where is the correct chunk ranked?

If the correct chunk consistently moves upward, reranking is helping.
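The before/after check can be run over a small labeled set with a helper like the one below. It assumes each question has one known correct chunk ID. The names and the summary buckets are illustrative choices.

```python
from typing import Dict, List, Optional


def rank_of(ids: List[str], correct_id: str) -> Optional[int]:
    """1-based rank of the correct chunk, or None if it is not in the list."""
    return ids.index(correct_id) + 1 if correct_id in ids else None


def compare_ranks(
    before: Dict[str, List[str]],
    after: Dict[str, List[str]],
    gold: Dict[str, str],
) -> Dict[str, int]:
    """Count questions where the correct chunk moved up, moved down, or stayed put."""
    summary = {"improved": 0, "unchanged": 0, "worsened": 0, "missing": 0}
    for question, correct_id in gold.items():
        b = rank_of(before[question], correct_id)
        a = rank_of(after[question], correct_id)
        if b is None or a is None:
            summary["missing"] += 1  # a recall failure; reranking cannot fix this
        elif a < b:
            summary["improved"] += 1
        elif a > b:
            summary["worsened"] += 1
        else:
            summary["unchanged"] += 1
    return summary
```

The "worsened" and "missing" buckets matter as much as "improved". A reranker that helps some questions and hurts others needs tuning, and missing chunks point back to retrieval.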

The Main Principle

Reranking does not replace retrieval. It corrects the order of the candidates that retrieval returns.

Retrieval should bring enough possible evidence into the candidate pool. Reranking should make the most answerable evidence rise to the top. Context selection should send only the strongest chunks to the LLM.

The practical rule is simple: add reranking when the correct chunks are retrieved but not prioritized. If the correct chunks are not retrieved at all, fix indexing, chunking, metadata, or retrieval first.


Step By Step Build Your RAG