After indexing, retrieval, and reranking, the LLM receives selected context and generates the final answer. This stage is not only about choosing a stronger model. Prompt design controls how the model reads evidence, handles missing information, cites sources, and avoids answering beyond the retrieved context.
Short Answer
A RAG prompt should tell the model five things:
1. Role
What kind of assistant it should behave as, such as support assistant, technical assistant, or internal knowledge assistant.
2. Evidence
Which retrieved chunks are available, where they came from, and what metadata or source fields matter.
3. Grounding Rule
Whether the model must answer only from context or can add limited general reasoning.
4. Unknown Handling
What the model should do when the retrieved evidence is missing, weak, outdated, or conflicting.
5. Output Format
How the final answer should be structured so it is useful for users and stable for the application.
The prompt should not try to hide retrieval problems. If the evidence is not enough, the model should say that the context is insufficient instead of inventing an answer.
Why Prompt Design Matters in RAG
In normal chat, the model can answer from its internal knowledge.
In RAG, the model should answer from retrieved evidence.
That means the prompt must define how the model should treat each kind of input:
| Source | How the Model Should Treat It |
|---|---|
| Retrieved context | Primary evidence |
| User question | Intent and constraints |
| Metadata | Scope, version, product, permission, source |
| Model knowledge | Secondary reasoning only when allowed |
| Missing evidence | Should trigger uncertainty, not hallucination |
Without prompt rules, the model may use retrieved context loosely. It may answer from memory, merge unrelated chunks, ignore version metadata, or sound confident when the evidence is not enough.
Prompt design is the control layer for generation behavior.
Core Prompt Components
A practical RAG prompt usually has several sections.
System role
→ Answer rules
→ Citation rules
→ Context block
→ User question
→ Output format
Each section has a job.
| Prompt Part | Purpose |
|---|---|
| System role | Sets behavior and domain |
| Answer rules | Defines what the model can and cannot do |
| Context block | Supplies retrieved evidence |
| Source metadata | Provides scope and traceability |
| Unknown handling | Prevents unsupported answers |
| Output format | Makes the response predictable |
| Citation rules | Links answer claims back to evidence |
A good prompt is not necessarily long. It is explicit about the decisions that matter.
Role Design
The role tells the model what type of work it is doing.
Weak role:
You are a helpful assistant.
Better RAG role:
You are a support knowledge assistant. Answer user questions using only the provided retrieved context. Prefer direct policy evidence over general explanation. If the context is insufficient, say what is missing.
The better role defines:
- domain
- source of truth
- answer behavior
- uncertainty behavior
The role should be specific enough to constrain the model, but not so detailed that it becomes hard to maintain.
Context Formatting
Context formatting is one of the most important parts of RAG prompt design.
Do not send chunks as anonymous paragraphs. Include structure and metadata that help the model understand scope.
Weak context:
Monthly subscriptions can be cancelled at any time. The cancellation will stop the next billing cycle, but it does not refund the current active month.
Better context:
[Chunk ID: doc_refund_policy_learnpro_2026_04__subscription_cancellation]
Title: Refund and Cancellation Policy
Product: LearnPro Online Course Platform
Version: 2026.04
Section: Subscription Cancellation
Monthly subscriptions can be cancelled at any time. The cancellation will stop the next billing cycle, but it does not refund the current active month.
This helps the model understand that the rule is about a specific product, version, and section.
The LLM does not need every metadata field. It needs the fields that affect answer correctness.
Useful fields often include:
- title
- section
- product
- version
- language
- source URI or source ID
- updated date
- permission-safe citation label
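The formatting step above can be sketched as a small rendering function. This is a minimal sketch, not a prescribed implementation: the `format_chunk` name, the `PROMPT_FIELDS` allow-list, and the `embedding_model` field are assumptions made for illustration. The point it shows is that metadata is rendered from an explicit allow-list, so fields that do not affect answer correctness never reach the model.

```python
from typing import Mapping

# Hypothetical allow-list: only fields that affect answer correctness are rendered.
PROMPT_FIELDS = ("title", "product", "version", "section")


def format_chunk(chunk: Mapping[str, str]) -> str:
    """Render one retrieved chunk as a labeled context block."""
    lines = [f"[Chunk ID: {chunk['chunk_id']}]"]
    for field in PROMPT_FIELDS:
        value = chunk.get(field)
        if value:  # skip missing or empty fields instead of printing blank labels
            lines.append(f"{field.capitalize()}: {value}")
    lines.append("Text:")
    lines.append(chunk["text"])
    return "\n".join(lines)


chunk = {
    "chunk_id": "doc_refund_policy_learnpro_2026_04__subscription_cancellation",
    "title": "Refund and Cancellation Policy",
    "product": "LearnPro Online Course Platform",
    "version": "2026.04",
    "section": "Subscription Cancellation",
    # Illustrative internal field: not in PROMPT_FIELDS, so it is never shown to the LLM.
    "embedding_model": "internal-only",
    "text": (
        "Monthly subscriptions can be cancelled at any time. The cancellation "
        "will stop the next billing cycle, but it does not refund the current "
        "active month."
    ),
}

print(format_chunk(chunk))
```

Keeping the allow-list in one place makes it easy to add a field such as an updated date later without touching the rendering logic.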
Grounding Rules
Grounding rules define how strictly the model must follow retrieved context.
There are three common levels.
| Grounding Level | Behavior | Use Case |
|---|---|---|
| Strict grounding | Answer only from retrieved context | Policy, legal, compliance, support |
| Context-first grounding | Use context first, add limited explanation | Technical education, internal docs |
| Open grounding | Use retrieved context plus general model knowledge | Broad research or brainstorming |
For most RAG systems, strict or context-first grounding is safer.
Strict grounding example:
Answer only using the provided context. Do not use outside knowledge. If the context does not contain the answer, say that the provided context is insufficient.
Context-first grounding example:
Use the provided context as the primary source. You may add brief general explanation only when it does not change the policy, facts, or source-specific answer.
The stricter the domain, the stricter the grounding should be.
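One way to keep the grounding level explicit is to select the rule text in code rather than editing prompts by hand. The sketch below is an assumption-heavy illustration: the `Grounding` enum, the `HIGH_RISK_DOMAINS` set, and the wording of the open-grounding rule are not from any particular system. The strict and context-first texts reuse the examples above.

```python
from enum import Enum


class Grounding(Enum):
    STRICT = "strict"
    CONTEXT_FIRST = "context_first"
    OPEN = "open"


GROUNDING_RULES = {
    Grounding.STRICT: (
        "Answer only using the provided context. Do not use outside knowledge. "
        "If the context does not contain the answer, say that the provided "
        "context is insufficient."
    ),
    Grounding.CONTEXT_FIRST: (
        "Use the provided context as the primary source. You may add brief "
        "general explanation only when it does not change the policy, facts, "
        "or source-specific answer."
    ),
    # Assumed wording for illustration; the article gives no open-grounding text.
    Grounding.OPEN: (
        "Use the provided context where it applies. You may also use general "
        "knowledge, but make clear which statements come from the context."
    ),
}

# Hypothetical set of domains where only strict grounding is allowed.
HIGH_RISK_DOMAINS = {"policy", "legal", "compliance", "support"}


def grounding_rule(domain: str, requested: Grounding) -> str:
    """Return the rule text, forcing strict grounding for high-risk domains."""
    level = Grounding.STRICT if domain in HIGH_RISK_DOMAINS else requested
    return GROUNDING_RULES[level]


print(grounding_rule("legal", Grounding.OPEN))
print(grounding_rule("education", Grounding.CONTEXT_FIRST))
```

The override for high-risk domains means a caller cannot loosen grounding by accident where the cost of an unsupported answer is highest.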
Unknown Handling
Unknown handling tells the model what to do when the context is not enough.
Without this rule, the model may fill gaps.
A good unknown rule:
If the context does not contain enough evidence to answer, say:
"The provided context does not contain enough information to answer this."
Then explain which missing information would be needed.
This is important because RAG failures often happen before the LLM stage.
| Failure | What the LLM Should Do |
|---|---|
| Correct chunk not retrieved | Say context is insufficient |
| Retrieved chunks conflict | Explain the conflict and avoid a single confident answer |
| Version is unclear | State that the applicable version is unclear |
| Product scope is missing | Ask for or state the missing product scope |
| Permission-limited context | Answer only from allowed context |
Unknown handling is not a weakness. It is how the system avoids unsupported answers.
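Some of the failures in the table can be detected by the application before generation and stated in the prompt, so the model does not have to infer them. The following is a rough sketch under that assumption: the `context_warnings` function and the chunk field names are hypothetical, and the second version number is made up for the example. It complements the unknown-handling rule rather than replacing it.

```python
from typing import Dict, List


def context_warnings(chunks: List[Dict[str, str]]) -> List[str]:
    """Describe retrieval problems the prompt should surface to the model."""
    if not chunks:
        return ["No context was retrieved."]

    warnings = []
    versions = {c["version"] for c in chunks if c.get("version")}
    if len(versions) > 1:
        # Mixed versions are a common source of conflicting evidence.
        warnings.append(
            "Context spans multiple versions: " + ", ".join(sorted(versions)) + "."
        )
    if any(not c.get("product") for c in chunks):
        warnings.append("At least one chunk has no product scope.")
    return warnings


retrieved = [
    {"version": "2026.04", "product": "LearnPro Online Course Platform"},
    {"version": "2025.10", "product": ""},  # hypothetical older chunk without scope
]
for warning in context_warnings(retrieved):
    print(warning)
```

The returned strings could be placed above the context block, which gives the model an explicit reason to state that the applicable version or scope is unclear.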
Citation Design
Citation design decides how the final answer references evidence.
In RAG, citations are not decoration. They are part of traceability.
Citation fields should be stable, so the same source is always cited the same way.
| Citation Field | Purpose |
|---|---|
| chunk_id | Debugging and internal trace |
| document_id | Stable document reference |
| title | Human-readable source |
| section | Explains where the answer came from |
| source_uri | Link or source location |
| version | Shows applicable document version |
The prompt should tell the model how to cite.
Example:
When making a factual claim from the context, cite the source using this format:
(Source: <title>, Section: <section>, Version: <version>)
For user-facing systems, you may not want to expose internal chunk IDs. But internally, the system should still log them.
A good answer can be user-friendly while still traceable.
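The split between user-facing citations and internal traceability might look like the sketch below. The function names and the logger name are assumptions for illustration. The citation string follows the format shown above, and the chunk ID goes only to the log.

```python
import logging
from typing import Mapping

logger = logging.getLogger("rag.citations")  # hypothetical logger name


def user_citation(chunk: Mapping[str, str]) -> str:
    """Build the user-facing citation without exposing internal IDs."""
    return (
        f"(Source: {chunk['title']}, Section: {chunk['section']}, "
        f"Version: {chunk['version']})"
    )


def cite(chunk: Mapping[str, str]) -> str:
    """Log the internal chunk ID for debugging, return the user-facing label."""
    logger.info("cited chunk_id=%s", chunk["chunk_id"])
    return user_citation(chunk)


chunk = {
    "chunk_id": "doc_refund_policy_learnpro_2026_04__subscription_cancellation",
    "title": "Refund and Cancellation Policy",
    "section": "Subscription Cancellation",
    "version": "2026.04",
}
print(cite(chunk))
```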
Output Format Design
Output format controls response stability.
For a human chat interface, the output can be natural but structured.
Example:
Answer:
You can cancel your monthly subscription at any time, but the cancellation only stops the next billing cycle. The current active month is not refunded.
Source:
Refund and Cancellation Policy, Subscription Cancellation, Version 2026.04
For an application, JSON may be better.
{
  "answer": "You can cancel your monthly subscription at any time, but the current active month is not refunded.",
  "confidence": "supported",
  "sources": [
    {
      "title": "Refund and Cancellation Policy",
      "section": "Subscription Cancellation",
      "version": "2026.04"
    }
  ],
  "missing_information": []
}
Use structured output when the application needs to parse the answer.
Use natural output when the answer is only displayed to a human.
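When the application parses the answer, it helps to validate the structure before using it, since the model may still return malformed output. This is a minimal sketch using only the standard library. The `parse_answer` name and the choice of required keys are assumptions based on the JSON example above.

```python
import json

# Keys taken from the JSON example above; treat this set as an assumption.
REQUIRED_KEYS = {"answer", "confidence", "sources", "missing_information"}


def parse_answer(raw: str) -> dict:
    """Parse model output and check the fields the application depends on."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(data["sources"], list):
        raise ValueError("'sources' must be a list")
    return data


raw = (
    '{"answer": "The current active month is not refunded.", '
    '"confidence": "supported", '
    '"sources": [{"title": "Refund and Cancellation Policy", '
    '"section": "Subscription Cancellation", "version": "2026.04"}], '
    '"missing_information": []}'
)
parsed = parse_answer(raw)
print(parsed["confidence"], len(parsed["sources"]))
```

Raising a single exception type keeps the calling code simple: it can retry, fall back to a plain-text answer, or report the failure.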
A Practical RAG Prompt Template
This is a reusable prompt template, filled in with the refund policy example.
You are a support knowledge assistant.
Answer the user's question using only the provided context.
Rules:
- Use the context as the source of truth.
- Do not invent policy details.
- If the context is insufficient, say the provided context does not contain enough information.
- If sources conflict, explain the conflict instead of choosing silently.
- Prefer specific policy sections over general related sections.
- Keep the answer direct and practical.
- Include source information after the answer.
Context:
[Chunk ID: doc_refund_policy_learnpro_2026_04__subscription_cancellation]
Title: Refund and Cancellation Policy
Product: LearnPro Online Course Platform
Version: 2026.04
Section: Subscription Cancellation
Text:
Monthly subscriptions can be cancelled at any time. The cancellation will stop the next billing cycle, but it does not refund the current active month.
User question:
Can I cancel my monthly subscription and get a refund for this month?
Output format:
Answer:
<direct answer>
Source:
<title>, <section>, <version>
Expected answer:
Answer:
You can cancel your monthly subscription at any time, but it will only stop the next billing cycle. The current active month is not refunded.
Source:
Refund and Cancellation Policy, Subscription Cancellation, Version 2026.04
This prompt is simple, but it controls the important behavior: source of truth, uncertainty, conflict handling, and output format.
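The template above can be assembled in code so that the rules stay fixed while the context and question change per request. The sketch below assumes a hypothetical `build_prompt` function that receives already formatted context blocks. The rule text is copied from the template.

```python
from typing import List

RULES = [
    "Use the context as the source of truth.",
    "Do not invent policy details.",
    "If the context is insufficient, say the provided context does not "
    "contain enough information.",
    "If sources conflict, explain the conflict instead of choosing silently.",
    "Prefer specific policy sections over general related sections.",
    "Keep the answer direct and practical.",
    "Include source information after the answer.",
]


def build_prompt(context_blocks: List[str], question: str) -> str:
    """Assemble role, rules, context, question, and output format in order."""
    rules = "\n".join(f"- {rule}" for rule in RULES)
    context = "\n\n".join(context_blocks)  # blank line between chunks
    return (
        "You are a support knowledge assistant.\n"
        "Answer the user's question using only the provided context.\n\n"
        f"Rules:\n{rules}\n\n"
        f"Context:\n{context}\n\n"
        f"User question:\n{question}\n\n"
        "Output format:\n"
        "Answer:\n<direct answer>\n"
        "Source:\n<title>, <section>, <version>"
    )


block = (
    "[Chunk ID: doc_refund_policy_learnpro_2026_04__subscription_cancellation]\n"
    "Title: Refund and Cancellation Policy\n"
    "Version: 2026.04\n"
    "Section: Subscription Cancellation\n"
    "Text:\n"
    "Monthly subscriptions can be cancelled at any time."
)
prompt = build_prompt(
    [block], "Can I cancel my monthly subscription and get a refund for this month?"
)
print(prompt)
```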
Prompt Design Based on Use Case
Different RAG systems need different prompt behavior.
| Use Case | Prompt Style | Reason |
|---|---|---|
| Customer support | Strict, concise, source-backed | Avoid unsupported policy claims |
| Internal knowledge base | Context-first, explanatory | Users may need reasoning and process |
| Legal or compliance | Very strict, citation-heavy | Exact wording and source version matter |
| Developer documentation | Technical, example-oriented | Users need code, constraints, and edge cases |
| Education assistant | Context-first, teaching style | Explanation matters, but source should guide it |
| Operations runbook | Direct, step-based | Users need safe actions and escalation rules |
Prompt design should follow domain risk.
If the answer can affect money, security, compliance, or production systems, use stricter grounding and clearer citations.
Common Prompt Mistakes
Prompt mistakes often make RAG answers look more polished while making them less reliable.
| Mistake | Result |
|---|---|
| No grounding rule | Model may answer from memory |
| Anonymous chunks | Model cannot tell source or scope |
| No unknown handling | Model may hallucinate missing details |
| Too much metadata | Context becomes noisy |
| No citation rule | Answer becomes hard to trace |
| Overly broad role | Model behavior becomes inconsistent |
| No conflict rule | Model may merge incompatible sources |
| No output format | Application integration becomes unstable |
A prompt should not compensate for bad retrieval by becoming vague.
If the context is weak, the prompt should expose that weakness clearly.
How to Evaluate Prompt Design
Prompt design should be evaluated separately from retrieval.
Use a fixed set of retrieved chunks, then compare different prompts.
| Evaluation Question | What It Tests |
|---|---|
| Does the answer use only retrieved evidence? | Grounding |
| Does it cite the correct source? | Traceability |
| Does it refuse when context is insufficient? | Unknown handling |
| Does it avoid wrong version or product scope? | Metadata awareness |
| Does it handle conflicting chunks? | Conflict behavior |
| Is the output parseable? | Format stability |
Do not evaluate prompt quality only by fluency.
A fluent unsupported answer is still a bad RAG answer.
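A fixed-context comparison could be sketched as below. Everything here is illustrative: `generate` stands in for whatever function calls the model, `fake_generate` is a stub so the example runs without one, and the string-matching checks are deliberately crude. A real evaluation would likely need stronger grounding checks than substring tests.

```python
from typing import Callable, Dict, List, Optional

INSUFFICIENT = "does not contain enough information"


def check_answer(answer: str, expected_source: Optional[str]) -> Dict[str, bool]:
    """Crude checks for unknown handling and traceability."""
    refused = INSUFFICIENT in answer
    if expected_source is None:
        # The fixed context cannot answer this case, so only a refusal passes.
        return {"unknown_handling": refused}
    return {
        "unknown_handling": not refused,
        "traceability": expected_source in answer,
    }


def evaluate(
    prompts: Dict[str, str],
    cases: List[dict],
    generate: Callable[[str, dict], str],
) -> Dict[str, float]:
    """Return the fraction of passed checks per prompt over the same cases."""
    results = {}
    for name, template in prompts.items():
        passed = total = 0
        for case in cases:
            checks = check_answer(generate(template, case), case["expected_source"])
            passed += sum(checks.values())
            total += len(checks)
        results[name] = passed / total if total else 0.0
    return results


def fake_generate(template: str, case: dict) -> str:
    """Stub model: refuses only when the prompt contains an unknown rule."""
    if "insufficient" in template and case["expected_source"] is None:
        return "The provided context does not contain enough information to answer this."
    return "The current month is not refunded. Source: Refund and Cancellation Policy"


cases = [
    {"expected_source": "Refund and Cancellation Policy"},
    {"expected_source": None},  # retrieved chunks do not cover this question
]
prompts = {
    "bad": "Answer the question based on the context.",
    "better": "Use only the provided context. If the context is insufficient, say so.",
}
print(evaluate(prompts, cases, fake_generate))
```

Because the cases and chunks are held constant, any difference in scores comes from the prompt rather than from retrieval.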
Reusable Example: Bad Prompt vs Better Prompt
Bad prompt:
Answer the question based on the context.
This is too vague. It does not define what to do with missing evidence, conflicts, metadata, or citation.
Better prompt:
You are a support knowledge assistant.
Use only the provided context to answer.
If the context is insufficient, say that the provided context does not contain enough information.
Do not invent policy details.
Prefer chunks that directly answer the user question.
Cite the title, section, and version used for the answer.
The better prompt is not much longer, but it controls the important failure modes.
The Main Principle
Prompt design is the final control layer of a RAG system.
Indexing decides what knowledge exists. Retrieval decides what evidence is found. Reranking decides what evidence is prioritized. Prompt design decides how the LLM uses that evidence.
The practical rule is simple: a RAG prompt should make unsupported answering harder than saying "the context is insufficient."