4.6 Knowledge Base Q&A Logic (RAG)
Pop’s knowledge‑base answering capability is built on RAG (Retrieval‑Augmented Generation).
With RAG, when answering a user question, the AI does not rely solely on its internal knowledge but also uses your uploaded documents, enabling:
- More accurate and controllable answers
- Fully traceable content
- Far less “hallucination” of unsupported statements
This section explains in detail how Pop retrieves relevant segments from your knowledge base and uses them in AI responses.
🧠 1. What Is RAG?
In simple terms:
RAG = Retrieval + LLM Generation
Flow:
User Query → Embedding → Retrieve Document Chunks → Select Relevant Snippets → LLM → Answer
With RAG:
- No need for the LLM to memorize all content
- Your knowledge base can be updated in real time
- All answers come from verifiable documentation
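To make this flow concrete, here is a minimal Python skeleton of the pipeline. The three helpers (`embed`, `search_vectors`, `generate`) are hypothetical placeholders, not Pop’s actual API; the stage sections below sketch concrete versions of each step.

```python
# Minimal RAG skeleton. Every helper below is a hypothetical placeholder,
# not Pop's actual internal API.

def embed(text: str) -> list[float]:
    """Placeholder: call an embedding model here."""
    raise NotImplementedError

def search_vectors(query_vec: list[float], top_n: int) -> list[str]:
    """Placeholder: KNN lookup against the vector index."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Placeholder: call the LLM here."""
    raise NotImplementedError

def answer(query: str, top_n: int = 20) -> str:
    query_vec = embed(query)                       # User Query -> Embedding
    candidates = search_vectors(query_vec, top_n)  # -> Retrieve Document Chunks
    evidence = candidates[:5]                      # -> Select Relevant Snippets
    prompt = ("Answer only from the documents below.\n\n"
              + "\n\n".join(evidence)
              + f"\n\nQuestion: {query}")
    return generate(prompt)                        # -> LLM -> Answer
```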
🔎 2. Pop’s Three‑Stage RAG Pipeline
Pop performs RAG in three major stages:
Stage 1: Retrieval
Stage 2: Re‑ranking
Stage 3: Generation
📍 Stage 1: Retrieval
Pop retrieves content using two methods:
- Vector Search (KNN)
- BM25 Text Search
Each method returns its Top‑N (usually 10–30) candidate chunks.
Illustration:
Query → Embedding → Vector DB → Top N
Query → Tokenization → BM25 → Top N
Using both methods increases recall coverage and avoids missing important information.
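As an illustration of dual retrieval (not Pop’s internal implementation), the sketch below uses the open-source `rank_bm25` package for BM25 and plain NumPy cosine similarity for vector KNN. The three-chunk corpus and random embeddings are toy stand-ins for a real index and embedding model.

```python
import numpy as np
from rank_bm25 import BM25Okapi  # pip install rank-bm25

# Toy corpus standing in for real knowledge-base chunks.
chunks = [
    "Pop stores documents as chunks with BM25 and vector indexes.",
    "Embeddings map each chunk into a dense vector space.",
    "BM25 ranks chunks by token overlap with the query.",
]

# --- BM25 text search ---
bm25 = BM25Okapi([c.lower().split() for c in chunks])

def bm25_top_n(query: str, n: int) -> list[int]:
    scores = bm25.get_scores(query.lower().split())
    return list(np.argsort(scores)[::-1][:n])

# --- Vector search (KNN via cosine similarity) ---
# Random unit vectors stand in for a real embedding model, so the
# example runs without external services.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(chunks), 8))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

def vector_top_n(query_vec: np.ndarray, n: int) -> list[int]:
    sims = embeddings @ (query_vec / np.linalg.norm(query_vec))
    return list(np.argsort(sims)[::-1][:n])

query = "how does Pop index chunks"
print("BM25 candidates:  ", bm25_top_n(query, 2))
print("Vector candidates:", vector_top_n(rng.normal(size=8), 2))
```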
📚 Stage 2: Re‑ranking
After retrieval, Pop performs a sequence of ranking steps to ensure that only the best content is passed to the LLM.
Re‑ranking Steps:
- Deduplication (remove identical or similar chunks)
- Similarity weighting
- Fusion of BM25 + Embedding ranking signals
- Document‑coherence ordering (chunks from the same document appear consecutively)
- Chunk length control (handle overly short or long segments)
Finally, 3–6 top‑ranked chunks are selected as “evidence blocks”.
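One common way to fuse BM25 and embedding signals is reciprocal rank fusion (RRF). Whether Pop uses RRF specifically is not documented here, so treat the following as an illustrative sketch of the fusion and deduplication steps only.

```python
def rrf_fuse(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal rank fusion: score(chunk) = sum of 1 / (k + rank) per list."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def dedupe(ranked: list[int], texts: dict[int, str]) -> list[int]:
    """Drop chunks whose text is identical to an earlier-ranked chunk."""
    seen: set[str] = set()
    kept = []
    for cid in ranked:
        key = texts[cid].strip().lower()
        if key not in seen:
            seen.add(key)
            kept.append(cid)
    return kept

bm25_ranking   = [2, 0, 1]   # chunk IDs from BM25, best first
vector_ranking = [0, 2, 1]   # chunk IDs from vector search, best first
fused = rrf_fuse([bm25_ranking, vector_ranking])
texts = {0: "dual indexing", 1: "chunk storage", 2: "dual indexing"}  # 0 and 2 duplicate
print(dedupe(fused, texts)[:6])  # the final 3-6 evidence blocks would come from here
```

A production deduplication step would also catch near-duplicates (for example via embedding similarity), not just exact text matches.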
✨ Stage 3: Generation
Selected chunks are inserted into a structured prompt, including:
- User question
- Original document excerpts
- Citation indicators
- Answer constraints
Example prompt:
You are a professional assistant. Answer the user's question based on the documents below.
If the documents do not contain the answer, clearly respond: “No relevant information found in the documents.”
【Chunk 1】
...
【Chunk 2】
...
【Chunk 3】
...
Question: {User_Query}
The model is instructed to reference only the provided content and to avoid fabrication.
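For illustration, a prompt like the one above could be assembled as follows; `build_rag_prompt` is a hypothetical helper, not part of Pop’s API.

```python
def build_rag_prompt(query: str, chunks: list[str]) -> str:
    # Mirrors the template above: instructions, numbered evidence, question.
    header = (
        "You are a professional assistant. Answer the user's question "
        "based on the documents below.\n"
        "If the documents do not contain the answer, clearly respond: "
        "“No relevant information found in the documents.”\n\n"
    )
    evidence = "\n\n".join(
        f"【Chunk {i}】\n{text}" for i, text in enumerate(chunks, start=1)
    )
    return f"{header}{evidence}\n\nQuestion: {query}"

print(build_rag_prompt(
    "What is Pop's knowledge-base storage structure?",
    ["Documents are split into chunks...", "Each chunk is indexed twice..."],
))
```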
📌 3. Pop’s RAG Features
1. Traceability
Every answer can be traced to:
- Document source
- Specific paragraph
- Chunk ID
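As a sketch, a traceable evidence record might look like the following; the field names here are hypothetical, not Pop’s actual schema.

```python
from dataclasses import dataclass

@dataclass
class EvidenceChunk:
    chunk_id: int      # stable ID within the knowledge base
    doc_source: str    # originating document (file name or URL)
    paragraph: str     # paragraph/section locator inside the document
    text: str          # the content shown to the LLM

def format_citation(c: EvidenceChunk) -> str:
    return f"[{c.doc_source} · {c.paragraph} · chunk {c.chunk_id}]"

chunk = EvidenceChunk(17, "kb-design.pdf", "§2.1", "Pop stores documents as chunks...")
print(format_citation(chunk))  # [kb-design.pdf · §2.1 · chunk 17]
```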
2. Multi‑chunk merging
Pop automatically:
- Merges semantically related chunks
- Removes redundancy
- Restores missing context
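Below is a minimal sketch of one possible merging strategy, assuming each chunk records its source document and position; Pop’s actual heuristics are not detailed here. Concatenating chunks that sit next to each other in the same document restores context lost at chunk boundaries.

```python
def merge_adjacent(chunks: list[dict]) -> list[dict]:
    """Merge chunks that share a doc_id and have consecutive positions."""
    merged: list[dict] = []
    for c in sorted(chunks, key=lambda c: (c["doc_id"], c["pos"])):
        last = merged[-1] if merged else None
        if last and last["doc_id"] == c["doc_id"] and last["pos"] + 1 == c["pos"]:
            last["text"] += " " + c["text"]   # restore missing context
            last["pos"] = c["pos"]
        else:
            merged.append(dict(c))
    return merged

chunks = [
    {"doc_id": "guide", "pos": 4, "text": "dual indexing uses"},
    {"doc_id": "guide", "pos": 5, "text": "BM25 plus a vector index."},
    {"doc_id": "faq",   "pos": 1, "text": "Unrelated chunk."},
]
print(merge_adjacent(chunks))  # the two "guide" chunks merge into one
```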
3. Enhanced processing of unstructured text
For PDFs, webpages, or complex structures, Pop preserves:
- Heading hierarchy
- Page numbers
- Semantic context
4. Anti‑hallucination rules
Pop’s RAG template includes strict constraints:
- “Do not answer if information is missing”
- “Respond only based on the provided documents”
📊 4. Example of RAG in Action
User question:
What is Pop’s knowledge‑base storage structure?
Retrieved chunks:
- Chunk 17: Overview of knowledge‑base structure
- Chunk 18: Explanation of BM25 and vector indexing
LLM answer:
Pop’s knowledge‑base structure consists of documents, document chunks, and dual‑indexing (BM25 + vector)…
(omitted)
All references are traceable.
💡 5. When Is RAG Useful?
RAG is ideal for:
- Product documentation Q&A
- API reference lookup
- Customer‑service FAQ bots
- Enterprise internal knowledge bases
- Policy, regulation, or legal text interpretation
- Technical documentation and engineering manuals
Essentially, anything that requires answering questions based on real documents.
⚠️ 6. Limitations of RAG
Despite its power, RAG has limitations:
- Poor‑quality documents → poor chunking
- Image‑dominant files → need OCR
- Overly abstract questions → no direct answer
- Weaker LLMs → limited reasoning and synthesis
Hence:
High‑quality source documents = high‑quality RAG answers.
✅ Summary
RAG is Pop’s core capability for knowledge‑base Q&A:
| Stage | Responsibility |
|---|---|
| Retrieval | Find potentially relevant content |
| Re‑ranking | Select the most relevant, useful chunks |
| Generation | Produce grounded answers using retrieved chunks |
RAG enables Pop to deliver answers that are:
- Precise
- Reliable
- Traceable
- Business‑aligned