4.7 Knowledge Base Management & Index Maintenance
To keep Pop’s knowledge base stable, efficient, and reliable, it is important to understand how to manage documents, rebuild indexes, and monitor indexing tasks.
This section covers the management features: document inspection, reindexing, deletion, error diagnostics, cleanup, and index performance maintenance.
📁 1. Document Management
Inside a knowledge base, you will see a list of documents.
Each document represents a source item that has undergone parsing, chunking, and indexing.
Fields in the document list:
| Field | Description |
|---|---|
| Document Name | Original filename or user‑defined name |
| Type | PDF, Word, Markdown, URL, Plain Text, etc. |
| Chunk Count | Number of chunks generated after splitting |
| Vector Status | Whether the embedding process is complete |
| Created At | When the document was added |
| Actions | Preview / Reindex / Delete |
Clicking a document allows you to view:
- Parsed text content
- Structured headings
- Chunk list
- Document preview (e.g., PDF page preview)
🔄 2. Reindexing a Document
You should reindex a document in situations such as:
- The document content has changed
- The vector model has been updated (e.g., you switched embedding models)
- Some content failed to parse
- OCR quality was poor and needs reprocessing
Click "Reindex" to rerun the full processing pipeline for the document and regenerate its indexes:
- Parsing
- Cleaning
- Chunking
- Vector embeddings
- BM25 index
Pop runs all of these steps automatically; no manual intervention is required.
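The stage order listed above can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not Pop's implementation: `reindex`, `IndexedDocument`, and `chunk_size` are hypothetical names, the chunker uses fixed-size word windows, the embedding is a placeholder, and the BM25 step only collects the term counts an inverted index would start from.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDocument:
    """Hypothetical container for the artifacts that a reindex regenerates."""
    cleaned_text: str = ""
    chunks: list = field(default_factory=list)
    vectors: list = field(default_factory=list)
    term_counts: dict = field(default_factory=dict)


def reindex(raw_text: str, chunk_size: int = 40) -> IndexedDocument:
    """Rerun every stage from scratch so no stale artifact survives."""
    # 1. Parsing: a real parser would extract text from PDF, Word, etc.
    parsed = raw_text.strip()
    # 2. Cleaning: collapse runs of whitespace.
    cleaned = " ".join(parsed.split())
    # 3. Chunking: fixed-size word windows (an assumed strategy).
    words = cleaned.split()
    chunks = [" ".join(words[i:i + chunk_size])
              for i in range(0, len(words), chunk_size)]
    # 4. Vector embeddings: placeholder features, not a real model.
    vectors = [[float(len(c)), float(len(c.split()))] for c in chunks]
    # 5. BM25 index: only the term counts an inverted index would start from.
    term_counts = {}
    for word in words:
        key = word.lower()
        term_counts[key] = term_counts.get(key, 0) + 1
    return IndexedDocument(cleaned, chunks, vectors, term_counts)
```

Because each stage consumes the output of the previous one, a change at any stage (new content, a different embedding model, better OCR) requires rerunning everything downstream, which is why reindexing regenerates all five artifacts together.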
🧩 3. Deleting a Document
Deleting a document will synchronously remove:
- Document content
- All chunks
- Vector index
- BM25 index
A confirmation dialog appears to prevent accidental deletion.
⚠️ Deleting a document permanently removes its content from the knowledge base. This action cannot be undone.
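The cascading removal described above can be modeled with a small in-memory example. The `KnowledgeBase` class and its four dictionaries are hypothetical stand-ins for the document store, chunk store, vector index, and BM25 index; Pop's actual storage layout is not documented here.

```python
class KnowledgeBase:
    """Hypothetical in-memory model of the four stores a deletion touches."""

    def __init__(self):
        self.documents = {}      # doc_id -> document content
        self.chunks = {}         # chunk_id -> (doc_id, text)
        self.vector_index = {}   # chunk_id -> embedding
        self.bm25_index = {}     # term -> set of chunk_ids

    def add_chunk(self, doc_id, chunk_id, text, embedding):
        self.documents[doc_id] = self.documents.get(doc_id, "") + text + "\n"
        self.chunks[chunk_id] = (doc_id, text)
        self.vector_index[chunk_id] = embedding
        for term in text.lower().split():
            self.bm25_index.setdefault(term, set()).add(chunk_id)

    def delete_document(self, doc_id):
        """Remove the document and every derived artifact in one call."""
        doomed = {cid for cid, (owner, _) in self.chunks.items()
                  if owner == doc_id}
        for cid in doomed:
            del self.chunks[cid]
            self.vector_index.pop(cid, None)
        # Strip the deleted chunk IDs from every posting list.
        for term in list(self.bm25_index):
            self.bm25_index[term] -= doomed
            if not self.bm25_index[term]:
                del self.bm25_index[term]
        self.documents.pop(doc_id, None)
        return len(doomed)
```

Nothing in this model keeps a copy of the removed data, which mirrors why the operation cannot be undone: restoring the document means uploading and indexing it again.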
📚 4. Viewing Chunks
Click "Chunks" to inspect detailed chunk information, including:
- Text content
- Token count
- Chunk order
- Heading hierarchy
- Vector generation status
This is useful for debugging knowledge base responses, such as:
- Chunks too short / too long
- Missing sections
- Parsing issues
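When a document has many chunks, the same checks can be expressed as a simple filter over the chunk fields listed above. The sketch below assumes the chunk details have been copied into plain dictionaries; the keys (`order`, `tokens`, `embedded`) and the thresholds of 50 and 800 tokens are illustrative assumptions, not Pop defaults.

```python
def diagnose_chunks(chunks, min_tokens=50, max_tokens=800):
    """Return (order, problem) pairs for chunks that look suspicious."""
    problems = []
    for chunk in chunks:
        if chunk["tokens"] < min_tokens:
            problems.append((chunk["order"], "too short"))
        elif chunk["tokens"] > max_tokens:
            problems.append((chunk["order"], "too long"))
        # A chunk without a vector cannot be found by vector search.
        if not chunk["embedded"]:
            problems.append((chunk["order"], "vector not generated"))
    return problems


# Example with made-up chunk data.
sample = [
    {"order": 0, "tokens": 12, "embedded": True},
    {"order": 1, "tokens": 300, "embedded": True},
    {"order": 2, "tokens": 1200, "embedded": False},
]
for order, problem in diagnose_chunks(sample):
    print(f"chunk {order}: {problem}")
```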
🛠 5. Index Maintenance
Pop uses a dual‑index architecture:
- Vector Index (Embedding KNN)
- BM25 Inverted Text Index
Both indexes are updated automatically whenever one of the following occurs:
- Adding new documents
- Deleting documents
- Reindexing documents
- Cleanup operations
- Vector model reconfiguration (planned for future versions)
Manual maintenance options (accessible from the top‑right menu):
- Rebuild the entire knowledge base index
- Clean invalid chunks
- Refresh index statistics
Running these manual operations is recommended after large‑scale updates or imports.
🚨 6. Task Status & Error Diagnostics
In the "Task List," you can see the status of each job:
| Status | Meaning |
|---|---|
| Pending | Waiting to be processed |
| Running | Parsing or embedding generation in progress |
| Success | Completed successfully |
| Failed | Errors occurred and require attention |
Common issues:
- OCR failures (scanned PDFs)
- Unable to parse document content
- Character encoding problems
- URL fetch failures
- Embedding model unavailable
Each failed task contains detailed logs for troubleshooting.
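The four statuses map naturally onto an enumeration. The sketch below assumes task records have been copied into dictionaries with `id`, `status`, and `log` keys; these field names and the `summarize_tasks` helper are hypothetical and not part of any documented Pop interface.

```python
from collections import Counter
from enum import Enum


class TaskStatus(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCESS = "Success"
    FAILED = "Failed"


def summarize_tasks(tasks):
    """Count tasks per status and collect the failed ones for review."""
    counts = Counter(TaskStatus(t["status"]) for t in tasks)
    failed = [t for t in tasks if TaskStatus(t["status"]) is TaskStatus.FAILED]
    return counts, failed


# Example with made-up task data.
tasks = [
    {"id": 1, "status": "Success", "log": ""},
    {"id": 2, "status": "Failed", "log": "URL fetch failure"},
    {"id": 3, "status": "Pending", "log": ""},
]
counts, failed = summarize_tasks(tasks)
for task in failed:
    print(task["id"], task["log"])
```

Converting the status string through `TaskStatus(...)` raises a `ValueError` on any unexpected value, so a typo or a new status surfaces immediately instead of being silently miscounted.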
🧹 7. Knowledge Base Cleanup
Pop provides several cleanup utilities:
1. Remove Orphaned Chunks
Automatically removes chunks that no longer belong to any document, such as those left behind after a document deletion.
2. Clear Failed Tasks
Removes failed tasks to keep the task list tidy.
3. Rebuild Index
Rebuilds the entire knowledge base index. This is useful when:
- The embedding model has been upgraded
- Many documents have been updated
- Retrieval quality has dropped
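To make the "Remove Orphaned Chunks" utility concrete: an orphaned chunk is one whose owning document no longer exists. A minimal sketch, assuming chunk ownership is available as a mapping from chunk ID to document ID (both function names and the data layout are hypothetical):

```python
def find_orphaned_chunks(document_ids, chunk_owners):
    """Return IDs of chunks whose owning document no longer exists."""
    live = set(document_ids)
    return sorted(cid for cid, owner in chunk_owners.items()
                  if owner not in live)


def remove_orphaned_chunks(document_ids, chunk_owners):
    """Drop orphaned chunks in place and return how many were removed."""
    orphans = find_orphaned_chunks(document_ids, chunk_owners)
    for cid in orphans:
        del chunk_owners[cid]
    return len(orphans)
```

Separating detection from removal lets you review the list of orphans before anything is deleted.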
📈 8. Index Performance Optimization Tips
For best retrieval quality:
- Keep document structure clear (headings, paragraphs)
- Avoid uploading large numbers of tiny fragmented notes
- Prefer non‑scanned PDFs when possible
- Keep each knowledge base within a reasonable size (5,000–50,000 chunks recommended)
- Periodically rebuild indexes after major updates
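The size guideline can be checked by summing the Chunk Count column of the document list. The sketch below assumes those counts have been copied into a list by hand or from an export; the `check_size` helper is hypothetical, and only the 5,000 and 50,000 bounds come from the recommendation above.

```python
RECOMMENDED_MIN_CHUNKS = 5_000
RECOMMENDED_MAX_CHUNKS = 50_000


def check_size(chunk_counts):
    """Sum per-document chunk counts and compare with the recommended range."""
    total = sum(chunk_counts)
    if total > RECOMMENDED_MAX_CHUNKS:
        return total, "above recommended range"
    if total < RECOMMENDED_MIN_CHUNKS:
        return total, "below recommended range"
    return total, "within recommended range"
```

A total above the range suggests splitting the content across several knowledge bases; a total below it is not an error, only a sign that the knowledge base is still small.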