4.7 Knowledge Base Management & Index Maintenance

To keep Pop’s knowledge base stable, efficient, and reliable, you need to know how to manage documents, rebuild indexes, and monitor indexing tasks.
This section covers the available management capabilities: document inspection, reindexing, deletion, error handling, and performance maintenance.

📁 1. Document Management

Inside a knowledge base, you will see a list of documents.
Each document represents a source item that has undergone parsing, chunking, and indexing.

Fields in the document list:

Field	Description
Document Name	Original filename or user‑defined name
Type	PDF, Word, Markdown, URL, Plain Text, etc.
Chunk Count	Number of chunks generated after splitting
Vector Status	Whether the embedding process is complete
Created At	When the document was added
Actions	Preview / Reindex / Delete

Clicking a document allows you to view:

Parsed text content
Structured headings
Chunk list
Document preview (e.g., PDF page preview)

🔄 2. Reindexing a Document

You should reindex a document in situations such as:

The document content has changed
The vector model has been updated (e.g., you switched embedding models)
Some content failed to parse
OCR quality was poor and needs reprocessing

Click "Reindex" to rerun every processing stage and regenerate its output:

Parsing
Cleaning
Chunking
Vector embeddings
BM25 index

Pop runs all of these stages automatically; no manual intervention is required.
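The stages listed above can be pictured as a small pipeline. The following is a minimal sketch under stated assumptions, not Pop's implementation: the function names (`clean`, `split_into_chunks`, `reindex`), the whitespace tokenizer, the fixed word-count chunking, and the `toy_embed` stand-in are all hypothetical. Parsing is assumed to have already produced plain text.

```python
import re
from collections import Counter
from typing import Callable


def clean(text: str) -> str:
    """Collapse whitespace runs (a stand-in for the cleaning stage)."""
    return re.sub(r"\s+", " ", text).strip()


def split_into_chunks(text: str, max_words: int) -> list[str]:
    """Split cleaned text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def reindex(raw_text: str, embed: Callable[[str], list[float]], max_words: int = 200) -> dict:
    """Rerun every stage from scratch and return fresh artifacts."""
    chunks = split_into_chunks(clean(raw_text), max_words)
    return {
        "chunks": chunks,
        # One embedding per chunk; `embed` is supplied by the caller.
        "vectors": [embed(c) for c in chunks],
        # Per-chunk term counts, the raw material for a BM25 index.
        "term_counts": [Counter(c.lower().split()) for c in chunks],
    }


def toy_embed(text: str) -> list[float]:
    """Hypothetical embedding used only so the sketch runs without a model."""
    return [float(len(text)), float(len(text.split()))]


result = reindex("Pop  indexes\n documents in   stages", toy_embed, max_words=2)
print(result["chunks"])
```

Because every artifact is rebuilt from the source text, a change of embedding model only requires passing a different `embed` function and running the pipeline again.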

🧩 3. Deleting a Document

Deleting a document will synchronously remove:

Document content
All chunks
Vector index
BM25 index

A confirmation dialog appears to prevent accidental deletion.

⚠️ Deleting a document permanently removes its content from the knowledge base. This action cannot be undone.
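The cascade described above (content, chunks, vector entries, BM25 entries) can be sketched with an in-memory store. The `KnowledgeBase` class and its fields are hypothetical and chosen only for illustration; Pop's real storage layout is not described in this guide.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    documents: dict[str, str] = field(default_factory=dict)        # doc_id -> content
    chunk_owner: dict[str, str] = field(default_factory=dict)      # chunk_id -> doc_id
    vectors: dict[str, list[float]] = field(default_factory=dict)  # chunk_id -> embedding
    postings: dict[str, set[str]] = field(default_factory=dict)    # term -> chunk_ids

    def delete_document(self, doc_id: str) -> int:
        """Remove a document and everything derived from it.

        Returns the number of chunks that were removed.
        """
        chunk_ids = {cid for cid, owner in self.chunk_owner.items() if owner == doc_id}
        self.documents.pop(doc_id, None)
        for cid in chunk_ids:
            self.chunk_owner.pop(cid, None)
            self.vectors.pop(cid, None)
        # Drop the deleted chunks from every posting list; remove empty terms.
        for term in list(self.postings):
            self.postings[term] -= chunk_ids
            if not self.postings[term]:
                del self.postings[term]
        return len(chunk_ids)
```

Removing all four kinds of data in one operation is what keeps the two indexes from returning results for a document that no longer exists.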

📚 4. Viewing Chunks

Click "Chunks" to inspect detailed chunk information, including:

Text content
Token count
Chunk order
Heading hierarchy
Vector generation status

This is useful for debugging knowledge base responses, such as:

Chunks too short / too long
Missing sections
Parsing issues
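If the chunk details are available outside the interface, the same checks can be automated. The sketch below is an assumption-laden example: the `ChunkInfo` record, the `diagnose_chunks` helper, and the 50 and 800 token thresholds are hypothetical and should be tuned to your own content.

```python
from dataclasses import dataclass


@dataclass
class ChunkInfo:
    order: int         # position of the chunk within its document
    token_count: int
    has_vector: bool   # whether vector generation finished


def diagnose_chunks(chunks: list[ChunkInfo], min_tokens: int = 50, max_tokens: int = 800) -> list[str]:
    """Return human-readable findings for suspicious chunks."""
    issues = []
    for c in sorted(chunks, key=lambda c: c.order):
        if c.token_count < min_tokens:
            issues.append(f"chunk {c.order}: too short ({c.token_count} tokens)")
        elif c.token_count > max_tokens:
            issues.append(f"chunk {c.order}: too long ({c.token_count} tokens)")
        if not c.has_vector:
            issues.append(f"chunk {c.order}: vector not generated")
    # A gap in the order sequence hints at a missing section.
    orders = sorted(c.order for c in chunks)
    if orders:
        for missing in sorted(set(range(orders[0], orders[-1] + 1)) - set(orders)):
            issues.append(f"chunk {missing}: missing from sequence")
    return issues
```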

🛠 5. Index Maintenance

Pop uses a dual‑index architecture:

Vector Index (Embedding KNN)
BM25 Inverted Text Index
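This guide does not say how Pop combines results from the two indexes. As a rough illustration of how a dual-index search can work in general, the sketch below scores chunks with the standard Okapi BM25 formula (using the common defaults `k1=1.5`, `b=0.75`), ranks them by cosine similarity on toy vectors, and merges both rankings with reciprocal rank fusion. The fusion method, all function names, and the example vectors are assumptions, not Pop's documented behavior.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every chunk against the query with Okapi BM25 (whitespace tokens)."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log((n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(scores: list[float]) -> list[int]:
    """Indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge several rankings; each item earns 1 / (k + position) per ranking."""
    fused = Counter()
    for ranking in rankings:
        for position, idx in enumerate(ranking, start=1):
            fused[idx] += 1.0 / (k + position)
    return sorted(fused, key=lambda i: fused[i], reverse=True)


chunks = ["reset your password", "billing and invoices", "password policy rules"]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # toy embeddings
query_vector = [1.0, 0.0]

keyword_rank = rank(bm25_scores("password", chunks))
vector_rank = rank([cosine(query_vector, v) for v in vectors])
print(reciprocal_rank_fusion([keyword_rank, vector_rank]))
```

Keyword scoring rewards exact term matches, while vector similarity can surface chunks that use different wording, which is the usual motivation for keeping both indexes.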

Both indexes are kept up to date either automatically or through manual maintenance.

Automatically triggered:

Adding new documents
Deleting documents
Reindexing documents
Cleanup operations
Vector model reconfiguration (future versions)

Manual maintenance options:

Accessible from the top‑right menu:

Rebuild the entire knowledge base index
Clean invalid chunks
Refresh index statistics

Run these operations after large-scale updates or imports.

🚨 6. Task Status & Error Diagnostics

The “Task List” shows the status of each indexing task:

Status	Meaning
Pending	Waiting to be processed
Running	Parsing or embedding generation in progress
Success	Completed successfully
Failed	Errors occurred and require attention

Common issues:

OCR failures (scanned PDFs)
Unable to parse document content
Character encoding problems
URL fetch failures
Embedding model unavailable

Each failed task contains detailed logs for troubleshooting.
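The four statuses in the table above can be modeled as a small state machine. This is a sketch under assumptions: the allowed transitions (Pending to Running, Running to Success or Failed) are inferred from the status descriptions, and the names `TaskStatus`, `advance`, and `needs_attention` are hypothetical.

```python
from enum import Enum


class TaskStatus(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCESS = "Success"
    FAILED = "Failed"


# Assumed lifecycle: a task waits, runs, then ends in Success or Failed.
ALLOWED_TRANSITIONS = {
    TaskStatus.PENDING: {TaskStatus.RUNNING},
    TaskStatus.RUNNING: {TaskStatus.SUCCESS, TaskStatus.FAILED},
    TaskStatus.SUCCESS: set(),
    TaskStatus.FAILED: set(),
}


def advance(current: TaskStatus, new: TaskStatus) -> TaskStatus:
    """Return the new status, or raise if the transition is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new


def needs_attention(tasks: dict[str, TaskStatus]) -> list[str]:
    """Names of tasks whose status is Failed, in alphabetical order."""
    return sorted(name for name, status in tasks.items() if status is TaskStatus.FAILED)
```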

🧹 7. Knowledge Base Cleanup

Pop provides several cleanup utilities:

1. Remove Orphaned Chunks

Chunks that no longer belong to any document, such as those left behind after a document deletion, can be cleaned up automatically.

2. Clear Failed Tasks

Keeps the task list tidy.

3. Rebuild Index

Rebuilds the entire knowledge base index. This is useful when:

The embedding model has been upgraded
Many documents have been updated
Retrieval quality has dropped
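The idea behind the Remove Orphaned Chunks utility can be sketched as a set comparison: any chunk whose owning document no longer exists is an orphan. The data layout (`chunk_owner` mapping chunk IDs to document IDs) and both function names are hypothetical.

```python
def find_orphaned_chunks(chunk_owner: dict[str, str], document_ids: set[str]) -> list[str]:
    """Return IDs of chunks whose owning document is not in document_ids."""
    return sorted(cid for cid, doc_id in chunk_owner.items() if doc_id not in document_ids)


def remove_orphaned_chunks(chunk_owner: dict[str, str], document_ids: set[str]) -> int:
    """Delete orphaned chunks in place and return how many were removed."""
    orphans = find_orphaned_chunks(chunk_owner, document_ids)
    for cid in orphans:
        del chunk_owner[cid]
    return len(orphans)
```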

📈 8. Index Performance Optimization Tips

For best retrieval quality:

Keep document structure clear (headings, paragraphs)
Avoid uploading large numbers of tiny fragmented notes
Prefer non‑scanned PDFs when possible
Keep each knowledge base within a reasonable size (5,000–50,000 chunks recommended)
Periodically rebuild indexes after major updates