4.7 Knowledge Base Management & Index Maintenance

To keep Pop’s knowledge base stable, efficient, and reliable, it is important to understand how to manage documents, rebuild indexes, and monitor indexing tasks.
This chapter covers the available management capabilities: document inspection, reindexing, deletion, error handling, and performance maintenance.


📁 1. Document Management

Inside a knowledge base, you will see a list of documents.
Each document represents a source item that has undergone parsing, chunking, and indexing.

Fields in the document list:

| Field | Description |
| --- | --- |
| Document Name | Original filename or user‑defined name |
| Type | PDF, Word, Markdown, URL, Plain Text, etc. |
| Chunk Count | Number of chunks generated after splitting |
| Vector Status | Whether the embedding process is complete |
| Created At | When the document was added |
| Actions | Preview / Reindex / Delete |

Clicking a document allows you to view:

  • Parsed text content
  • Structured headings
  • Chunk list
  • Document preview (e.g., PDF page preview)

🔄 2. Reindexing a Document

You should reindex a document in situations such as:

  • The document content has changed
  • The vector model has been updated (e.g., you switched embedding models)
  • Some content failed to parse
  • OCR quality was poor and needs reprocessing

Click "Reindex" to rerun the full processing pipeline for the document and regenerate its output:

  • Parsing
  • Cleaning
  • Chunking
  • Vector embeddings
  • BM25 index

Pop runs all of these steps automatically; no manual intervention is required.
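
The stages listed above can be pictured as one function that reruns each step in order. The following is a minimal sketch and not Pop's actual implementation: every function name is hypothetical, `parse` is a pass-through placeholder, the chunker splits on a fixed word count, and `embed` is a toy hashed bag-of-words vector standing in for a real embedding model.

```python
import math
import re
from collections import Counter


def parse(raw: str) -> str:
    # Placeholder for format-specific parsing (PDF, Word, URL, ...).
    return raw


def clean(text: str) -> str:
    # Collapse runs of spaces/tabs, trim each line, and drop empty lines.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def chunk(text: str, max_words: int = 8) -> list[str]:
    # Hypothetical fixed-size chunker: at most max_words words per chunk.
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(chunk_text: str, dims: int = 8) -> list[float]:
    # Toy embedding: bucket each word by a deterministic character sum,
    # then L2-normalize. A real system would call an embedding model here.
    vec = [0.0] * dims
    for word in chunk_text.lower().split():
        vec[sum(map(ord, word)) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def term_counts(chunk_text: str) -> Counter:
    # Per-chunk term frequencies, the raw material for a BM25 index.
    return Counter(re.findall(r"\w+", chunk_text.lower()))


def reindex(raw: str) -> dict:
    # Rerun every stage from scratch so no stale output survives.
    chunks = chunk(clean(parse(raw)))
    return {
        "chunks": chunks,
        "vectors": [embed(c) for c in chunks],
        "term_counts": [term_counts(c) for c in chunks],
    }


result = reindex("Pop   keeps a vector index\n\nand a BM25 index for every knowledge base document.")
print(result["chunks"])
```

Because every stage is recomputed from the raw input, the sketch also illustrates why reindexing is the right response to a changed document or a new embedding model: nothing from the previous run is reused.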


🧩 3. Deleting a Document

Deleting a document will synchronously remove:

  • Document content
  • All chunks
  • Vector index
  • BM25 index

A confirmation dialog appears to prevent accidental deletion.

⚠️ Deleting a document is permanent and cannot be undone; its content no longer contributes to the knowledge base.
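
To illustrate what removing a document from all four places involves, here is a small in-memory sketch. The `KnowledgeBase` class and its fields are hypothetical and only model the idea of a cascading delete; they do not describe Pop's storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    # All four stores are hypothetical stand-ins for illustration.
    documents: dict[str, str] = field(default_factory=dict)        # doc_id -> content
    chunks: dict[str, str] = field(default_factory=dict)           # chunk_id -> doc_id
    vectors: dict[str, list[float]] = field(default_factory=dict)  # chunk_id -> embedding
    postings: dict[str, set[str]] = field(default_factory=dict)    # term -> chunk_ids

    def delete_document(self, doc_id: str) -> int:
        """Remove a document and everything derived from it; return chunks removed."""
        if doc_id not in self.documents:
            raise KeyError(doc_id)
        doomed = {cid for cid, owner in self.chunks.items() if owner == doc_id}
        for cid in doomed:
            del self.chunks[cid]
            self.vectors.pop(cid, None)
        # Drop the deleted chunks from every posting list; remove empty terms.
        for term in list(self.postings):
            self.postings[term] -= doomed
            if not self.postings[term]:
                del self.postings[term]
        del self.documents[doc_id]
        return len(doomed)


kb = KnowledgeBase(
    documents={"d1": "alpha beta", "d2": "beta gamma"},
    chunks={"c1": "d1", "c2": "d2"},
    vectors={"c1": [1.0, 0.0], "c2": [0.0, 1.0]},
    postings={"alpha": {"c1"}, "beta": {"c1", "c2"}, "gamma": {"c2"}},
)
removed = kb.delete_document("d1")
print(removed, sorted(kb.postings))
```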


📚 4. Viewing Chunks

Click "Chunks" to inspect detailed chunk information, including:

  • Text content
  • Token count
  • Chunk order
  • Heading hierarchy
  • Vector generation status

This is useful for debugging knowledge base responses, such as:

  • Chunks that are too short or too long
  • Missing sections
  • Parsing issues
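
If chunk details are exported or copied out of the chunk view, the checks above can be automated. The sketch below assumes each chunk is a plain dictionary with `order` and `token_count` keys; both the record shape and the token thresholds are illustrative assumptions, not Pop defaults.

```python
def flag_chunks(chunks, min_tokens=50, max_tokens=800):
    """Return (order, issue) pairs for suspicious chunks.

    min_tokens and max_tokens are illustrative thresholds, not Pop defaults.
    """
    issues = []
    ordered = sorted(chunks, key=lambda c: c["order"])
    for position, c in enumerate(ordered):
        if c["token_count"] < min_tokens:
            issues.append((c["order"], "too short"))
        elif c["token_count"] > max_tokens:
            issues.append((c["order"], "too long"))
        # A jump in the order sequence hints at a missing section.
        if position > 0 and c["order"] != ordered[position - 1]["order"] + 1:
            issues.append((c["order"], "gap before chunk"))
    return issues


sample = [
    {"order": 0, "token_count": 120},
    {"order": 1, "token_count": 12},
    {"order": 3, "token_count": 950},
]
report = flag_chunks(sample)
print(report)
```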

🛠 5. Index Maintenance

Pop uses a dual‑index architecture:

  1. Vector Index (Embedding KNN)
  2. BM25 Inverted Text Index

Both indexes are kept up to date through automatic updates and manual maintenance options.

Automatically triggered:

  • Adding new documents
  • Deleting documents
  • Reindexing documents
  • Cleanup operations
  • Vector model reconfiguration (future versions)

Manual maintenance options (accessible from the top‑right menu):

  • Rebuild the entire knowledge base index
  • Clean invalid chunks
  • Refresh index statistics

Running these options is recommended after large‑scale updates or imports.
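
To make the dual‑index idea concrete, the sketch below queries a toy BM25 scorer and a toy cosine‑similarity KNN search over the same three chunks. It is a generic illustration of the two techniques, not Pop's code: the parameters `k1` and `b` use commonly cited defaults, the 2‑dimensional vectors are made up, and how Pop combines the two result sets is not covered here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Standard BM25 with a non-negative IDF variant.
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (len(docs) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn(query_vec, vectors, k=2):
    # Brute-force nearest neighbours by cosine similarity.
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine(query_vec, vectors[i]),
                    reverse=True)
    return ranked[:k]


docs = ["rebuild the vector index", "delete a document", "refresh index statistics"]
vectors = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.3]]  # made-up 2-d embeddings

keyword_scores = bm25_scores("index", docs)
nearest = knn([1.0, 0.0], vectors, k=2)
print(keyword_scores, nearest)
```

The keyword index rewards exact term matches, while the vector index ranks by closeness in embedding space. Both depend on the same set of chunks, which is why adding, deleting, or reindexing a document has to update the two together.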


🚨 6. Task Status & Error Diagnostics

In the “Task List,” you can observe the status of each job:

| Status | Meaning |
| --- | --- |
| Pending | Waiting to be processed |
| Running | Parsing or embedding generation in progress |
| Success | Completed successfully |
| Failed | Errors occurred and require attention |

Common issues:

  • OCR failures (scanned PDFs)
  • Unable to parse document content
  • Character encoding problems
  • URL fetch failures
  • Embedding model unavailable

Each failed task contains detailed logs for troubleshooting.
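
As an illustration of triaging the task list, the sketch below models the four statuses as an enum and collects failed tasks with their log messages. The `Task` record, its fields, and the sample log strings are hypothetical; Pop's actual task data may be shaped differently.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class TaskStatus(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCESS = "Success"
    FAILED = "Failed"


@dataclass
class Task:
    # Hypothetical record shape, for illustration only.
    document: str
    status: TaskStatus
    log: str = ""


def summarize(tasks):
    """Count tasks per status and list (document, log) for failed ones."""
    counts = Counter(t.status for t in tasks)
    failures = [(t.document, t.log) for t in tasks if t.status is TaskStatus.FAILED]
    return counts, failures


tasks = [
    Task("handbook.pdf", TaskStatus.SUCCESS),
    Task("scan-001.pdf", TaskStatus.FAILED, "OCR failure"),
    Task("notes.md", TaskStatus.RUNNING),
    Task("https://example.com/faq", TaskStatus.FAILED, "URL fetch failure"),
]
counts, failures = summarize(tasks)
for document, log in failures:
    print(f"{document}: {log}")
```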


🧹 7. Knowledge Base Cleanup

Pop provides several cleanup utilities:

1. Remove Orphaned Chunks

Chunks left behind after document deletion can be automatically cleaned.

2. Clear Failed Tasks

Removes failed entries so the task list stays tidy.

3. Rebuild Index

Rebuilds the entire knowledge base index. This is useful when:

  • The embedding model has been upgraded
  • Many documents have been updated
  • Retrieval quality has dropped
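
The "Remove Orphaned Chunks" utility can be understood as a set comparison: any chunk whose owning document no longer exists is an orphan. The sketch below shows that logic on plain dictionaries; the function names and data shapes are assumptions made for illustration.

```python
def find_orphaned_chunks(chunk_owner, document_ids):
    """Return ids of chunks whose owning document no longer exists.

    chunk_owner maps chunk_id -> doc_id (a hypothetical layout).
    """
    return sorted(cid for cid, doc_id in chunk_owner.items()
                  if doc_id not in document_ids)


def remove_orphaned_chunks(chunk_owner, document_ids):
    """Delete orphaned chunks in place and return the ids that were removed."""
    orphans = find_orphaned_chunks(chunk_owner, document_ids)
    for cid in orphans:
        del chunk_owner[cid]
    return orphans


chunk_owner = {"c1": "d1", "c2": "d1", "c3": "d2", "c4": "d9"}
live_documents = {"d1"}
removed_ids = remove_orphaned_chunks(chunk_owner, live_documents)
print(removed_ids)
```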

📈 8. Index Performance Optimization Tips

For best retrieval quality:

  • Keep document structure clear (headings, paragraphs)
  • Avoid uploading large numbers of tiny fragmented notes
  • Prefer non‑scanned PDFs when possible
  • Keep each knowledge base within a reasonable size (5,000–50,000 chunks recommended)
  • Periodically rebuild indexes after major updates