4.3 Add Content: Documents / URLs / Manual Input
After creating a knowledge base, the next step is to add content to it. Pop supports several content sources: local documents, web URLs, and pasted text. API-driven synchronization is planned for a future release.
This section introduces all supported content types and explains how Pop parses and processes them.
🗂️ 1. Supported Document Formats (Same as Document Center)
Pop Knowledge Base supports the full set of formats available in the Document Center, including but not limited to:
📄 Text Formats
.txt, .md (Markdown), .rst, .log, .json, .xml, .yaml/.yml, .csv/.tsv, .ini/.conf, .env
📝 Office Documents
.pdf, .doc/.docx, .ppt/.pptx, .xls/.xlsx, .odt/.ods/.odp
📚 Ebook Formats
.epub, .mobi, .azw/.azw3, .fb2/.fbz, .cbz, .djvu
🧑‍💻 Code Files
(Pop does not execute code; all code files are treated as indexable text.)
.py, .js/.mjs, .ts/.tsx, .jsx, .java, .cpp/.c, .go, .php, .rb, .cs, .swift, .scala, .kt, .rs, .lua, .sh/.bat/.ps1, .sql, .tf, .dockerfile, .hbs/.ejs/.jinja/.mustache
🎨 Graphic / Structured Files
(Pop automatically performs OCR or extracts available text.)
.svg, .vsd
⚠️ Note: Binary images (png/jpg) are not added directly to the knowledge base.
However, AI chat can still understand them using multimodal capabilities.
To make image content searchable, convert it to PDF or Markdown before import.
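Before a batch upload, it can help to check locally which files fall into the categories listed above. The following is a minimal sketch, not part of Pop itself: the `classify_file` helper and the group names are hypothetical, and the extension sets are simply copied from the lists in this section.

```python
from pathlib import Path

# Extension groups copied from the lists in this section.
# The group names themselves are illustrative, not Pop identifiers.
FORMAT_GROUPS = {
    "text": {".txt", ".md", ".rst", ".log", ".json", ".xml", ".yaml", ".yml",
             ".csv", ".tsv", ".ini", ".conf", ".env"},
    "office": {".pdf", ".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx",
               ".odt", ".ods", ".odp"},
    "ebook": {".epub", ".mobi", ".azw", ".azw3", ".fb2", ".fbz", ".cbz", ".djvu"},
    "code": {".py", ".js", ".mjs", ".ts", ".tsx", ".jsx", ".java", ".cpp", ".c",
             ".go", ".php", ".rb", ".cs", ".swift", ".scala", ".kt", ".rs",
             ".lua", ".sh", ".bat", ".ps1", ".sql", ".tf", ".dockerfile",
             ".hbs", ".ejs", ".jinja", ".mustache"},
    "graphic": {".svg", ".vsd"},
}

# Binary images are not added directly; convert them to PDF or Markdown first.
IMAGE_EXTENSIONS = {".png", ".jpg"}


def classify_file(filename: str) -> str:
    """Return the format group for a file name, 'image', or 'unsupported'."""
    name = Path(filename).name.lower()
    # Dotfiles such as ".env" have no suffix, so fall back to the whole name.
    suffix = Path(name).suffix or (name if name.startswith(".") else "")
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    for group, extensions in FORMAT_GROUPS.items():
        if suffix in extensions:
            return group
    return "unsupported"


if __name__ == "__main__":
    for candidate in ["report.PDF", ".env", "photo.png", "archive.zip"]:
        print(candidate, "->", classify_file(candidate))
```

Files reported as `image` need conversion before import, and files reported as `unsupported` are not in the lists above (the lists are not exhaustive, so Pop may still accept some of them).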
📥 2. Three Ways to Add Documents
Pop provides three flexible import methods for knowledge base content.
Method 1: Upload Documents (Most Common)
In the knowledge base interface, click:
“Upload Document” → Select local files
Pop will automatically perform:
- Document parsing (PDF / Word / PPT / Markdown, etc.)
- Cleaning irrelevant content (headers/footers/styles)
- Automatic title & summary extraction
- Document chunking
- Vector embedding generation
- BM25 index creation
You may upload:
- Single file
- Multiple files (batch upload)
- Entire folders (when batch import is enabled)
After upload, documents display:
- Filename
- Page count
- Number of chunks
- Indexing status
- Error messages (e.g., OCR failure)
Method 2: Fetch Content via URL
Click “Add URL”, then enter a webpage link.
Pop will automatically:
- Request the webpage
- Extract main article content (remove ads/navbars)
- Clean noise
- Detect the title
- Generate summary
- Chunk and embed the text
Suitable for:
- Documentation websites
- Technical blogs
- Online tutorials
- Internal company knowledge base sites
- News articles
URL import supports standard HTML pages, mobile pages, and some server-side rendered (SSR) pages. Server-side rendering improves extraction accuracy.
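To illustrate what "extract main article content" and "detect the title" can mean in practice, here is a rough sketch built only on Python's standard `html.parser`. It is an assumption-laden toy, not Pop's extractor: the `NOISE_TAGS` and `BLOCK_TAGS` sets and the `extract_article` function are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

# Tags treated as noise in this sketch (an assumption, not Pop's rule set).
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
# Tags that start or end a block of readable text.
BLOCK_TAGS = {"p", "div", "article", "section", "li", "br", "tr",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class ArticleExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._noise_depth = 0   # > 0 while inside any noise element
        self._in_title = False
        self._buffer = []

    def _flush(self):
        # Collapse whitespace and store the buffered text as one paragraph.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._flush()
            self._noise_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._noise_depth > 0:
            self._noise_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif self._noise_depth == 0:
            self._buffer.append(data)


def extract_article(html: str) -> tuple[str, str]:
    """Return (title, body text) with noise elements removed."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()
    return parser.title, "\n\n".join(parser.paragraphs)
```

A page that builds its content in the browser with JavaScript would yield little or no text with an approach like this, which is one reason server-rendered pages tend to extract more reliably.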
Method 3: Manual Text Input
Click “Add Text Manually” to type or paste any content:
- Meeting notes
- Project highlights
- Product feature descriptions
- Hand‑written summaries
- AI‑generated content
- Small FAQ entries
Pop will format text automatically, chunk it, and index it.
🛠 How Pop Processes Added Content
Regardless of input method, Pop performs the following pipeline:
1. Document Parsing
- PDF → text extraction + OCR
- Word / PPT → text extraction
- Markdown → structure parsing
- URL → article extraction
- Code files → plain text conversion
2. Content Cleaning
- Remove headers/footers
- Remove duplicate lines
- Remove noise (section numbers, page breaks)
- Merge broken paragraphs
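The cleaning steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not Pop's actual cleaner: it assumes pages are separated by form-feed characters (`\f`), treats any line that repeats on more than half of the pages as a header or footer, and uses a hypothetical function name, `clean_text`.

```python
from collections import Counter


def clean_text(raw: str) -> str:
    """Drop repeated headers/footers and duplicate lines, then merge broken paragraphs."""
    pages = raw.split("\f")  # assumption: form feed marks a page break
    page_lines = [[ln.strip() for ln in page.splitlines()] for page in pages]

    # A line that appears on more than half of the pages (and at least two)
    # is treated as a running header or footer.
    counts = Counter(ln for lines in page_lines for ln in set(lines) if ln)
    threshold = max(2, len(pages) // 2 + 1)
    repeated = {ln for ln, n in counts.items() if n >= threshold}

    kept = []
    for lines in page_lines:
        for ln in lines:
            if ln in repeated:
                continue
            if ln and kept and ln == kept[-1]:
                continue  # consecutive duplicate line
            kept.append(ln)
        # A page break ends a paragraph only if the page ended on a full sentence.
        if kept and kept[-1].endswith((".", "!", "?")):
            kept.append("")

    # Merge hard-wrapped lines: a blank line is the only paragraph separator.
    paragraphs, current = [], []
    for ln in kept + [""]:
        if ln:
            current.append(ln)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    return "\n\n".join(paragraphs)
```

For example, a two-page extract whose pages both start with the same running title comes out as plain paragraphs, with the title removed and wrapped lines joined.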
3. Smart Chunking
Pop applies the following chunking strategies:
- Split by heading structure
- Combine natural paragraphs
- Keep chunks within the target size of 200–500 characters
- Prevent overly long chunks, which degrade embedding quality
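As a rough picture of heading-aware chunking, the sketch below starts a new chunk at every Markdown heading, packs consecutive paragraphs together, and never lets a chunk exceed 500 characters. It is an illustration only: the `chunk_markdown` function is hypothetical, it enforces just the upper bound of the size range, and Pop's real chunker may behave differently.

```python
import re
import textwrap


def chunk_markdown(text: str, max_chars: int = 500) -> list[str]:
    """Split Markdown text into chunks of at most max_chars characters."""
    chunks = []
    current = ""
    # Paragraphs and headings are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        if not block:
            continue
        # An oversized block is cut at word boundaries into smaller pieces.
        pieces = [block] if len(block) <= max_chars else textwrap.wrap(block, max_chars)
        for i, piece in enumerate(pieces):
            starts_section = i == 0 and block.startswith("#")
            too_long = len(current) + 2 + len(piece) > max_chars
            if current and (starts_section or too_long):
                chunks.append(current)
                current = ""
            current = f"{current}\n\n{piece}" if current else piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "# Setup\n\nInstall the app.\n\nOpen it.\n\n# Usage\n\n" + "word " * 150
    for chunk in chunk_markdown(sample):
        print(len(chunk), repr(chunk[:30]))
```

Keeping a heading in the same chunk as the paragraphs that follow it gives each chunk some context of its own, which is the point of splitting by heading structure rather than by a fixed character window.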
4. Vector Embedding Generation
Pop generates embeddings with the embedding model you selected for this knowledge base.
5. Index Construction
Includes:
- Vector KNN index
- BM25 keyword index
- Hybrid retrieval weighting
Together, these indexes enable high‑quality RAG responses.
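To make "hybrid retrieval weighting" concrete, the sketch below combines a BM25 keyword score with a cosine similarity score using a single weight. Everything here is an assumption for illustration: the `alpha` weight, the min-max normalization, the `k1` and `b` defaults, and the function names are not taken from Pop, and the vectors are toy values rather than output from a real embedding model.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def min_max(values):
    """Rescale scores to [0, 1] so the two signals are comparable."""
    lo, hi = min(values), max(values)
    return [0.0] * len(values) if hi == lo else [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query_terms, query_vec, docs, doc_vecs, alpha=0.5):
    """Return document indexes, best first. alpha weights the vector score."""
    keyword = min_max(bm25_scores(query_terms, docs))
    semantic = min_max([cosine(query_vec, v) for v in doc_vecs])
    combined = [alpha * s + (1 - alpha) * k for s, k in zip(semantic, keyword)]
    return sorted(range(len(docs)), key=lambda i: combined[i], reverse=True)


if __name__ == "__main__":
    docs = [["reset", "password"], ["billing", "invoice"], ["password", "policy", "rules"]]
    doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]]  # toy embeddings
    print(hybrid_rank(["reset", "password"], [1.0, 0.0], docs, doc_vecs))
```

With `alpha=1.0` the ranking depends only on vector similarity, and with `alpha=0.0` only on keyword matches; values in between trade one signal against the other.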
📊 Managing Documents After Import
After import, you can:
- View structured summary
- Preview document content
- Re‑index individual documents
- Delete documents
- View all chunks
- Check embedding status of each chunk
- Export chunks (for RAG debugging)
If errors occur, Pop will show details in the Task List.
📌 Summary
Pop offers three flexible content‑adding methods:
| Method | Best For | Automatic Processing Steps |
|---|---|---|
| Upload | Local documents | Parsing → Chunking → Embedding → Indexing |
| URL Import | Web page articles | Extraction → Chunking → Embedding → Indexing |
| Manual Text | Small notes / FAQ | Formatting → Chunking → Embedding → Indexing |
No matter where content comes from, Pop converts it into structured, searchable knowledge for high‑quality RAG question‑answering.