4.3 Add Content: Documents / URLs / Manual Input

After creating a knowledge base, the next step is to add content to it. Pop accepts content from local documents, web URLs, and pasted text. API-driven synchronization is planned for a future release.

This section introduces all supported content types and explains how Pop parses and processes them.


🗂️ 1. Supported Document Formats (Same as Document Center)

Pop Knowledge Base supports the full set of formats available in the Document Center, including but not limited to:

📄 Text Formats

  • .txt
  • .md (Markdown)
  • .rst
  • .log
  • .json
  • .xml
  • .yaml / .yml
  • .csv / .tsv
  • .ini / .conf
  • .env

📝 Office Documents

  • .pdf
  • .doc / .docx
  • .ppt / .pptx
  • .xls / .xlsx
  • .odt / .ods / .odp

📚 Ebook Formats

  • .epub
  • .mobi
  • .azw / .azw3
  • .fb2 / .fbz
  • .cbz
  • .djvu

🧑‍💻 Code Files

(Pop does not execute code; all code files are treated as indexable text.)

  • .py
  • .js / .mjs
  • .ts / .tsx
  • .jsx
  • .java
  • .cpp / .c
  • .go
  • .php
  • .rb
  • .cs
  • .swift
  • .scala
  • .kt
  • .rs
  • .lua
  • .sh / .bat / .ps1
  • .sql
  • .tf
  • .dockerfile
  • .hbs / .ejs / .jinja / .mustache

🎨 Graphic / Structured Files

(Pop automatically performs OCR or extracts available text.)

  • .svg
  • .vsd

⚠️ Note: Raster images (.png / .jpg) cannot be added directly to the knowledge base.
AI chat can still interpret them through its multimodal capabilities.
To make image content searchable, convert it to PDF or Markdown before import.
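
The format lists above can be turned into a small pre-upload check. The sketch below is illustrative only: the category names, the `classify` helper, and the "unlisted" label are assumptions rather than part of Pop, and because the lists are described as "including but not limited to", an unlisted extension is not necessarily rejected by Pop.

```python
from pathlib import Path

# Extension groups copied from the lists above. The category keys are
# hypothetical labels chosen for this sketch, not Pop identifiers.
SUPPORTED = {
    "text": {".txt", ".md", ".rst", ".log", ".json", ".xml", ".yaml", ".yml",
             ".csv", ".tsv", ".ini", ".conf", ".env"},
    "office": {".pdf", ".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx",
               ".odt", ".ods", ".odp"},
    "ebook": {".epub", ".mobi", ".azw", ".azw3", ".fb2", ".fbz", ".cbz", ".djvu"},
    "code": {".py", ".js", ".mjs", ".ts", ".tsx", ".jsx", ".java", ".cpp", ".c",
             ".go", ".php", ".rb", ".cs", ".swift", ".scala", ".kt", ".rs",
             ".lua", ".sh", ".bat", ".ps1", ".sql", ".tf", ".dockerfile",
             ".hbs", ".ejs", ".jinja", ".mustache"},
    "graphic": {".svg", ".vsd"},
}

# Raster images must be converted to PDF or Markdown before import.
RASTER_IMAGES = {".png", ".jpg"}


def extension_of(filename: str) -> str:
    """Return the lower-cased extension, treating dotfiles like '.env' as extensions."""
    name = Path(filename).name.lower()
    suffix = Path(name).suffix
    if not suffix and name.startswith("."):
        return name
    return suffix


def classify(filename: str) -> str:
    """Map a filename to a category from the lists above, 'image', or 'unlisted'."""
    ext = extension_of(filename)
    if ext in RASTER_IMAGES:
        return "image"
    for category, extensions in SUPPORTED.items():
        if ext in extensions:
            return category
    return "unlisted"
```

A script that prepares a folder for batch upload could call `classify` on each file and set aside anything reported as "image" for conversion first.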


📥 2. Three Ways to Add Documents

Pop provides three flexible import methods for knowledge base content.


Method 1: Upload Documents (Most Common)

In the knowledge base interface, click:

“Upload Document” → Select local files

Pop will automatically perform:

  • Document parsing (PDF / Word / PPT / Markdown, etc.)
  • Removal of irrelevant content (headers, footers, styles)
  • Automatic title & summary extraction
  • Document chunking
  • Vector embedding generation
  • BM25 index creation

You may upload:

  • Single file
  • Multiple files (batch upload)
  • Entire folders (when batch import is enabled)

After upload, documents display:

  • Filename
  • Page count
  • Number of chunks
  • Indexing status
  • Error messages (e.g., OCR failure)

Method 2: Fetch Content via URL

Click “Add URL”, then enter a webpage link.

Pop will automatically:

  • Request the webpage
  • Extract main article content (remove ads/navbars)
  • Clean noise
  • Detect the title
  • Generate a summary
  • Chunk and embed the text

Suitable for:

  • Documentation websites
  • Technical blogs
  • Online tutorials
  • Internal company knowledge base sites
  • News articles

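Pop handles standard HTML pages, mobile pages, and some server-side rendered (SSR) pages. Server-side rendering improves extraction accuracy because the content is already present in the HTML that Pop receives.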
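
The URL steps above (request the page, drop navigation and other noise, detect the title) can be sketched with the Python standard library. This is a minimal illustration, not Pop's extractor: the `SKIP_TAGS` set, the `ArticleExtractor` class, and the `fetch` and `extract_article` helpers are all assumptions made for this example, and a production extractor would use far more robust heuristics.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Elements treated as page chrome rather than article content (an assumption).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class ArticleExtractor(HTMLParser):
    """Collects the <title> text and all text outside SKIP_TAGS elements."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.blocks = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if not text:
            return
        if self._in_title:
            self.title = text
        elif self._skip_depth == 0:
            self.blocks.append(text)


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it using the charset the server declares."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_article(html: str):
    """Return (title, body_text) for an HTML string."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title, "\n".join(parser.blocks)
```

Calling `extract_article(fetch(url))` yields a title and cleaned text that can then go through chunking and embedding. Pages that build their content with client-side JavaScript would come back nearly empty from a fetch like this, which is consistent with server-rendered pages extracting more accurately.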


Method 3: Manual Text Input

Click “Add Text Manually” to type or paste any content:

  • Meeting notes
  • Project highlights
  • Product feature descriptions
  • Hand‑written summaries
  • AI‑generated content
  • Small FAQ entries

Pop will format text automatically, chunk it, and index it.


🛠 How Pop Processes Added Content

Regardless of the input method, Pop runs the same processing pipeline:

1. Document Parsing

  • PDF → text extraction + OCR
  • Word / PPT → text extraction
  • Markdown → structure parsing
  • URL → article extraction
  • Code files → plain text conversion

2. Content Cleaning

  • Remove headers/footers
  • Remove duplicate lines
  • Remove noise (section numbers, page breaks)
  • Merge broken paragraphs
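
As a rough illustration of the cleaning steps above, the sketch below takes per-page text (for example, from PDF extraction), drops lines that repeat across pages, removes consecutive duplicate lines, and merges hard-wrapped lines into paragraphs. The `clean_pages` function and its `repeat_ratio` threshold are assumptions for this example. Pop's actual cleaning rules are not documented here, and section-number removal is left out.

```python
import math
from collections import Counter


def clean_pages(pages, repeat_ratio=0.6):
    """Return a list of cleaned paragraphs from a list of page strings."""
    page_lines = [[ln.strip() for ln in page.splitlines()] for page in pages]

    # A line that appears on most pages is treated as a header or footer.
    # Each distinct line is counted once per page.
    counts = Counter(ln for lines in page_lines for ln in set(lines) if ln)
    min_pages = max(2, math.ceil(len(pages) * repeat_ratio))
    boilerplate = {ln for ln, n in counts.items() if n >= min_pages}

    kept = []
    for lines in page_lines:
        for ln in lines:
            if ln in boilerplate:
                continue
            if ln and kept and kept[-1] == ln:
                continue  # drop a consecutive duplicate line
            kept.append(ln)

    # Blank lines separate paragraphs; lines within a paragraph are joined,
    # which also rejoins paragraphs that were split across a page boundary.
    paragraphs, current = [], []
    for ln in kept:
        if ln:
            current.append(ln)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

With a single page, no line can reach the two-page minimum, so nothing is classified as a header or footer.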

3. Smart Chunking

Pop chunks content using the following strategies:

  • Split by heading structure
  • Combine natural paragraphs
  • Maintain optimal chunk size: 200–500 characters
  • Prevent overly long chunks that degrade embedding quality
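
A heading-aware chunker along these lines might look like the sketch below. It assumes Markdown-style `#` headings, never lets a chunk cross a heading, and packs paragraphs greedily up to a 500-character limit. The names `split_blocks` and `chunk_text` and the dictionary layout are hypothetical, and Pop's real chunking logic may differ.

```python
import textwrap


def split_blocks(text):
    """Yield ("heading", str) or ("para", str) items from Markdown-like text."""
    para = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            if para:
                yield ("para", " ".join(para))
                para = []
            yield ("heading", stripped.lstrip("#").strip())
        elif stripped:
            para.append(stripped)
        elif para:
            yield ("para", " ".join(para))
            para = []
    if para:
        yield ("para", " ".join(para))


def chunk_text(text, max_chars=500):
    """Pack paragraphs into chunks of at most max_chars, split at headings."""
    chunks = []
    heading = ""
    buffer = ""

    def flush():
        nonlocal buffer
        if buffer:
            chunks.append({"heading": heading, "text": buffer})
            buffer = ""

    for kind, value in split_blocks(text):
        if kind == "heading":
            flush()  # close the chunk under the previous heading
            heading = value
            continue
        # textwrap.wrap breaks a paragraph longer than max_chars at whitespace.
        for piece in textwrap.wrap(value, max_chars):
            if buffer and len(buffer) + 1 + len(piece) > max_chars:
                flush()
            buffer = f"{buffer} {piece}" if buffer else piece
    flush()
    return chunks
```

In this sketch a chunk is closed only when the next piece would exceed the limit, so most chunks land near the upper end of the range. Chunks shorter than 200 characters can still appear at the end of a section.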

4. Vector Embedding Generation

Pop generates embeddings with the embedding model you selected for this knowledge base.

5. Index Construction

Index construction includes:

  • Vector KNN index
  • BM25 keyword index
  • Hybrid retrieval weighting

Together, these enable high-quality RAG responses.
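
To make hybrid retrieval weighting concrete, the sketch below combines a BM25 keyword score with vector cosine similarity using a single weight `alpha`. It is a generic illustration, not Pop's scoring formula: the whitespace tokenizer, the `k1` and `b` defaults, the max-based normalization, and the names `bm25_scores`, `cosine`, and `hybrid_rank` are all assumptions, and the embedding vectors are supplied by the caller.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5):
    """Return document indices ordered by alpha * vector + (1 - alpha) * keyword."""
    keyword = bm25_scores(query, docs)
    top = max(keyword, default=0.0) or 1.0  # scale keyword scores into [0, 1]
    combined = []
    for i, vec in enumerate(doc_vecs):
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword[i] / top
        combined.append((score, i))
    return [i for _, i in sorted(combined, reverse=True)]
```

Setting `alpha` closer to 1 favors semantic matches from the vector index, while a value closer to 0 favors exact keyword matches from the BM25 index.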


📊 Managing Documents After Import

After import, you can:

  • View the structured summary
  • Preview document content
  • Re‑index individual documents
  • Delete documents
  • View all chunks
  • Check embedding status of each chunk
  • Export chunks (for RAG debugging)

If errors occur, Pop will show details in the Task List.


📌 Summary

Pop offers three flexible content‑adding methods:

Method      | Best For          | Automatic Processing Steps
Upload      | Local documents   | Parsing → Chunking → Embedding → Indexing
URL Import  | Web page articles | Extraction → Chunking → Embedding → Indexing
Manual Text | Small notes / FAQ | Paragraphing → Embedding → Indexing

No matter where content comes from, Pop converts it into structured, searchable knowledge for high‑quality RAG question‑answering.