4.3 Add Content: Documents / URLs / Manual Input

After creating a knowledge base, the next step is to add content to it. Pop accepts content from local documents, web URLs, and pasted text. API‑driven synchronization is planned for a future release.

This section introduces all supported content types and explains how Pop parses and processes them.

🗂️ 1. Supported Document Formats (Same as Document Center)

Pop Knowledge Base supports the full set of formats available in the Document Center, including but not limited to:

📄 Text Formats

.txt
.md (Markdown)
.rst
.log
.json
.xml
.yaml / .yml
.csv / .tsv
.ini / .conf
.env

📝 Office Documents

.pdf
.doc / .docx
.ppt / .pptx
.xls / .xlsx
.odt / .ods / .odp

📚 Ebook Formats

.epub
.mobi
.azw / .azw3
.fb2 / .fbz
.cbz
.djvu

🧑‍💻 Code Files

(Pop does not execute code; all code files are treated as indexable text.)

.py
.js / .mjs
.ts / .tsx
.jsx
.java
.cpp / .c
.go
.php
.rb
.cs
.swift
.scala
.kt
.rs
.lua
.sh / .bat / .ps1
.sql
.tf
.dockerfile
.hbs / .ejs / .jinja / .mustache

🎨 Graphic / Structured Files

(Pop automatically performs OCR or extracts available text.)

.svg
.vsd

⚠️ Note: Binary images (png/jpg) are not added directly to the knowledge base.
However, AI chat can still understand them using multimodal capabilities.
To make image content searchable, convert it to PDF or Markdown before import.
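
As an illustration of how the format lists above might be used before an upload, the following sketch sorts filenames into the categories from this section. The `CATEGORIES` map, the `classify` function, and the `"image"` / `"unknown"` labels are hypothetical names chosen for this example. They are not part of Pop, and because the lists above are not exhaustive, `"unknown"` does not mean Pop rejects the file.

```python
from pathlib import Path

# Hypothetical lookup table copied from the format lists in this section.
CATEGORIES = {
    "text": {".txt", ".md", ".rst", ".log", ".json", ".xml", ".yaml", ".yml",
             ".csv", ".tsv", ".ini", ".conf", ".env"},
    "office": {".pdf", ".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx",
               ".odt", ".ods", ".odp"},
    "ebook": {".epub", ".mobi", ".azw", ".azw3", ".fb2", ".fbz", ".cbz", ".djvu"},
    "code": {".py", ".js", ".mjs", ".ts", ".tsx", ".jsx", ".java", ".cpp", ".c",
             ".go", ".php", ".rb", ".cs", ".swift", ".scala", ".kt", ".rs",
             ".lua", ".sh", ".bat", ".ps1", ".sql", ".tf", ".dockerfile",
             ".hbs", ".ejs", ".jinja", ".mustache"},
    "graphic": {".svg", ".vsd"},
}

# Binary images are not added to the knowledge base directly (see the note above).
IMAGE_EXTENSIONS = {".png", ".jpg"}


def classify(filename: str) -> str:
    """Return the category for a filename, "image", or "unknown"."""
    name = Path(filename).name.lower()
    suffix = Path(name).suffix
    if not suffix and name.startswith("."):
        # pathlib reports no suffix for dotfiles such as ".env"
        suffix = name
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    for category, extensions in CATEGORIES.items():
        if suffix in extensions:
            return category
    return "unknown"
```

A pre-upload script could use such a check to set aside images for conversion to PDF or Markdown before import.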

📥 2. Three Ways to Add Documents

Pop provides three flexible import methods for knowledge base content.

Method 1: Upload Documents (Most Common)

In the knowledge base interface, click:

“Upload Document” → Select local files

Pop will automatically perform:

Document parsing (PDF / Word / PPT / Markdown, etc.)
Cleaning irrelevant content (headers/footers/styles)
Automatic title & summary extraction
Document chunking
Vector embedding generation
BM25 index creation

You may upload:

Single file
Multiple files (batch upload)
Entire folders (when batch import is enabled)

After upload, documents display:

Filename
Page count
Number of chunks
Indexing status
Error messages (e.g., OCR failure)

Method 2: Fetch Content via URL

Click “Add URL”, then enter a webpage link.

Pop will automatically:

Request the webpage
Extract main article content (remove ads/navbars)
Clean noise
Detect the title
Generate summary
Chunk and embed the text

Suitable for:

Documentation websites
Technical blogs
Online tutorials
Internal company knowledge base sites
News articles

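Pop supports standard HTML pages, mobile pages, and some server‑side rendered (SSR) pages. Extraction tends to be more accurate on SSR pages because their content is already present in the HTML that Pop fetches.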
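
To make the extraction steps more concrete, here is a minimal sketch of title detection and main-content extraction using only Python's standard `html.parser` module. It is not Pop's extractor. The `ArticleExtractor` class, the `extract_article` function, and the choice of tags to skip are assumptions made for illustration, and fetching the page over the network is left out.

```python
from html.parser import HTMLParser

# Assumed set of tags whose content is treated as page chrome or noise.
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
# Assumed set of tags that delimit one block of article text.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "pre", "blockquote"}


class ArticleExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.blocks = []
        self._skip_depth = 0   # > 0 while inside a skipped tag
        self._in_title = False
        self._current = []     # text fragments of the block being built

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()
        elif self._skip_depth == 0:
            self._current.append(data)

    def _flush(self):
        # Collapse runs of whitespace and store the block if it has text.
        text = " ".join("".join(self._current).split())
        if text:
            self.blocks.append(text)
        self._current = []


def extract_article(html: str):
    """Return (title, list of text blocks) for an HTML string."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()
    return parser.title, parser.blocks
```

Because this approach reads only the HTML it is given, pages that build their content in the browser with JavaScript would yield little or no text, which is one reason server-rendered pages are easier to import.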

Method 3: Manual Text Input

Click “Add Text Manually” to type or paste any content:

Meeting notes
Project highlights
Product feature descriptions
Hand‑written summaries
AI‑generated content
Small FAQ entries

Pop will format text automatically, chunk it, and index it.

🛠 How Pop Processes Added Content

Regardless of input method, Pop performs the following pipeline:

1. Document Parsing

PDF → text extraction + OCR
Word / PPT → text extraction
Markdown → structure parsing
URL → article extraction
Code files → plain text conversion

2. Content Cleaning

Remove headers/footers
Remove duplicate lines
Remove noise (section numbers, page breaks)
Merge broken paragraphs
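
The cleaning rules above can be sketched roughly as follows. This is an illustrative approximation, not Pop's implementation: the `clean_pages` function is hypothetical, and so are its heuristics (a line counts as a header or footer if it appears on more than half of the pages, a line consisting only of a page number is noise, and a blank line marks a paragraph boundary). Section-number removal is not covered.

```python
import re
from collections import Counter


def clean_pages(pages: list[str]) -> str:
    """Clean a list of per-page text strings and return merged paragraphs."""
    # splitlines() also breaks on form-feed page-break characters.
    page_lines = [[line.strip() for line in page.splitlines()] for page in pages]

    # Heuristic: a non-empty line seen on more than half of the pages
    # (and on at least two) is treated as a repeated header or footer.
    counts = Counter(line for lines in page_lines for line in set(lines) if line)
    threshold = max(2, len(pages) // 2 + 1)
    repeated = {line for line, n in counts.items() if n >= threshold}

    kept: list[str] = []
    for lines in page_lines:
        for line in lines:
            if line in repeated:
                continue  # header or footer
            if re.fullmatch(r"(page\s+)?\d+", line, flags=re.IGNORECASE):
                continue  # bare page number
            if line and kept and line == kept[-1]:
                continue  # consecutive duplicate line
            kept.append(line)

    # Merge hard-wrapped lines; a blank line ends the current paragraph.
    paragraphs: list[str] = []
    current: list[str] = []
    for line in kept:
        if line:
            current.append(line)
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

Note that a paragraph interrupted by a page break is rejoined here, because the page boundary itself does not count as a paragraph boundary.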

3. Smart Chunking

Pop chunks content using the following strategies:

Split by heading structure
Combine natural paragraphs
Maintain an optimal chunk size of 200–500 characters
Prevent overly long chunks that degrade embedding quality
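
One way to picture these strategies is the sketch below. It assumes Markdown-style `#` headings mark section boundaries, packs whole paragraphs into a chunk until the 500-character ceiling would be exceeded, and starts a new chunk at a heading once the current chunk has reached 200 characters. The `chunk_text` function and its parameters are hypothetical. Pop's actual chunker is not documented here and may behave differently.

```python
import re
import textwrap


def chunk_text(text: str, min_chars: int = 200, max_chars: int = 500) -> list[str]:
    """Split text into chunks of roughly min_chars..max_chars characters."""
    chunks: list[str] = []
    current = ""

    def flush() -> None:
        nonlocal current
        if current:
            chunks.append(current)
        current = ""

    # Split the text in front of every Markdown heading line.
    for section in re.split(r"(?m)^(?=#{1,6} )", text):
        # A heading starts a new chunk once the current one is large enough.
        if len(current) >= min_chars:
            flush()
        for para in re.split(r"\n\s*\n", section):
            remaining = " ".join(para.split())  # normalize whitespace
            while remaining:
                room = max_chars - (len(current) + 2 if current else 0)
                if len(remaining) <= room:
                    # The rest of the paragraph fits into the current chunk.
                    current = f"{current}\n\n{remaining}" if current else remaining
                    remaining = ""
                elif len(current) >= min_chars:
                    # Current chunk is big enough; close it and retry.
                    flush()
                else:
                    # Current chunk is too small to close: top it up with as
                    # much of the paragraph as fits, breaking at whitespace.
                    head = textwrap.wrap(remaining, room)[0]
                    current = f"{current}\n\n{head}" if current else head
                    remaining = remaining[len(head):].lstrip()
                    flush()
    flush()
    return chunks
```

In this sketch the 500-character ceiling is always respected, while the 200-character floor is only a target: a very short document, or a short tail at the end of one, still becomes its own chunk.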

4. Vector Embedding Generation

Uses the model you selected for this knowledge base.

5. Index Construction

Includes:

Vector KNN index
BM25 keyword index
Hybrid retrieval weighting

Together, these indexes enable high‑quality RAG responses.
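
The sketch below shows, in plain Python, how a BM25 keyword score and a vector similarity score could be blended into one ranking. It is a simplified illustration under stated assumptions, not Pop's retrieval code: the function names, the BM25 parameters `k1` and `b`, the min-max normalization, and the `alpha` weight are all choices made for this example, and the tiny two-dimensional vectors in the usage note stand in for real embeddings.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms with BM25."""
    n = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n
    doc_freq = Counter(t for d in docs_tokens for t in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def normalize(scores):
    """Min-max scale scores to [0, 1]; all zeros if every score is equal."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0] * len(scores)
    return [(s - low) / (high - low) for s in scores]


def hybrid_rank(query_terms, query_vec, docs_tokens, doc_vecs, alpha=0.5):
    """Return document indices, best first. alpha weights the vector score."""
    keyword = normalize(bm25_scores(query_terms, docs_tokens))
    vector = normalize([cosine(query_vec, v) for v in doc_vecs])
    combined = [alpha * v + (1 - alpha) * k for v, k in zip(vector, keyword)]
    return sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
```

For example, with chunks tokenized as `["reset", "password", "steps"]`, `["billing", "invoice", "history"]`, and `["change", "login", "credentials"]`, the query terms `["reset", "password"]` match only the first chunk by keyword, but a query vector close to the third chunk's vector still lifts that chunk above the unrelated billing chunk. This is the kind of case where combining both indexes helps.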

📊 Managing Documents After Import

After import, you can:

View structured summary
Preview document content
Re‑index individual documents
Delete documents
View all chunks
Check embedding status of each chunk
Export chunks (for RAG debugging)

If errors occur, Pop will show details in the Task List.

📌 Summary

Pop offers three flexible content‑adding methods:

Method	Best For	Automatic Processing Steps
Upload	Local documents	Parsing → Chunking → Embedding → Indexing
URL Import	Web page articles	Extraction → Chunking → Embedding → Indexing
Manual Text	Small notes / FAQ	Paragraphing → Chunking → Embedding → Indexing

No matter where content comes from, Pop converts it into structured, searchable knowledge for high‑quality RAG question‑answering.