Developer

Build a RAG Knowledge Base

Plan document ingestion, chunking, retrieval, answering, and evaluation so the project moves beyond a demo.

Best for

AI application developers, enterprise knowledge owners, and technical product managers.

Final output

A RAG architecture checklist, data workflow, evaluation questions, and pre-launch checks.

Workflow snapshot

Last checked: 2026-05-13

Complexity: 5-stage workflow
Sources: 2

Review Notes: Treat AI output as a draft and verify facts, rights, platform rules, and business claims before publishing.

Inputs

Document sources, permission rules, update frequency, and sample user questions. Expected answers, refusal boundaries, and citation requirements.

Tool chain

LlamaIndex -> Cloudflare Workers AI / Vectorize -> ChatGPT

Human review

Answers cite sources and refuse when source material is missing. Permission boundaries prevent private documents from leaking to the wrong user.

  1. Define knowledge boundaries
  2. Clean and chunk documents
  3. Build the retrieval path
  4. Create an evaluation set
  5. Monitor after launch

Input Materials

  • Document sources, permission rules, update frequency, and sample user questions.
  • Expected answers, refusal boundaries, and citation requirements.
  • An evaluation set with at least 20 real questions and approved reference answers.
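
The inputs above can be captured as structured records before any indexing starts. The following is a minimal sketch, not a prescribed schema: `SourceEntry`, `EvalQuestion`, and `eval_set_ready` are hypothetical names, and the staleness rule (last update older than the expected refresh interval) is an assumption for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MIN_EVAL_QUESTIONS = 20  # threshold taken from the input list above


@dataclass(frozen=True)
class SourceEntry:
    """One document source admitted to the knowledge base (hypothetical schema)."""
    url: str
    owner: str                 # who is responsible for updates
    allowed_groups: frozenset  # permission rule: groups allowed to read it
    refresh_days: int          # expected update frequency, in days
    last_updated: date

    def is_stale(self, today: date) -> bool:
        # Assumed rule: stale once the refresh interval has been exceeded.
        return today - self.last_updated > timedelta(days=self.refresh_days)


@dataclass(frozen=True)
class EvalQuestion:
    question: str
    reference_answer: str
    approved: bool  # True once a reviewer has signed off on the reference answer


def eval_set_ready(items: list[EvalQuestion]) -> bool:
    """Count only approved questions that actually have a reference answer."""
    usable = [q for q in items if q.approved and q.reference_answer.strip()]
    return len(usable) >= MIN_EVAL_QUESTIONS
```

Keeping the owner and permission groups on each source record means later stages can copy them onto every chunk instead of reconstructing them.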

Review Checklist

  • Answers cite sources and refuse when source material is missing.
  • Permission boundaries prevent private documents from leaking to the wrong user.
  • Retrieval misses, outdated documents, and conflicting answers have handling rules.
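
The first two checks above can be exercised without any model in the loop. The sketch below is illustrative only: the word-overlap score is a toy stand-in for vector search, `Chunk`, `retrieve`, `answer`, and `REFUSAL` are hypothetical names, and a real system would send the retrieved chunks to a model instead of quoting the top one.

```python
from dataclasses import dataclass

REFUSAL = "I can't answer that from the available sources."


@dataclass(frozen=True)
class Chunk:
    text: str
    source_url: str
    allowed_groups: frozenset  # groups permitted to see this chunk


def _overlap(query: str, text: str) -> int:
    # Toy relevance score (assumption): number of shared lowercase words.
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query, chunks, user_groups, k=3, min_score=2):
    # Filter by permission BEFORE ranking so private text never reaches the prompt.
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    scored = [(_overlap(query, c.text), c) for c in visible]
    scored = [(s, c) for s, c in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def answer(query, chunks, user_groups):
    hits = retrieve(query, chunks, user_groups)
    if not hits:
        return REFUSAL  # no permitted, relevant source: refuse rather than guess
    # Stand-in for generation: quote the top chunk and list every cited source.
    cited = "\n".join(f"[{i}] {c.source_url}" for i, c in enumerate(hits, 1))
    return f"{hits[0].text}\n\nSources:\n{cited}"
```

Applying the permission filter before scoring, rather than after generation, is the design choice that keeps a private document from influencing an answer for the wrong user.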

Common Failure Modes

  • Only generation quality is checked while retrieval quality is ignored.
  • All documents are mixed into one index, breaking permission boundaries.
  • Answers lack citations and cannot be traced during review.
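
To avoid the first failure mode, retrieval can be scored on its own, before any answer is generated. Below is a minimal sketch of a hit-rate-at-k check; the function name and the input shapes (question mapped to expected source, question mapped to ranked retrieved sources) are assumptions for illustration.

```python
def hit_rate_at_k(
    results: dict[str, list[str]],
    expected: dict[str, str],
    k: int = 3,
) -> float:
    """Fraction of questions whose expected source is among the top-k retrieved sources.

    results:  question -> ranked list of retrieved source URLs (best first)
    expected: question -> the source URL a reviewer says should be retrieved
    """
    if not expected:
        return 0.0
    hits = sum(
        1 for question, source in expected.items()
        if source in results.get(question, [])[:k]
    )
    return hits / len(expected)
```

A low hit rate points at chunking, metadata, or indexing problems; fixing the prompt will not help if the right source never reaches the model.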

Output Template

Evaluation table: question / expected source / reference answer / retrieved context / generated answer / pass result / fix notes.
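
One way to keep this table consistent across reviewers is to define the row once and export it. This is a sketch using only the Python standard library; `EvalRow` and `rows_to_csv` are hypothetical names, and the field names simply mirror the columns listed above.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class EvalRow:
    question: str
    expected_source: str
    reference_answer: str
    retrieved_context: str
    generated_answer: str
    pass_result: bool
    fix_notes: str = ""


def rows_to_csv(rows):
    """Serialize evaluation rows to CSV text, with the template columns as the header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(EvalRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()
```

Storing the retrieved context next to the generated answer lets a reviewer tell a retrieval miss from a generation error in the same row.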

Recommended Tool Stack

Tools are organized by workflow role. Unlisted tools can be added to the library later.

  1. LlamaIndex (RAG orchestration): Organize document loading, indexing, query engines, and evaluation flow.
  2. Cloudflare Workers AI / Vectorize (Deployment and retrieval): Host model calls and vector retrieval inside the Cloudflare architecture.
  3. ChatGPT (Evaluation samples): Draft sample questions and answer review checklists.

Complete Workflow

Use AI outputs as drafts; facts, copyright, platform rules, and business claims need human review.

  1. Stage 01

    Define knowledge boundaries

    Decide which documents enter the knowledge base, what stays out, and who owns updates.

  2. Stage 02

    Clean and chunk documents

    Split by headings, paragraphs, tables, and FAQs while preserving source URL and update time.

  3. Stage 03

    Build the retrieval path

    Implement indexing, querying, retrieval, and answer composition with citations.

    Reusable prompt:
    Design a RAG chunking strategy, metadata fields, and retrieval test questions for this document structure: {document notes}

  4. Stage 04

    Create an evaluation set

    Prepare frequent questions, boundary questions, no-answer cases, and known bad-answer examples.

  5. Stage 05

    Monitor after launch

    Track no-answer cases, hallucinations, low-relevance retrieval, and user feedback for updates.
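
Stage 02 above can be sketched as a heading-based splitter that carries the source URL and update time onto every chunk. This is a minimal illustration, assuming the cleaned documents use Markdown-style `#` headings; `DocChunk` and `chunk_by_heading` are hypothetical names, and tables and FAQs would need their own handling.

```python
import re
from dataclasses import dataclass

# Assumption: headings look like "# Title" through "###### Title".
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


@dataclass(frozen=True)
class DocChunk:
    heading: str
    text: str
    source_url: str
    updated_at: str  # ISO date string copied from the source document


def chunk_by_heading(markdown: str, source_url: str, updated_at: str) -> list[DocChunk]:
    """Split a document at headings; every chunk keeps its source URL and update time."""
    sections = [("", [])]  # text before the first heading gets an empty heading
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(1).strip(), []))
        else:
            sections[-1][1].append(line)

    chunks = []
    for heading, lines in sections:
        body = "\n".join(lines).strip()
        if body:  # skip headings with no content under them
            chunks.append(DocChunk(heading, body, source_url, updated_at))
    return chunks
```

Because the metadata travels with each chunk, the answer step can cite the source URL directly, and monitoring can flag chunks whose `updated_at` has fallen behind.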

FAQ

Can this workflow publish automatically?

Not recommended. AI is useful for drafts, variants, and checklists, but facts, asset rights, and platform rules need human confirmation.

What if my tool stack is different?

Keep the workflow roles: RAG orchestration, deployment and retrieval, and evaluation samples. Swap in tools your team already has accounts for.

Sources

  • Introduction to RAG (LlamaIndex Docs) · Source used to verify the referenced tool capability and workflow boundary.
  • Build Agents on Cloudflare (Cloudflare Docs) · Source used to verify the referenced tool capability and workflow boundary.

Review Notes

  • Treat AI output as a draft and verify facts, rights, platform rules, and business claims before publishing.
  • Tool pricing, quotas, and capabilities may change; check official sources before purchase or automation.