
RAG Is Not Enough: Building Production Knowledge Systems

HN Reference: HN discussion on 'RAG is just a hack' and the future of knowledge-grounded AI (Sep 2024)

RAG (Retrieval-Augmented Generation) became the default architecture for AI-powered knowledge systems. But after implementing it for a dozen startups, we've learned that naive RAG is just the starting point.

The Naive RAG Problem

Most RAG implementations follow the same recipe:

  1. Chunk documents
  2. Embed chunks
  3. Store in vector DB
  4. Retrieve top-k on query
  5. Stuff into LLM context
  6. Generate answer
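The six steps above can be sketched in a few dozen lines. This is a toy stand-in, not a real implementation: the "embedding" is a bag-of-words counter, the "vector DB" is a Python list, and the final LLM call is omitted.

```python
import math
from collections import Counter

def chunk(doc, size=6):
    """Step 1: split a document into fixed-size word chunks."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Step 2: toy bag-of-words 'embedding' standing in for a real model."""
    return Counter(w.strip(".,?!").lower() for w in text.split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(index, query, k=2):
    """Step 4: return the top-k chunks by similarity to the query."""
    q = embed(query)
    return sorted(index, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

# Steps 3 and 5-6: a plain list stands in for the vector DB, and the
# assembled prompt would be sent to an LLM (call omitted here).
doc = ("Qdrant supports payload filtering. Pinecone is a managed "
       "vector DB. BM25 scores keywords.")
index = chunk(doc)
context = retrieve(index, "payload filtering for vector search")
prompt = "Answer using only:\n" + "\n".join(context)
```

Every failure mode listed below already exists in this sketch: the chunker splits mid-sentence, and a query with no lexical or semantic overlap still returns its top-k regardless of quality.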

This works for demos. In production, it falls apart because:

  • Chunking destroys document structure
  • Semantic similarity != relevance
  • Top-k retrieval misses context
  • The LLM hallucinates when retrieval is weak

What Actually Works

After iterating on production RAG systems, here's our current architecture:

1. Hierarchical Indexing

Don't just chunk blindly. Build a hierarchy:

  • Document level: Summary embeddings for routing
  • Section level: Thematic chunks with headers
  • Paragraph level: Fine-grained retrieval
  • Entity level: Named entities and relationships

Query against multiple levels and merge results.
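One way to merge across levels is to score every entry and weight finer-grained levels higher. A minimal sketch, with a hypothetical index layout and lexical overlap standing in for embedding similarity:

```python
# Hypothetical multi-level index: each entry carries its level and a
# pointer back to its parent document for provenance.
INDEX = [
    {"level": "document", "doc": "billing.md",
     "text": "Overview of billing, invoices and refunds"},
    {"level": "section", "doc": "billing.md",
     "text": "Refund policy and eligibility windows"},
    {"level": "paragraph", "doc": "billing.md",
     "text": "Refunds are issued within 14 days of purchase"},
    {"level": "paragraph", "doc": "api.md",
     "text": "Authentication uses rotating API keys"},
]

# Assumed weights: finer levels are more precise, so they score higher.
LEVEL_WEIGHT = {"document": 0.5, "section": 0.8, "paragraph": 1.0}

def score(query, text):
    """Toy lexical overlap standing in for embedding similarity."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q)

def hierarchical_search(query, k=2):
    """Score entries at every level, weight by level, merge into one ranking."""
    ranked = sorted(
        INDEX,
        key=lambda e: LEVEL_WEIGHT[e["level"]] * score(query, e["text"]),
        reverse=True,
    )
    return ranked[:k]
```

Because each hit keeps its `doc` pointer, you can expand a winning paragraph back out to its section or document when the LLM needs more surrounding context.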

2. Hybrid Retrieval

Combine multiple retrieval strategies:

  • Dense retrieval (embeddings) for semantic matching
  • Sparse retrieval (BM25) for keyword precision
  • Metadata filtering for structured constraints
  • Re-ranking (cross-encoder) for precision

No single method wins everywhere. Use all of them.
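One common way to merge the ranked lists that dense and sparse retrieval produce is reciprocal rank fusion (RRF), which needs only ranks, not comparable scores. A sketch, with hypothetical document IDs:

```python
from collections import defaultdict

def rrf(ranked_lists, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank).

    k=60 is the conventional smoothing constant; documents that rank
    well in several lists float to the top.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7"]   # e.g. embedding nearest neighbours
sparse = ["d1", "d9", "d3"]  # e.g. BM25 keyword matches
fused = rrf([dense, sparse])
```

The fused list is then the candidate set you hand to the cross-encoder re-ranker, which is too expensive to run over the whole corpus but cheap over a few dozen candidates.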

3. Query Understanding

Before retrieval, decompose the query:

  • What type of information is needed?
  • Are there implicit constraints (time, source, format)?
  • Does this need multi-hop reasoning?

In our experience, a good query-understanding layer improves retrieval quality by 30-40%.
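In production this layer is usually an LLM call, but the shape of its output can be shown with a rule-based stand-in. Everything here — the `QueryPlan` fields, the intent labels, the multi-hop cues — is a hypothetical illustration, not a fixed schema:

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

@dataclass
class QueryPlan:
    intent: str                  # what type of information is needed
    time_filter: str | None = None       # implicit constraint: time
    sub_questions: list[str] = field(default_factory=list)  # multi-hop

def understand(query: str) -> QueryPlan:
    """Rule-based stand-in for an LLM query-understanding step."""
    q = query.lower()
    intent = "definition" if q.startswith(("what is", "define")) else "lookup"
    # Implicit time constraint: a bare year in the query.
    m = re.search(r"\b(20\d\d)\b", query)
    time_filter = m.group(1) if m else None
    # Naive multi-hop detection: split on comparison / sequencing cues.
    parts = re.split(r"\band then\b|\bcompared to\b", query)
    subs = [p.strip() for p in parts if p.strip()] if len(parts) > 1 else []
    return QueryPlan(intent, time_filter, subs)
```

The plan then drives retrieval: `time_filter` becomes a metadata filter, and each entry in `sub_questions` gets its own retrieval pass before the answers are composed.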

4. Answer Verification

After generation, verify:

  • Is every claim supported by retrieved context?
  • Are there contradictions in the source material?
  • Does the answer actually address the question?

Route unverifiable answers to a fallback (human review or "I don't know").
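The verification step can be sketched with lexical support-checking; a real system would use an entailment model or a second LLM call for the "is this claim supported" test, so treat the overlap heuristic and the 0.5 threshold here as placeholders:

```python
def supported(claim: str, context: list[str], threshold: float = 0.5) -> bool:
    """A claim counts as supported if enough of its content words appear
    in at least one retrieved passage (toy stand-in for entailment)."""
    words = {w.lower().strip(".,") for w in claim.split() if len(w) > 3}
    if not words:
        return True
    return any(
        len(words & set(p.lower().split())) / len(words) >= threshold
        for p in context
    )

def verify(answer: str, context: list[str]) -> str:
    """Check every claim; route unverifiable answers to the fallback."""
    claims = [s.strip() for s in answer.split(".") if s.strip()]
    if all(supported(c, context) for c in claims):
        return answer
    return "I don't know"  # fallback route: abstain or escalate to a human
```

The key design point is the routing, not the check itself: an answer that fails verification never reaches the user as-is.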

Infrastructure Requirements

Production RAG needs:

  • Vector DB with filtering: We use Qdrant or Pinecone depending on scale
  • Chunk management: Track chunk provenance and relationships
  • Evaluation pipeline: Automated quality metrics on every change
  • Caching layer: Same questions get asked repeatedly

The Bottom Line

RAG isn't a feature you add in a sprint. It's a system that requires ongoing tuning. If your RAG implementation is "good enough" in testing but falls apart in production, it's because you're treating it as a single component instead of a system.

Invest in the retrieval layer. That's where most of the value is.

Tags: RAG · AI Architecture · Vector Databases · Enterprise AI