RAG Is Not Enough: Building Production Knowledge Systems
HN Reference: HN discussion on 'RAG is just a hack' and the future of knowledge-grounded AI (Sep 2024)
RAG (Retrieval-Augmented Generation) has become the default architecture for AI-powered knowledge systems. But after implementing it for a dozen startups, we've learned that naive RAG is only a starting point.
The Naive RAG Problem
Most RAG implementations follow the same recipe:
- Chunk documents
- Embed chunks
- Store in vector DB
- Retrieve top-k on query
- Stuff into LLM context
- Generate answer
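The recipe above can be sketched end to end. This is a toy illustration, not a real implementation: the bag-of-characters `embed` and the final f-string stand in for an embedding model and an LLM call, and the in-memory list stands in for a vector DB.

```python
from math import sqrt

def embed(text: str) -> list[float]:
    # Toy bag-of-characters embedding; a real system would call an embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def chunk(doc: str, size: int = 200) -> list[str]:
    # Fixed-size character chunking: the "chunk blindly" step the article criticizes.
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def naive_rag(docs: list[str], query: str, k: int = 3) -> str:
    chunks = [c for d in docs for c in chunk(d)]
    index = [(c, embed(c)) for c in chunks]            # stand-in for a vector DB
    q = embed(query)
    top_k = sorted(index, key=lambda e: cosine(q, e[1]), reverse=True)[:k]
    context = "\n---\n".join(c for c, _ in top_k)
    return f"Answer based on:\n{context}\nQ: {query}"  # stand-in for the LLM call
```

Every failure mode listed below is already visible here: `chunk` ignores structure, `cosine` measures similarity rather than relevance, and nothing checks whether the retrieved context actually supports an answer.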
This works for demos. In production, it falls apart because:
- Chunking destroys document structure
- Semantic similarity != relevance
- Top-k retrieval misses context
- The LLM hallucinates when retrieval is weak
What Actually Works
After iterating on production RAG systems, here's our current architecture:
1. Hierarchical Indexing
Don't just chunk blindly. Build a hierarchy:
- Document level: Summary embeddings for routing
- Section level: Thematic chunks with headers
- Paragraph level: Fine-grained retrieval
- Entity level: Named entities and relationships
Query against multiple levels and merge results.
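A minimal sketch of such a hierarchy, assuming a tree of document / section / paragraph nodes (the node layout and the injected `match` scorer are illustrative choices, not a specific library's API):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    level: str                  # "document" | "section" | "paragraph"
    text: str
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)

def build_hierarchy(doc_summary: str, sections: list[tuple[str, list[str]]]) -> Node:
    # One document node, one node per section, one node per paragraph.
    root = Node("document", doc_summary)
    for header, paragraphs in sections:
        sec = Node("section", header, parent=root)
        root.children.append(sec)
        for p in paragraphs:
            sec.children.append(Node("paragraph", p, parent=sec))
    return root

def search_levels(root: Node, match, levels=("document", "section", "paragraph")):
    # Score nodes at every requested level, then merge into one ranked list.
    hits = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.level in levels:
            hits.append((match(node.text), node))
        stack.extend(node.children)
    return [n for s, n in sorted(hits, key=lambda h: h[0], reverse=True) if s > 0]
```

Because each hit keeps its `parent` pointer, a paragraph-level match can pull in its section header for context before being handed to the LLM.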
2. Hybrid Retrieval
Combine multiple retrieval strategies:
- Dense retrieval (embeddings) for semantic matching
- Sparse retrieval (BM25) for keyword precision
- Metadata filtering for structured constraints
- Re-ranking (cross-encoder) for final ordering
No single method wins everywhere. Use all of them.
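One common way to combine these strategies is Reciprocal Rank Fusion (RRF), which merges ranked lists without needing comparable scores. The sketch below assumes the dense search, BM25 search, and cross-encoder re-ranker are injected as callables; they would be real retrieval backends in production.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # RRF: score(d) = sum over rankings of 1 / (k + rank(d)).
    # k dampens the influence of any single list's top position; 60 is the
    # constant from the original RRF paper.
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_retrieve(query, dense_search, bm25_search, rerank, top_k=10):
    # Fuse dense and sparse candidate lists first.
    fused = rrf_fuse([dense_search(query), bm25_search(query)])
    # Cross-encoder re-ranking is expensive, so apply it only to the fused head.
    head = fused[:top_k * 3]
    return sorted(head, key=lambda d: rerank(query, d), reverse=True)[:top_k]
```

Metadata filtering slots in before fusion: each backend applies the structured constraints so only eligible documents ever enter the ranked lists.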
3. Query Understanding
Before retrieval, decompose the query:
- What type of information is needed?
- Are there implicit constraints (time, source, format)?
- Does this need multi-hop reasoning?
A good query understanding layer improves retrieval quality by 30-40%.
4. Answer Verification
After generation, verify:
- Is every claim supported by retrieved context?
- Are there contradictions in the source material?
- Does the answer actually address the question?
Route unverifiable answers to a fallback (human review or "I don't know").
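A sketch of that verification gate, using lexical overlap as a cheap stand-in for a real entailment check (production systems typically use an NLI model or an LLM judge; the threshold here is an arbitrary placeholder):

```python
def supported(claim: str, context: str, threshold: float = 0.5) -> bool:
    # Cheap proxy for entailment: fraction of the claim's content words
    # that appear in the retrieved context.
    words = {w for w in claim.lower().split() if len(w) > 3}
    if not words:
        return True
    hits = sum(1 for w in words if w in context.lower())
    return hits / len(words) >= threshold

def verify_answer(answer: str, context: str) -> str:
    # Split the answer into sentence-level claims and check each one.
    claims = [s.strip() for s in answer.split(".") if s.strip()]
    if all(supported(c, context) for c in claims):
        return answer
    return "I don't know"  # fallback route: refuse, or escalate to human review
```

The key design point is that verification is a gate, not a post-hoc score: an answer that fails any claim never reaches the user unmodified.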
Infrastructure Requirements
Production RAG needs:
- Vector DB with filtering: We use Qdrant or Pinecone depending on scale
- Chunk management: Track chunk provenance and relationships
- Evaluation pipeline: Automated quality metrics on every change
- Caching layer: Same questions get asked repeatedly
The Bottom Line
RAG isn't a feature you add in a sprint. It's a system that requires ongoing tuning. If your RAG implementation is "good enough" in testing but falls apart in production, it's because you're treating it as a single component instead of a system.
Invest in the retrieval layer. That's where most of the value is.