RAG Is Not Enough: Building Production Knowledge Systems
HN Reference: HN discussion on 'RAG is just a hack' and the future of knowledge-grounded AI (Sep 2024)
RAG (Retrieval-Augmented Generation) has become the default architecture for AI-powered knowledge systems. But after implementing it for a dozen startups, we've learned that naive RAG is only a starting point.
The Naive RAG Problem
Most RAG implementations follow the same recipe:
- Chunk documents
- Embed chunks
- Store in vector DB
- Retrieve top-k on query
- Stuff into LLM context
- Generate answer
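The recipe above can be sketched end to end. This is a toy illustration, not a real implementation: the bag-of-characters `embed` and the final f-string stand in for an embedding model and an LLM call, and the in-memory list stands in for a vector DB.

```python
from math import sqrt

def embed(text: str) -> list[float]:
    # Toy bag-of-characters embedding; a real system would call an embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def chunk(doc: str, size: int = 200) -> list[str]:
    # Fixed-size character chunking: the "chunk blindly" step the article criticizes.
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def naive_rag(docs: list[str], query: str, k: int = 3) -> str:
    chunks = [c for d in docs for c in chunk(d)]
    index = [(c, embed(c)) for c in chunks]            # stand-in for a vector DB
    q = embed(query)
    top_k = sorted(index, key=lambda e: cosine(q, e[1]), reverse=True)[:k]
    context = "\n---\n".join(c for c, _ in top_k)
    return f"Answer based on:\n{context}\nQ: {query}"  # stand-in for the LLM call
```

Every failure mode listed below is already visible here: `chunk` ignores structure, `cosine` measures similarity rather than relevance, and nothing checks whether the retrieved context actually supports an answer.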
This works for demos. In production, it falls apart because:
- Chunking destroys document structure
- Semantic similarity != relevance
- Top-k retrieval misses context
- The LLM hallucinates when retrieval is weak
What Actually Works
After iterating on production RAG systems, here's our current architecture:
1. Hierarchical Indexing
Don't just chunk blindly. Build a hierarchy:
- Document level: Summary embeddings for routing
- Section level: Thematic chunks with headers
- Paragraph level: Fine-grained retrieval
- Entity level: Named entities and relationships
Query against multiple levels and merge results.
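A minimal sketch of such a hierarchy, assuming a tree of document / section / paragraph nodes (the node layout and the injected `match` scorer are illustrative choices, not a specific library's API):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    level: str                  # "document" | "section" | "paragraph"
    text: str
    parent: "Node | None" = None
    children: list["Node"] = field(default_factory=list)

def build_hierarchy(doc_summary: str, sections: list[tuple[str, list[str]]]) -> Node:
    # One document node, one node per section, one node per paragraph.
    root = Node("document", doc_summary)
    for header, paragraphs in sections:
        sec = Node("section", header, parent=root)
        root.children.append(sec)
        for p in paragraphs:
            sec.children.append(Node("paragraph", p, parent=sec))
    return root

def search_levels(root: Node, match, levels=("document", "section", "paragraph")):
    # Score nodes at every requested level, then merge into one ranked list.
    hits = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.level in levels:
            hits.append((match(node.text), node))
        stack.extend(node.children)
    return [n for s, n in sorted(hits, key=lambda h: h[0], reverse=True) if s > 0]
```

Because each hit keeps its `parent` pointer, a paragraph-level match can pull in its section header for context before being handed to the LLM.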
2. Hybrid Retrieval
Combine multiple retrieval strategies:
- Dense retrieval (embeddings) for semantic matching
- Sparse retrieval (BM25) for keyword precision
- Metadata filtering for structured constraints
- Re-ranking (cross-encoder) for final ordering
No single method wins everywhere. Use all of them.
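One common way to combine these strategies is Reciprocal Rank Fusion (RRF), which merges ranked lists without needing comparable scores. The sketch below assumes the dense search, BM25 search, and cross-encoder re-ranker are injected as callables; they would be real retrieval backends in production.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    # RRF: score(d) = sum over rankings of 1 / (k + rank(d)).
    # k dampens the influence of any single list's top position; 60 is the
    # constant from the original RRF paper.
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def hybrid_retrieve(query, dense_search, bm25_search, rerank, top_k=10):
    # Fuse dense and sparse candidate lists first.
    fused = rrf_fuse([dense_search(query), bm25_search(query)])
    # Cross-encoder re-ranking is expensive, so apply it only to the fused head.
    head = fused[:top_k * 3]
    return sorted(head, key=lambda d: rerank(query, d), reverse=True)[:top_k]
```

Metadata filtering slots in before fusion: each backend applies the structured constraints so only eligible documents ever enter the ranked lists.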
3. Query Understanding
Before retrieval, decompose the query:
- What type of information is needed?
- Are there implicit constraints (time, source, format)?
- Does this need multi-hop reasoning?
A good query understanding layer improves retrieval quality by 30-40%.
4. Answer Verification
After generation, verify:
- Is every claim supported by retrieved context?
- Are there contradictions in the source material?
- Does the answer actually address the question?
Route unverifiable answers to a fallback (human review or "I don't know").
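A sketch of that verification gate, using lexical overlap as a cheap stand-in for a real entailment check (production systems typically use an NLI model or an LLM judge; the threshold here is an arbitrary placeholder):

```python
def supported(claim: str, context: str, threshold: float = 0.5) -> bool:
    # Cheap proxy for entailment: fraction of the claim's content words
    # that appear in the retrieved context.
    words = {w for w in claim.lower().split() if len(w) > 3}
    if not words:
        return True
    hits = sum(1 for w in words if w in context.lower())
    return hits / len(words) >= threshold

def verify_answer(answer: str, context: str) -> str:
    # Split the answer into sentence-level claims and check each one.
    claims = [s.strip() for s in answer.split(".") if s.strip()]
    if all(supported(c, context) for c in claims):
        return answer
    return "I don't know"  # fallback route: refuse, or escalate to human review
```

The key design point is that verification is a gate, not a post-hoc score: an answer that fails any claim never reaches the user unmodified.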
Infrastructure Requirements
Production RAG needs:
- Vector DB with filtering: We use Qdrant or Pinecone depending on scale
- Chunk management: Track chunk provenance and relationships
- Evaluation pipeline: Automated quality metrics on every change
- Caching layer: Same questions get asked repeatedly
The Bottom Line
RAG isn't a feature you add in a sprint. It's a system that requires ongoing tuning. If your RAG implementation is "good enough" in testing but falls apart in production, it's because you're treating it as a single component instead of a system.
Invest in the retrieval layer. That's where most of the value is.