# Recap & Resources

> Recap and resources for the RAG module

## Key Takeaways

1. **Two-stage retrieval is the production standard:** A fast first pass (BM25 or vector search) followed by precise reranking often yields higher accuracy than single-stage retrieval.

2. **Hybrid search wins:** Combining BM25 with semantic search catches both exact-term and conceptual matches, and is widely used in production.

3. **Chunking strategy matters:** Structure-aware chunking with metadata dramatically improves retrieval quality. One-size-fits-all fails.

4. **Evaluate retrieval separately from generation:** Many RAG failures stem from retrieval issues. Debug component-by-component.

5. **Golden datasets are non-negotiable:** You cannot improve what you don't measure. Build at least 50-100 test cases.
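To make takeaways 4 and 5 concrete, here is a minimal sketch of scoring retrieval on its own against a golden dataset. It assumes each test case lists the chunk IDs a correct answer must draw on, and that your retriever returns a ranked list of chunk IDs. `GoldenCase`, `evaluate_retrieval`, and the toy `fake_index` are illustrative names, not part of any library; recall@k and MRR are two common retrieval metrics, and your project may need others.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    query: str
    relevant_ids: set[str]  # chunk IDs a correct answer must draw on


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunks that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1/rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate_retrieval(
    cases: list[GoldenCase],
    retrieve: Callable[[str], list[str]],
    k: int = 5,
) -> dict[str, float]:
    """Average recall@k and MRR over the golden dataset (retrieval only)."""
    if not cases:
        raise ValueError("golden dataset is empty")
    recalls, ranks = [], []
    for case in cases:
        retrieved = retrieve(case.query)
        recalls.append(recall_at_k(retrieved, case.relevant_ids, k))
        ranks.append(reciprocal_rank(retrieved, case.relevant_ids))
    return {"recall@k": sum(recalls) / len(cases), "mrr": sum(ranks) / len(cases)}


# Toy stand-in for a real retriever: query -> ranked chunk IDs (hypothetical data).
fake_index = {
    "refund policy": ["c7", "c2", "c9"],
    "api rate limit": ["c4", "c1", "c3"],
}
golden = [
    GoldenCase("refund policy", {"c2"}),
    GoldenCase("api rate limit", {"c1", "c8"}),
]
metrics = evaluate_retrieval(golden, lambda q: fake_index.get(q, []), k=3)
print(metrics)  # {'recall@k': 0.75, 'mrr': 0.5}
```

Because the retriever is passed in as a plain function, the same harness can compare BM25, dense, and hybrid retrievers on identical cases before any generation step is involved.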

**Production Checklist:**

Before deploying RAG to production, ensure:

* [ ] Two-stage retrieval implemented (first-pass + reranking)
* [ ] Appropriate chunking strategy for your document types
* [ ] Metadata extraction for filtering and context
* [ ] Hybrid search (BM25 + semantic) or a justified single strategy
* [ ] Golden evaluation dataset (50+ test cases)
* [ ] Automated evaluation pipeline (CI/CD integration)
* [ ] Component-level monitoring (retrieval and generation metrics)
* [ ] Error handling for empty results and low-confidence answers
* [ ] Observability (logging queries, retrieved docs, answers)
* [ ] Cost optimization (caching, efficient embedding models)
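The checklist item on empty results and low-confidence answers can be sketched as a thin guard around the generation call. This is one possible shape, not a prescribed implementation: `Hit`, `answer_with_guardrails`, the fallback message, and the `min_score` threshold of 0.35 are all assumptions for illustration. Any real threshold should be tuned on your golden dataset, and the sketch assumes retrieval scores where higher means more relevant.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Hit:
    doc_id: str
    text: str
    score: float  # assumption: higher is better, roughly in [0, 1]


FALLBACK = "I could not find enough supporting material to answer that."


def answer_with_guardrails(
    query: str,
    retrieve: Callable[[str], list[Hit]],
    generate: Callable[[str, list[Hit]], str],
    min_score: float = 0.35,  # hypothetical threshold; tune on your own data
) -> dict:
    """Only call the generator when retrieval returned usable evidence."""
    hits = retrieve(query)
    if not hits:
        return {"answer": FALLBACK, "reason": "empty_results", "sources": []}

    confident = [h for h in hits if h.score >= min_score]
    if not confident:
        return {"answer": FALLBACK, "reason": "low_confidence", "sources": []}

    return {
        "answer": generate(query, confident),
        "reason": "ok",
        "sources": [h.doc_id for h in confident],
    }


# Stub retriever and generator so the sketch runs without external services.
def stub_retrieve(query: str) -> list[Hit]:
    if query == "known":
        return [Hit("d1", "relevant text", 0.8), Hit("d2", "weak match", 0.1)]
    if query == "vague":
        return [Hit("d3", "weak match", 0.2)]
    return []


def stub_generate(query: str, hits: list[Hit]) -> str:
    return f"Answer to {query!r} based on {len(hits)} source(s)."


for q in ("known", "vague", "unknown"):
    print(q, "->", answer_with_guardrails(q, stub_retrieve, stub_generate)["reason"])
```

Returning a `reason` field alongside the answer also feeds the observability item: logging it shows how often the system falls back and why.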

## Common Pitfalls Recap

❌ **Single-stage retrieval:** Sacrifices either speed or accuracy\
❌ **Ignoring metadata:** Misses substantial potential accuracy improvements\
❌ **No evaluation:** "Vibe checks" fail in production\
❌ **Uniform chunking:** One strategy doesn't fit all document types\
❌ **Skipping OCR preprocessing:** Poor quality in, poor results out\
❌ **No golden dataset:** Cannot measure improvement or regression

## Trends (Last 3-6 months)

**What's new:**

* Unified RAG evaluation: side-by-side comparison of retrieval, reranking, and end-to-end answer grading using human and LLM feedback, plus UIs for inspecting reranked lists. (RankArena, 2025)
* Modular retrieval+rerank toolkits simplify A/B testing across BM25, dense, hybrid, and cross-encoders via plug-and-play configs and ablations. (Rankify, 2025)
* “Hybrid by default” guidance: combine sparse+dense first-pass with cross-encoder rerank; emphasize score fusion and pipeline parallelism. (2025)
* Practitioner playbooks stress “fast recall → precise rerank” to reduce top-k noise without large cost/latency increases. (2025)
* Serving optimization work proposes practical knobs (candidate pool sizes, batching, parallel fusion) for latency/cost control. (RAGO, 2025)
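As an illustration of the "hybrid first-pass, then rerank" pattern in the bullets above, here is a small sketch. The sources above do not prescribe a fusion method; this example uses reciprocal rank fusion (RRF), one common option, with the conventional constant `k=60`. The function names, `pool_size`, `top_n`, and the toy rankings are assumptions, and `rerank_score` is a hypothetical stand-in for a real cross-encoder.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


def two_stage_search(
    query: str,
    sparse_search: Callable[[str], list[str]],
    dense_search: Callable[[str], list[str]],
    rerank_score: Callable[[str, str], float],
    pool_size: int = 20,  # candidate pool passed to the reranker
    top_n: int = 3,       # results kept after reranking
) -> list[str]:
    """Stage 1: fuse sparse and dense rankings. Stage 2: rerank the pool."""
    fused = reciprocal_rank_fusion([sparse_search(query), dense_search(query)])
    candidates = fused[:pool_size]
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:top_n]


# Toy rankings standing in for BM25 and vector search results (hypothetical).
bm25_ranking = ["a", "b", "c"]
dense_ranking = ["b", "c", "d"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['b', 'c', 'a', 'd']

# Toy relevance table standing in for a cross-encoder's query-document score.
toy_relevance = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7}
top = two_stage_search(
    "example query",
    sparse_search=lambda q: bm25_ranking,
    dense_search=lambda q: dense_ranking,
    rerank_score=lambda q, doc_id: toy_relevance[doc_id],
    top_n=2,
)
print(top)  # ['a', 'd']
```

RRF works on ranks rather than raw scores, so BM25 and embedding similarity do not need to be normalized onto a common scale. `pool_size` is the kind of knob the serving-optimization bullet refers to: a larger pool gives the reranker more to work with at higher latency and cost.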

**Vendor-specific (awareness):**

* Milvus: open-source vector DB focused on scale; GPU-accelerated indexing and billion-scale search for low-latency RAG.
* Qdrant: Rust-based vector engine with strong filtering; emphasizes high performance and hybrid/rerank integrations.
* pgvector: PostgreSQL extension enabling vector search and hybrid (SQL + vectors) inside existing relational stacks.
* Pinecone: managed vector DB; guidance centers on hybrid search, namespaces, and rerank integration for production RAG.
* Weaviate: vector DB with built-in hybrid (sparse+dense) and modules for rerankers.
* Elasticsearch/OpenSearch: BM25 + kNN hybrid patterns via dense vectors alongside mature keyword filters.
* LlamaIndex: [LlamaCloud + LlamaParse](https://www.llamaindex.ai/blog/introducing-llamacloud-and-llamaparse-af8cedf9006b) for managed parsing/ingestion and advanced PDF/table parsing integrated with LlamaIndex pipelines (public preview).
* Google Gemini: [Gemini API File Search](https://ai.google.dev/gemini-api/docs/file-search) enables file-backed retrieval with managed indexing, chunking, and citation-friendly results for Gemini models.

**References:**

* RankArena: A Unified Platform for Evaluating Retrieval, Reranking and RAG with Human and LLM Feedback (2025). [https://arxiv.org/abs/2508.05512](https://arxiv.org/abs/2508.05512)
* Rankify: Toolkit for Retrieval, Re-Ranking and RAG (2025). [https://www.uibk.ac.at/en/disc/blog/rankify-framework/](https://www.uibk.ac.at/en/disc/blog/rankify-framework/)
* Retrieval-Augmented Generation - Learn page (2025). [https://www.pinecone.io/learn/retrieval-augmented-generation/](https://www.pinecone.io/learn/retrieval-augmented-generation/)
* The Power of Reranking in RAG Systems (2025). [https://dj3dw.com/blog/the-power-of-reranking-in-retrieval-augmented-generation-rag-systems/](https://dj3dw.com/blog/the-power-of-reranking-in-retrieval-augmented-generation-rag-systems/)
* RAGO: Systematic Performance Optimization for RAG Serving (2025). [https://arxiv.org/abs/2503.14649](https://arxiv.org/abs/2503.14649)
* LlamaIndex: Introducing LlamaCloud and LlamaParse (2024). [https://www.llamaindex.ai/blog/introducing-llamacloud-and-llamaparse-af8cedf9006b](https://www.llamaindex.ai/blog/introducing-llamacloud-and-llamaparse-af8cedf9006b)
* Google: Gemini API File Search (docs). [https://ai.google.dev/gemini-api/docs/file-search](https://ai.google.dev/gemini-api/docs/file-search)
