Overview
In this episode I explain exactly how Retrieval Augmented Generation (RAG) works under the hood — the two phases, five steps each, and the one system prompt that stops AI hallucinations. No demo, just clean architecture explanation.
- Why RAG matters (and the 3 problems it solves)
- Phase 1: Indexing — chunking, embedding, storage
- Phase 2: Query — vector search, prompt building, generation
- The system prompt trick that prevents hallucinations
- RAG vs fine-tuning — which to pick and when
Built on Microsoft Azure with Azure AI Foundry, Azure AI Search, and GPT-4o.
📺 Episode 1 (Build a RAG chatbot on Azure): https://www.youtube.com/watch?v=CFSgt0UXVcY
Video Timeline
- 0:00 What is RAG and why it matters
- 0:25 Phase 1: Indexing (chunking, embedding, storage)
- 1:35 Phase 2: Query (vector search, system prompt)
- 3:00 RAG vs Fine-tuning
- 3:30 Next episode preview
Key Takeaways
- Practical cloud architecture patterns you can apply immediately
- Real-world implementation guidance from enterprise experience
- Azure, AWS, and multi-cloud considerations
- Security-first and cost-optimised design principles
Watch & Learn
Watch the full video above for a detailed walkthrough. Subscribe to Tech with RKM on YouTube for regular cloud and AI architecture content.


