Workshop: Build a RAG Pipeline
Build a complete Retrieval-Augmented Generation pipeline from scratch โ from raw documents to a production-quality question-answering system with citations and evaluation.
| Duration | 2 hours (6 sections ร 20 min) |
| Level | Intermediate |
| Solution Play | 01 โ Enterprise RAG |
| You'll Build | Document ingestion โ chunking โ embedding โ indexing โ retrieval โ generation โ evaluation |
Prerequisitesโ
- Azure subscription with Azure OpenAI access
- Azure OpenAI โ GPT-4o deployment + text-embedding-3-large deployment
- Azure AI Search โ Basic tier or higher (Free tier lacks semantic ranker)
- Azure Blob Storage โ for source documents
- VS Code with FrootAI extension installed
- Python 3.10+ with
pip
pip install openai azure-search-documents azure-identity azure-storage-blob
Section 1: Concepts (20 min)โ
RAG solves the core LLM limitation: models don't know your data. Retrieve relevant context at query time and inject it into the prompt โ no retraining needed.
:::info Why RAG Over Fine-Tuning? RAG gives you up-to-date knowledge without retraining. Update documents, re-index, and the system reflects changes immediately. See T1: Fine-Tuning for when fine-tuning is appropriate. :::
Key components: chunking โ embedding โ indexing โ retrieval (hybrid search) โ generation with citations.
Section 2: Data Preparation (20 min)โ
Upload Source Documentsโ
Upload PDFs or text files to Azure Blob Storage:
az storage blob upload-batch \
--account-name <storage-account> \
--destination documents \
--source ./data/pdfs
Chunking Strategyโ
Split documents into 512-token chunks with 128-token overlap, then generate embeddings:
def chunk_document(text: str, chunk_size: int = 512, overlap: int = 128):
words = text.split()
return [" ".join(words[i:i + chunk_size])
for i in range(0, len(words), chunk_size - overlap)
if words[i:i + chunk_size]]
client = AzureOpenAI(azure_endpoint="https://<resource>.openai.azure.com/",
api_version="2024-06-01")
def embed(text: str) -> list[float]:
return client.embeddings.create(
model="text-embedding-3-large", input=text
).data[0].embedding # 3072 dimensions
Section 3: Index Build (20 min)โ
Create an Azure AI Search index with keyword, vector, and semantic fields:
from azure.search.documents.indexes.models import (
SearchIndex, SearchField, VectorSearch,
HnswAlgorithmConfiguration, VectorSearchProfile
)
index = SearchIndex(name="rag-workshop", fields=[
SearchField(name="id", type="Edm.String", key=True),
SearchField(name="content", type="Edm.String", searchable=True),
SearchField(name="source", type="Edm.String", filterable=True),
SearchField(name="embedding", type="Collection(Edm.Single)",
vector_search_dimensions=3072,
vector_search_profile_name="hnsw-profile")
], vector_search=VectorSearch(
algorithms=[HnswAlgorithmConfiguration(name="hnsw")],
profiles=[VectorSearchProfile(name="hnsw-profile",
algorithm_configuration_name="hnsw")]
))
Push chunked and embedded documents:
from azure.search.documents import SearchClient
search_client = SearchClient(endpoint, "rag-workshop", credential)
search_client.upload_documents([
{"id": str(i), "content": chunk, "source": "doc.pdf",
"embedding": embed(chunk)} for i, chunk in enumerate(chunks)
])
Section 4: Query Pipeline (20 min)โ
Hybrid Search (Keyword + Vector + Semantic Ranker)โ
from azure.search.documents.models import VectorizableTextQuery
results = search_client.search(
search_text=query,
vector_queries=[VectorizableTextQuery(
text=query, k_nearest_neighbors=5, fields="embedding")],
query_type="semantic", semantic_configuration_name="default", top=5)
def build_context(results) -> str:
return "\n\n---\n\n".join(
f"[Source {i}: {r['source']}]\n{r['content']}"
for i, r in enumerate(results, 1))
Section 5: Full RAG Pipeline (20 min)โ
Wire retrieval to generation with citation support:
async def rag_query(question: str) -> dict:
results = search_client.search(search_text=question, top=5,
vector_queries=[VectorizableTextQuery(text=question,
k_nearest_neighbors=5, fields="embedding")],
query_type="semantic", semantic_configuration_name="default")
context = build_context(results)
response = client.chat.completions.create(model="gpt-4o", messages=[
{"role": "system", "content":
"Answer based ONLY on the provided context. Cite as [Source N]. "
"If the context doesn't contain the answer, say so."},
{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
], temperature=0.1, max_tokens=1000)
return {"answer": response.choices[0].message.content,
"sources": [r["source"] for r in results]}
:::tip Streaming for Better UX
In production, use stream=True to return tokens as they're generated. Users see the first token in ~500ms instead of waiting 3-5s for the full response.
:::
Section 6: Evaluation (20 min)โ
Evaluate your RAG pipeline with standardized metrics:
| Metric | Target | What It Measures |
|---|---|---|
| Groundedness | โฅ 4.0 | Is the answer supported by retrieved context? |
| Relevance | โฅ 4.0 | Does the answer address the user's question? |
| Coherence | โฅ 4.0 | Is the answer logically structured? |
| Citation accuracy | โฅ 90% | Do citations match actual source content? |
from azure.ai.evaluation import GroundednessEvaluator
evaluator = GroundednessEvaluator(model_config)
score = evaluator(
response=result["answer"],
context=context,
query=question
)
print(f"Groundedness: {score['groundedness']}") # Target: โฅ 4.0
See T2: Responsible AI for the full evaluation framework.
Cleanupโ
Remove Azure resources to avoid ongoing charges:
az group delete --name rag-workshop-rg --yes --no-wait
Next Stepsโ
- Explore Solution Play 01 for the production-ready version
- Learn about R2: RAG Architecture for advanced patterns
- Try the Multi-Agent Workshop for agentic RAG