The Complete AI Roadmap for Engineers & Tech Leads
- 20 mins read
- Author: Vijay Anand Pandian (@vijayanandrp)
Why This Roadmap Exists
AI is moving fast. Knowing how to run a Jupyter notebook isn't enough anymore. If you want to lead AI teams — or just build production-grade AI systems — you need a mental model that connects the dots between theory, engineering, governance, and leadership.
This roadmap is structured in 6 tracks. Work through them in order, or jump to the track you need most.
Track 1 — AI Foundations: The Mental Models That Matter
Before you build anything, you need to understand how these systems actually work.
1.1 How Large Language Models Work
An LLM is a probability machine. Given a sequence of tokens, it predicts the next most likely token — over and over — until it produces a response.
Key concepts to deeply understand:
| Concept | What it means | Why it matters |
|---|---|---|
| Tokenisation | Text is split into sub-word tokens (not words) | Affects cost, context length, and non-English performance |
| Embeddings | Tokens become vectors in high-dimensional space | Foundation for semantic search and RAG |
| Attention | Each token "attends" to other tokens to understand context | Why LLMs understand long sentences |
| Context window | Max tokens the model can process at once | Determines how much input you can give it |
| Temperature | Controls randomness in output | 0 = deterministic, 1+ = creative/risky |
| Top-p / Top-k | Additional sampling controls | Used alongside temperature to shape outputs |
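To make temperature concrete, here is a toy next-token sampler in pure Python (the logits are invented for illustration). It shows why temperature 0 is deterministic and higher values spread probability across more tokens:

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float = 1.0) -> str:
    """Softmax over logits with temperature scaling, then sample one token."""
    if temperature <= 0:
        # Temperature 0: greedy decoding, always pick the highest logit.
        return max(logits, key=logits.get)
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())  # subtract max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Hypothetical logits for the next token after "The cat sat on the"
logits = {"mat": 4.0, "sofa": 3.0, "moon": 1.0}
print(sample_next_token(logits, temperature=0))  # greedy: always "mat"
```

At temperature 1+ the same call can return "sofa" or even "moon" on some runs, which is exactly the creative/risky behaviour the table describes.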
The core loop — autoregressive generation:
Input prompt → Tokenise → Embed → Transformer layers (attention + FFN) → Logits → Sample → Token → Repeat

1.2 Transformer Architecture (Enough to Be Dangerous)
You don't need to implement attention from scratch, but you need to explain it in an interview.
Key components:
- Self-attention: every token in the input compares itself to every other token, computing a weighted sum of "how relevant is this other token to me?"
- Multi-head attention: run attention multiple times in parallel (each "head" learns different patterns)
- Feed-forward network: after attention, each token goes through a 2-layer MLP — this is where "knowledge" is stored
- Layer norm + residual connections: stabilise training, allow gradients to flow through deep networks
- Positional encoding: since attention has no inherent order, position is added explicitly
Interview answer for "How does an LLM know things?":
The factual knowledge is baked into the weights of the feed-forward layers during pre-training on internet-scale text. It doesn't "look things up" — it recalls patterns compressed into billions of parameters.
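To make self-attention concrete, here is a toy scaled dot-product attention in pure Python: single head, no masking, made-up 2-D vectors. Each output row is a weighted mix of all value rows, which is the "every token attends to every other token" idea:

```python
import math

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q: list[list[float]], K: list[list[float]], V: list[list[float]]) -> list[list[float]]:
    """Scaled dot-product attention over toy 2-D lists (no batching, no heads)."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Score this query against every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # weights sum to 1
        # Weighted sum of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# Three tokens, 2-dimensional vectors; Q = K = V as in self-attention
Q = K = V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(Q, K, V))
```

In a real transformer, Q, K, and V are learned linear projections of the token embeddings, and this runs once per head per layer.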
1.3 Types of AI Models You Need to Know
| Model type | Examples | Use case |
|---|---|---|
| Foundation / Base | GPT-4, Claude, Llama 3 | Starting point for everything |
| Instruction-tuned | Claude Sonnet, GPT-4o | Follow user instructions well |
| Embedding models | text-embedding-3, all-MiniLM | Convert text to vectors |
| Multimodal | GPT-4 Vision, Gemini | Text + image understanding |
| Code models | DeepSeek Coder, CodeLlama | Code generation |
| Rerankers | cross-encoders (MS-MARCO) | Re-score RAG retrieval results |
1.4 Prompt Engineering — The Practical Layer
This is where most engineers spend 80% of their initial AI time.
Techniques in order of power:
- Zero-shot: just ask. Works for simple tasks.
- Few-shot: give 2–5 examples before your question. Dramatically improves structured output.
- Chain-of-thought (CoT): ask the model to "think step by step". Improves reasoning.
- Structured output: ask for JSON/XML. Pair with output parsers in LangChain.
- System prompts: set role, tone, constraints. Critical for production systems.
- Prompt chaining: break complex tasks into sub-prompts, pipe outputs together.
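Few-shot plus structured output in practice: a minimal sketch in the OpenAI-style messages format. The classifier task, labels, and JSON schema here are invented for illustration:

```python
import json

SYSTEM = (
    "You are a product-feedback classifier. Respond ONLY with JSON: "
    '{"sentiment": "positive|negative|neutral", "topic": "..."}'
)

# Few-shot examples: each pair shows the model the exact output shape we want.
FEW_SHOT = [
    ("Delivery was quick and the jumper fits perfectly.",
     {"sentiment": "positive", "topic": "delivery"}),
    ("The checkout page kept crashing on my phone.",
     {"sentiment": "negative", "topic": "website"}),
]

def build_messages(user_text: str) -> list[dict]:
    messages = [{"role": "system", "content": SYSTEM}]
    for text, label in FEW_SHOT:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": json.dumps(label)})
    messages.append({"role": "user", "content": user_text})
    return messages

msgs = build_messages("Sizing guide was confusing but staff were helpful.")
print(len(msgs))  # 1 system + 2 example pairs + 1 user = 6
```

The assistant-role examples do the heavy lifting: the model imitates their exact JSON shape far more reliably than it follows a prose description of the schema.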
The prompt anatomy for production:
System: [Role] [Constraints] [Output format]
User: [Context] [Task] [Examples]
Assistant: [Seed if needed]

Track 2 — Applied AI: RAG, Agents, and Fine-Tuning
2.1 Retrieval-Augmented Generation (RAG)
RAG solves the two biggest LLM problems: hallucination and knowledge cutoff.
Instead of relying on the model's baked-in knowledge, you retrieve relevant documents at query time and inject them into the context.
The RAG pipeline:
Query
→ [Optional] Query expansion / rewrite
→ Embed query (embedding model)
→ Vector search (cosine similarity against document chunks)
→ [Optional] Reranker (cross-encoder rescore top-k)
→ Inject top chunks into LLM context
→ LLM generates answer grounded in retrieved context
→ [Optional] Post-processing (confidence scoring, PII masking)
→ Response + source citations

Chunking strategies — this is where RAG lives or dies:
| Strategy | When to use |
|---|---|
| Fixed-size (chunk_size=512, overlap=64) | Simple baseline, good starting point |
| Sentence-aware splitting | Preserves sentence boundaries, better for factual Q&A |
| Semantic chunking | Splits on topic shifts — best quality, slower |
| Hierarchical (parent + child chunks) | Retrieve child, return parent for context |
| Document-level metadata enrichment | Add title/section/date to each chunk |
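The fixed-size baseline from the table is a few lines of Python. This sketch works on a pre-tokenised list; the overlap repeats the tail of each chunk so facts spanning a boundary are retrievable from both sides:

```python
def chunk_fixed(tokens: list[str], chunk_size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Fixed-size chunking with overlap between consecutive chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each new chunk advances
    return [tokens[i:i + chunk_size] for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_fixed(tokens)
print(len(chunks))     # 1200 tokens at step 448 -> 3 chunks
print(chunks[1][0])    # second chunk starts 448 tokens in: t448
```

Note the last 64 tokens of chunk 0 are also the first 64 tokens of chunk 1; that duplication is the point of the overlap.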
Key RAG metrics to know:
| Metric | Measures |
|---|---|
| Faithfulness | Is the answer grounded in the retrieved context? |
| Answer relevancy | Does the answer address the question? |
| Context precision | Are the retrieved chunks actually relevant? |
| Context recall | Did we retrieve all the chunks needed? |
Tools: RAGAS for evaluation, LangSmith for tracing.
Interview answer for "How do you improve RAG quality?":
I start by evaluating each component separately: retrieval quality (precision/recall of chunks), then generation quality (faithfulness, relevancy). Common fixes: better chunking strategy, hybrid search (dense + sparse/BM25), a reranker, and query rewriting. I instrument everything with LangSmith so I can trace failures to the exact step.
2.2 Vector Databases
A vector database stores embeddings and enables fast approximate nearest-neighbour (ANN) search.
Options comparison:
| Tool | Best for | Notes |
|---|---|---|
| ChromaDB | Local dev, POCs | Easiest to get started |
| Pinecone | Managed production | Fully hosted, simple API |
| Weaviate | Hybrid search | Dense + sparse + filters |
| pgvector | Existing PostgreSQL | Zero new infra if you're already on Postgres |
| Azure AI Search | Azure-native | Good integration with Azure OpenAI |
| Databricks Vector Search | Databricks-native | Delta table-backed, auto-sync |
What "cosine similarity" means in plain English:
Two vectors are similar if they point in the same direction, regardless of magnitude. A score of 1.0 = identical meaning. 0.0 = unrelated. Used because it captures semantic similarity between embeddings.
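The same idea as a formula: cosine similarity is the dot product divided by the product of the vector magnitudes. A toy pure-Python version (real systems use NumPy or the vector DB's ANN index):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b, normalised by their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Same direction, different magnitude: still similarity ~1.0
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
# Orthogonal vectors: similarity 0.0 (unrelated)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

This is why embedding magnitude doesn't matter for retrieval: only the direction of the vector carries the meaning.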
2.3 AI Agents
An agent is an LLM that can take actions — calling tools, running code, browsing the web — in a loop until a goal is achieved.
The agent loop (ReAct pattern):
[Thought] I need to find the current price of X
[Action] call_tool("web_search", "current price of X")
[Observation] "X costs £Y as of today"
[Thought] Now I have the data, I can answer
[Final Answer] The price is £Y

Key agent concepts:
| Concept | What it is |
|---|---|
| Tool / Function calling | LLM decides which function to call and with what args |
| Memory | Short-term (conversation), long-term (vector store), episodic |
| Planning | Breaking goals into sub-tasks (ReAct, Plan-and-Execute) |
| Multi-agent | Orchestrator delegates to specialist sub-agents |
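The ReAct loop above can be sketched in plain Python. Everything here is a stand-in: `fake_llm` replaces a real model call, and the tool registry and `[Action] tool: arg` response format are invented for illustration. What matters is the shape — act, observe, iterate, with a hard iteration cap against runaway loops:

```python
TOOLS = {
    "web_search": lambda query: "X costs £9.99 as of today",  # stubbed tool
}

def fake_llm(history: list[str]) -> str:
    """Stand-in for a real LLM call: acts once, then answers from the observation."""
    if not any(line.startswith("[Observation]") for line in history):
        return "[Action] web_search: current price of X"
    return "[Final Answer] The price is £9.99"

def run_agent(goal: str, max_iterations: int = 5) -> str:
    history = [f"[Goal] {goal}"]
    for _ in range(max_iterations):  # hard cap prevents runaway loops
        step = fake_llm(history)
        history.append(step)
        if step.startswith("[Final Answer]"):
            return step
        if step.startswith("[Action]"):
            tool_name, arg = step.removeprefix("[Action] ").split(": ", 1)
            history.append(f"[Observation] {TOOLS[tool_name](arg)}")
    return "[Final Answer] Stopped: iteration limit reached"

print(run_agent("find the current price of X"))
```

Frameworks like LangGraph give you this loop plus state persistence, tool schemas, and human-approval checkpoints, but the control flow is the same.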
Frameworks to know:
- LangChain Agents — general purpose, most tutorials
- LangGraph — stateful, graph-based agent workflows (production-grade)
- Databricks AI Agents — notebook-native, integrates with Unity Catalog
- AutoGen (Microsoft) — multi-agent conversations
- CrewAI — role-based multi-agent teams
Sparky AI context: Databricks AI Agents with a structured tool set for audience selection logic — the agent calls audience-filter tools, validates outputs, and surfaces results to business users. This is a real-world multi-turn agent with domain-specific tooling.
2.4 Fine-Tuning vs RAG vs Prompt Engineering
This is a classic interview question. Know the decision framework:
Start with prompt engineering
→ Still not good enough? Add RAG (for knowledge grounding)
→ Still not good enough? Fine-tune (for style/format/domain behaviour)
→ Need maximum control? Train from scratch (almost never)

| Approach | Cost | Latency | Knowledge updates | Best for |
|---|---|---|---|---|
| Prompt engineering | Low | None | Instant | Style, format, simple tasks |
| RAG | Medium | +50–200ms | Instant (update index) | Knowledge-intensive Q&A |
| Fine-tuning | High | Slightly lower | Requires retraining | Domain tone, consistent format |
| Pre-training | Very high | — | Full control | Proprietary domain (rare) |
Track 3 — AI Engineering: Production-Grade Systems
3.1 MLOps Fundamentals
MLOps = DevOps for ML models. The goal is to take a model from notebook to production reliably.
The MLOps lifecycle:
Data → Feature engineering → Training → Evaluation → Model registry → Deployment → Monitoring → Retraining

Tools to know by category:
| Category | Tools |
|---|---|
| Experiment tracking | MLflow, Weights & Biases, Comet |
| Model registry | MLflow Registry, Databricks Model Registry, HuggingFace Hub |
| Feature store | Databricks Feature Store, Feast, Tecton |
| Model serving | MLflow serving, FastAPI, BentoML, Ray Serve, Azure ML |
| Monitoring | Evidently, WhyLogs, Arize, Azure Monitor |
| Orchestration | Airflow, Databricks Workflows, Prefect |
The three things that go wrong in production:
- Data drift — input distribution shifts (e.g., seasonal patterns change)
- Concept drift — the relationship between input and output changes
- Model decay — performance degrades over time as the world changes
Interview answer for "How do you monitor an ML model in production?":
I track three layers: data (input distribution — schema, null rates, feature drift using PSI/KL divergence), model (prediction distribution, confidence scores), and business metrics (downstream KPIs). I use Evidently for statistical drift detection and alert when PSI > 0.2. For LLMs, I also track faithfulness and hallucination rate with a lightweight LLM-as-judge setup.
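The PSI check mentioned above can be sketched in pure Python. Bin edges come from the baseline distribution; the 0.2 alert threshold is the common rule of thumb (under 0.1 stable, 0.1 to 0.2 moderate shift, over 0.2 significant drift):

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[max(idx, 0)] += 1  # clamp values outside the baseline range
        # Small floor avoids log(0) for empty bins
        return [max(c / len(values), 1e-4) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]    # uniform on [0, 1)
print(round(psi(baseline, baseline), 4))    # identical distributions: 0.0
shifted = [v + 0.5 for v in baseline]       # everything shifted up
print(psi(baseline, shifted) > 0.2)         # True: alert-worthy drift
```

In production you'd compute this per feature on a schedule (Evidently does this for you) and page someone when any feature crosses the threshold.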
3.2 LLMOps — MLOps for Language Models
LLMs have unique operational concerns beyond traditional ML.
Key LLMOps concerns:
| Concern | What to track | Tool |
|---|---|---|
| Latency | Time-to-first-token, total response time | LangSmith, Prometheus |
| Cost | Tokens in + tokens out per request | LLM provider dashboards |
| Hallucination rate | % responses not grounded in context | LLM-as-judge, RAGAS |
| Prompt injection | Malicious user inputs hijacking the prompt | Input validation, firewalls |
| PII leakage | Sensitive data in outputs | Presidio, regex gates |
| Throughput | Requests/second under load | Load testing |
Caching strategies for cost reduction:
- Exact cache: return stored response for identical prompts (Redis)
- Semantic cache: cache similar prompts using embedding similarity (GPTCache)
- Prompt compression: reduce input tokens with LLMLingua before sending
3.3 Building with Azure OpenAI (Enterprise Pattern)
If you're building in an enterprise Azure environment (as at M&S), this is critical.
Azure OpenAI vs OpenAI API:
- Data stays within your Azure tenant (compliance)
- Private network endpoints
- Content filtering built-in
- Microsoft responsible AI layer
- Same models, different endpoint format
Enterprise architecture pattern:
User request
→ API Gateway (rate limiting, auth)
→ Azure API Management
→ Azure OpenAI Service
→ Content filter (input)
→ Model (GPT-4o / embedding)
→ Content filter (output)
→ Application layer (RAG, agent logic)
→ Audit log → Azure Monitor
→ Response to user

Key services to know:
- Azure OpenAI — LLM hosting
- Azure AI Search — vector + hybrid search (replaces Pinecone in Azure)
- Azure AI Studio — prompt flow, model evaluation
- Azure AI Content Safety — content moderation API
- Databricks on Azure — model training, feature engineering, agent hosting
Track 4 — AI Governance: Responsible AI in Practice
This is what separates a junior AI engineer from an AI Tech Lead. Anyone can get a chatbot working. Governing it is the hard part.
4.1 The Governance Framework
Six pillars of responsible AI (Microsoft / industry standard):
| Pillar | What it means | How to implement |
|---|---|---|
| Fairness | Model doesn't discriminate | Bias audits, fairness metrics (demographic parity, equal opportunity) |
| Reliability & Safety | Model behaves correctly under stress | Red-teaming, guardrails, fallback behaviour |
| Privacy & Security | Data is protected | PII masking, data minimisation, access controls |
| Inclusiveness | Works for all users | Multilingual testing, accessibility review |
| Transparency | Users know they're interacting with AI | AI disclosure, model cards |
| Accountability | Humans are in the loop for high-stakes decisions | Audit logs, human review workflows |
4.2 Model Cards
A model card is a one-page document describing a model's intended use, limitations, evaluation results, and ethical considerations.
Model card sections:
1. Model description (what it is, who built it, when)
2. Intended use (primary use cases, out-of-scope uses)
3. Training data (source, size, known biases)
4. Evaluation results (benchmarks, failure modes)
5. Ethical considerations (risks, mitigations)
6. Limitations (what it cannot do)
7. Recommendations (when to use, when not to)

Why this matters for interviews:
A model card forces the team to articulate who the model is for, what it can fail at, and who is accountable. It's the difference between "we shipped a model" and "we shipped a governed AI product."
4.3 EU AI Act Awareness
As of August 2024, the EU AI Act is in force. High-risk AI systems (employment, credit, healthcare) face mandatory requirements.
Key risk tiers:
| Tier | Examples | Requirements |
|---|---|---|
| Unacceptable | Social scoring, biometric surveillance | Banned |
| High-risk | CV screening, credit scoring, medical | Conformity assessment, human oversight, audit logs |
| Limited risk | Chatbots | Transparency (must disclose AI) |
| Minimal risk | Spam filters | No specific requirements |
Interview answer for "How do you approach AI governance?":
I start by classifying the use case risk level — is this a recommendation, a decision-support tool, or an autonomous decision? High-stakes decisions (credit, employment, health) need human oversight. I implement audit logging on every inference, PII masking on inputs and outputs, confidence thresholds with graceful fallbacks, and model cards. I also run quarterly bias audits and red-team exercises.
4.4 Practical Governance Implementation
Code patterns you should be able to demonstrate:
PII Masking (before query hits LLM):
```python
import re

PII_PATTERNS = {
    "EMAIL": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "UK_PHONE": r"(\+44|0)[0-9\s\-]{9,13}",
    "NI_NUMBER": r"[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]",
}

def mask_pii(text: str) -> tuple[str, bool]:
    masked = text
    found = False
    for label, pattern in PII_PATTERNS.items():
        if re.search(pattern, masked, re.IGNORECASE):
            masked = re.sub(pattern, f"[{label}]", masked, flags=re.IGNORECASE)
            found = True
    return masked, found
```

Audit Logging (every inference):
```python
import json
import uuid
from datetime import datetime

def audit_log(query: str, response: str, confidence: float, pii_detected: bool):
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": datetime.utcnow().isoformat(),
        "query_masked": mask_pii(query)[0],  # never log raw query
        "response_length": len(response),
        "confidence": confidence,
        "pii_detected": pii_detected,
        "model": "claude-sonnet-4-6",
    }
    with open("audit.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
```

Confidence Scoring (simple heuristic):
```python
UNCERTAINTY_SIGNALS = ["i don't know", "i'm not sure", "unclear", "uncertain", "cannot confirm"]

def score_confidence(response: str, chunks: list[str]) -> float:
    score = 1.0
    resp_lower = response.lower()

    # Penalise uncertainty language
    for signal in UNCERTAINTY_SIGNALS:
        if signal in resp_lower:
            score -= 0.2

    # Reward grounding in source material
    if chunks:
        overlap = sum(
            1 for chunk in chunks
            if any(word in resp_lower for word in chunk.lower().split()[:10])
        )
        score = min(1.0, score + (overlap / len(chunks)) * 0.3)

    return max(0.0, score)
```

Track 5 — AI Leadership: Running AI Teams
5.1 The AI Tech Lead Role
An AI Tech Lead is not just a senior engineer who reviews code. They:
- Define technical direction — which models, which stack, which governance standards
- Translate business problems into AI solutions — stakeholder → product → technical spec
- Manage technical risk — hallucination, bias, data privacy, model failure modes
- Coach the team — upskilling engineers on LLMs, prompt engineering, MLOps
- Own quality gates — evaluation frameworks, not just unit tests
- Communicate uncertainty — business stakeholders need confidence intervals, not just "it works"
5.2 The Stakeholder-to-Ship Framework
The pattern that works for translating pain points into AI products:
1. Pain point interview (what problem, who has it, what does failure look like?)
2. Feasibility spike (2-day timebox: can AI solve this? What's the baseline?)
3. Prototype (1 week: working demo, not polished)
4. Stakeholder demo (show failure cases too, not just successes)
5. Governance review (risk tier, PII, audit requirements)
6. Pilot (small user group, monitored closely)
7. Production (with monitoring, fallback, human escalation path)

Mission Desk followed this pattern: Pain = manual mission configuration (7 days). Spike = can Streamlit + Databricks replace the spreadsheet? Prototype = 2-week working app. Result = £500K+ annual value, 85% deployment time reduction.
5.3 Running an AI Squad
Ceremonies for AI teams (adapted from Agile):
| Ceremony | Cadence | Purpose |
|---|---|---|
| Model review | Weekly | Review evaluation metrics, flag regressions |
| ADR (Architecture Decision Record) | Per decision | Document why we chose this approach |
| Spike timebox | 2-day max | Time-bounded research on unknowns |
| Red-team session | Monthly | Try to break the model — adversarial inputs |
| Bias audit | Quarterly | Run fairness metrics across demographic slices |
| Stakeholder demo | Bi-weekly | Show progress, collect feedback, reset expectations |
Writing an ADR (Architecture Decision Record):
```markdown
# ADR-001: Use RAG over fine-tuning for internal FAQ chatbot

## Status: Accepted

## Context
We need an internal FAQ chatbot that stays up-to-date with policy changes.

## Decision
We will use RAG with Azure AI Search over fine-tuning.

## Rationale
- Policy documents change monthly — fine-tuning requires retraining (expensive)
- RAG allows instant knowledge updates by re-indexing documents
- Audit trail: we can show which document chunk grounded the answer
- Cost: no GPU training budget needed

## Trade-offs
- RAG adds 100–200ms latency vs direct LLM call
- Quality depends on document chunking quality

## Consequences
Must maintain document indexing pipeline and monitor retrieval quality
```

5.4 Communicating AI to Non-Technical Stakeholders
This is the skill that differentiates an AI Tech Lead from an AI engineer.
Translate technical concepts to business language:
| Technical | Business language |
|---|---|
| Hallucination | "The model sometimes makes things up — we have confidence scoring and human review for high-stakes outputs" |
| Context window | "The model can only read X pages of information at once — our RAG system handles larger document sets" |
| Model drift | "The model's performance can degrade over time as data patterns change — we monitor this monthly" |
| Fine-tuning | "We teach the model your company's specific language and formats — takes 2–4 weeks" |
| Embeddings | "We convert text into numbers that capture meaning — allows us to find related content without keyword matching" |
The three things a stakeholder always wants to know:
- Will it be accurate? → Show confidence scores, failure modes, and human escalation path
- Will my data be safe? → Explain PII masking, data boundaries, and audit logs
- What happens when it's wrong? → Show the fallback, the alert, and the human review process
Track 6 — Interview Playbook for AI Tech Lead Roles
6.1 Technical Questions
Q: Explain RAG to me like I'm a senior engineer.
RAG is a pattern where instead of relying on an LLM's baked-in knowledge, you retrieve relevant documents at query time from a vector database, inject them into the context window, and ask the LLM to answer using only those documents. This grounds the answer in your data, eliminates knowledge cutoff issues, and makes hallucinations auditable — you can show which chunk the answer came from.
Q: How would you design an enterprise chatbot for internal documents?
I'd use a RAG architecture: ingest documents into Azure AI Search with semantic chunking (512 tokens, 64 overlap), use Azure OpenAI embeddings for vectorisation, add a reranker for top-k results, then pipe into GPT-4o with a strict system prompt. Governance layer: PII mask all inputs, JSONL audit log every inference, confidence threshold with "I don't know" fallback below 0.6. Monitoring: track faithfulness weekly with RAGAS, alert on hallucination rate above 5%.
Q: What's the difference between an LLM agent and a simple LLM call?
A single LLM call is stateless — you send a prompt, get a response. An agent has a loop: it can call tools (APIs, databases, code executors), observe the results, reason about what to do next, and iterate until a goal is reached. Agents are better for multi-step tasks but introduce new risks (tool misuse, runaway loops) — you need timeouts, max iterations, and human approval for irreversible actions.
Q: How do you handle hallucinations in production?
Three layers: prevention (RAG grounds the answer in real sources, strict system prompt with "only use the provided context"), detection (confidence scoring, faithfulness check with LLM-as-judge), and mitigation (low-confidence responses trigger a fallback: "I'm not confident enough to answer — here are the relevant documents"). Critical: audit log every response so you can investigate failures.
Q: How do you evaluate an LLM in production?
I separate evaluation into three levels: component (retrieval quality — precision/recall of chunks), end-to-end (RAGAS metrics: faithfulness, answer relevancy, context precision), and business (downstream KPI — e.g., did the agent's recommendation lead to the right action?). I also run regression tests on a golden dataset of 50–100 question/answer pairs before every deployment. For LLM-as-judge, I use a separate, stronger model to score responses.
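The golden-dataset regression gate from that answer can be sketched in a few lines. The questions, the `fake_chatbot` stand-in, and the substring-match check are all illustrative; real setups use RAGAS or an LLM-as-judge instead of `must_contain`:

```python
GOLDEN_SET = [
    {"question": "What is our refund window?", "must_contain": "30 days"},
    {"question": "Who approves expense claims?", "must_contain": "line manager"},
]

def fake_chatbot(question: str) -> str:
    """Stand-in for the deployed RAG pipeline."""
    answers = {
        "What is our refund window?": "Refunds are accepted within 30 days.",
        "Who approves expense claims?": "Your line manager approves claims.",
    }
    return answers.get(question, "I don't know.")

def run_regression(pipeline, golden: list[dict], pass_threshold: float = 0.95) -> bool:
    passed = sum(
        1 for case in golden
        if case["must_contain"].lower() in pipeline(case["question"]).lower()
    )
    rate = passed / len(golden)
    print(f"{passed}/{len(golden)} golden cases passed ({rate:.0%})")
    return rate >= pass_threshold  # gate the deployment on this result

print(run_regression(fake_chatbot, GOLDEN_SET))
```

Wire this into CI so a prompt tweak or model upgrade that regresses known-good answers blocks the deploy instead of reaching users.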
Q: When would you fine-tune vs use RAG?
RAG for knowledge (factual information that changes or is proprietary). Fine-tuning for behaviour (consistent tone, specific output formats, domain terminology the base model doesn't know well). In practice: start with prompt engineering, add RAG when you need fresh knowledge, fine-tune only if you've proven RAG isn't enough — it's 10x more expensive and complex to maintain.
6.2 Architecture & Design Questions
Q: Design an AI system for retail product recommendations.
I'd build a hybrid system: collaborative filtering (user behaviour) + LLM-enhanced descriptions. The LLM layer enriches product embeddings with semantic understanding — "this jumper is warm, casual, suitable for winter weekends." At inference: retrieve top-20 by collaborative score, rerank using semantic similarity to user's stated intent, apply business rules (stock availability, margin targets), return top-5 with explanation. Governance: log all recommendations, A/B test against baseline, monitor click-through and conversion weekly.
Q: How would you make an AI system compliant with the EU AI Act?
First, classify the risk tier — employment screening and credit scoring are high-risk. For high-risk: document the system with a model card, implement human oversight (no fully autonomous decisions), maintain audit logs for 10 years, conduct conformity assessment, register in the EU AI database. For the technical side: bias testing across protected characteristics, explainability layer (SHAP values or attention visualisation), incident reporting process.
6.3 Leadership & Process Questions
Q: How do you build trust with business stakeholders for AI projects?
I involve them before writing a line of code — the pain point interview is non-negotiable. Then I show failure cases in demos, not just successes. Stakeholders trust you more when you say "here's what it gets wrong and here's how we handle it" than when you only show the good outputs. I set accuracy expectations with ranges, not single numbers. And I deliver something working in 2 weeks rather than promising something perfect in 6 months.
Q: How do you keep an AI team moving fast while maintaining quality?
Two-day spike timeboxes for unknowns — no open-ended research. Golden dataset regression tests before every deploy — if you break something, you know immediately. ADRs for major decisions — stops re-litigating the same choices. Weekly model review — catch drift before stakeholders do. Bi-weekly demos — keeps the team accountable and stakeholders aligned.
Q: How do you communicate AI risk to a non-technical director?
I use a traffic light risk tier system and avoid technical jargon. "This model is a decision-support tool — it surfaces the best options, but a human makes the final call. We've tested it on 500 examples and it's right 94% of the time. When it's wrong, it fails in these specific ways, and here's the escalation path." I always answer: what's the worst case if it goes wrong, and what have we done to prevent it?
Q: Tell me about a time you delivered AI value to the business.
Mission Desk at M&S: business users were manually configuring loyalty missions in spreadsheets — 7 days per deployment, 12% error rate. I built a full-stack platform (Streamlit, PostgreSQL, Databricks) solo using AI-assisted development. Result: deployment time from 7 days to 1 day (85% reduction), error rate from 12% to under 2%, managing 1M+ loyalty records, estimated £500K+ annual business value. Nominated at the Sparks All Hands for company-wide recognition.
Your Learning Sequence (30-Day Sprint)
Week 1 — Foundations
- Day 1–2: LLMs, tokenisation, embeddings, attention (Track 1)
- Day 3–4: Prompt engineering — CoT, few-shot, structured output
- Day 5–7: Build a simple chatbot with the Claude API or OpenAI

Week 2 — Applied AI
- Day 8–10: RAG — build end-to-end with LangChain + ChromaDB
- Day 11–12: Vector databases — compare ChromaDB, pgvector, Azure AI Search
- Day 13–14: Agents — ReAct pattern, tool calling, LangGraph basics

Week 3 — Engineering & Governance
- Day 15–16: MLOps — MLflow experiment tracking + model registry
- Day 17–18: LLMOps — LangSmith tracing, RAGAS evaluation
- Day 19–20: Governance — PII masking, audit logging, confidence scoring, model cards
- Day 21: EU AI Act — read the official summary, classify your current projects

Week 4 — Leadership & Interview Prep
- Day 22–23: Write one ADR for a real decision you've made
- Day 24–25: Mock answer all 12 interview questions above out loud
- Day 26–27: Build one thing: add a reranker to your RAG POC or an agent tool
- Day 28–30: Blog post or LinkedIn post sharing what you learned

Resources
Papers (read the abstracts + introduction, not the full paper):
- Attention Is All You Need (2017) — the original Transformer paper
- RAG: Retrieval-Augmented Generation (2020) — Facebook AI
- ReAct: Synergizing Reasoning and Acting in Language Models (2022)
Courses:
- DeepLearning.AI Short Courses (free) — LangChain, RAG, agents, fine-tuning
- Fast.ai Practical Deep Learning — if you want to go deeper on fundamentals
- Databricks Generative AI Engineer Associate exam prep
Tools to have set up:
- LangChain + LangSmith (free tier)
- Anthropic API (Claude) or OpenAI API
- ChromaDB or pgvector locally
- MLflow locally (pip install mlflow)
Written by Vijay Anand Pandian — AI Tech Lead & Senior Data Engineer at M&S Sparks. Building governed AI systems that bridge business and engineering.