The Complete AI Roadmap for Engineers & Tech Leads

Why This Roadmap Exists

AI is moving fast. Knowing how to run a Jupyter notebook isn't enough anymore. If you want to lead AI teams — or just build production-grade AI systems — you need a mental model that connects the dots between theory, engineering, governance, and leadership.

This roadmap is structured in 6 tracks. Work through them in order, or jump to the track you need most.


Track 1 — AI Foundations: The Mental Models That Matter

Before you build anything, you need to understand how these systems actually work.

1.1 How Large Language Models Work

An LLM is a probability machine. Given a sequence of tokens, it predicts the next most likely token — over and over — until it produces a response.

Key concepts to deeply understand:

| Concept | What it means | Why it matters |
|---|---|---|
| Tokenisation | Text is split into sub-word tokens (not words) | Affects cost, context length, and non-English performance |
| Embeddings | Tokens become vectors in high-dimensional space | Foundation for semantic search and RAG |
| Attention | Each token "attends" to other tokens to understand context | Why LLMs understand long sentences |
| Context window | Max tokens the model can process at once | Determines how much input you can give it |
| Temperature | Controls randomness in output | 0 = deterministic, 1+ = creative/risky |
| Top-p / Top-k | Additional sampling controls | Used alongside temperature to shape outputs |

The core loop — autoregressive generation:

Input prompt → Tokenise → Embed → Transformer layers (attention + FFN) → Logits → Sample → Token → Repeat
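The sampling step at the end of this loop can be sketched in a few lines of NumPy. This is a toy illustration, not any provider's API: `logits` stands in for the model's raw output scores over a (here, tiny) vocabulary, and the temperature behaviour matches the table above.

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 1.0, rng=None) -> int:
    """One sampling step of the autoregressive loop: logits in, token id out."""
    rng = rng or np.random.default_rng()
    if temperature == 0:
        return int(np.argmax(logits))        # temperature 0: always the most likely token
    scaled = logits / temperature            # higher temperature flattens the distribution
    probs = np.exp(scaled - scaled.max())    # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

logits = np.array([2.0, 1.0, 0.1])           # toy scores for a 3-token vocabulary
print(sample_next_token(logits, temperature=0))  # 0: greedy pick of the highest logit
```

A real model produces logits over a vocabulary of tens of thousands of tokens; the loop then appends the sampled token to the input and runs the forward pass again.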

1.2 Transformer Architecture (Enough to Be Dangerous)

You don't need to implement attention from scratch, but you should be able to explain it in an interview.

Key components:

  • Self-attention: every token in the input compares itself to every other token, computing a weighted sum of "how relevant is this other token to me?"
  • Multi-head attention: run attention multiple times in parallel (each "head" learns different patterns)
  • Feed-forward network: after attention, each token goes through a 2-layer MLP — this is where "knowledge" is stored
  • Layer norm + residual connections: stabilise training, allow gradients to flow through deep networks
  • Positional encoding: since attention has no inherent order, position is added explicitly
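To make the self-attention bullet concrete, here is a minimal single-head sketch in NumPy. Toy dimensions, random weights; a real transformer adds causal masking, multiple heads, and learned parameters.

```python
import numpy as np

def self_attention(X: np.ndarray, Wq: np.ndarray, Wk: np.ndarray, Wv: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention over X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])         # "how relevant is token j to token i?"
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax: each row sums to 1
    return weights @ V                              # weighted sum of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                         # 4 tokens, model dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one contextualised vector per input token
```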

Interview answer for "How does an LLM know things?":

The factual knowledge is baked into the weights of the feed-forward layers during pre-training on internet-scale text. It doesn't "look things up" — it recalls patterns compressed into billions of parameters.

1.3 Types of AI Models You Need to Know

| Model type | Examples | Use case |
|---|---|---|
| Foundation / Base | GPT-4, Claude, Llama 3 | Starting point for everything |
| Instruction-tuned | Claude Sonnet, GPT-4o | Follow user instructions well |
| Embedding models | text-embedding-3, all-MiniLM | Convert text to vectors |
| Multimodal | GPT-4 Vision, Gemini | Text + image understanding |
| Code models | DeepSeek Coder, CodeLlama | Code generation |
| Rerankers | Cross-encoders (MS-MARCO) | Re-score RAG retrieval results |

1.4 Prompt Engineering — The Practical Layer

This is where most engineers spend 80% of their initial AI time.

Techniques in order of power:

  1. Zero-shot: just ask. Works for simple tasks.
  2. Few-shot: give 2–5 examples before your question. Dramatically improves structured output.
  3. Chain-of-thought (CoT): ask the model to "think step by step". Improves reasoning.
  4. Structured output: ask for JSON/XML. Pair with output parsers in LangChain.
  5. System prompts: set role, tone, constraints. Critical for production systems.
  6. Prompt chaining: break complex tasks into sub-prompts, pipe outputs together.

The prompt anatomy for production:

System: [Role] [Constraints] [Output format]
User: [Context] [Task] [Examples]
Assistant: [Seed if needed]
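In code, this anatomy maps onto the messages array that most chat-completion APIs accept. The role/content shape below is the common convention; exact field names and system-prompt handling vary by provider and SDK, so treat it as a template rather than a specific API.

```python
# The prompt anatomy above, expressed as a generic chat-completion payload.
messages = [
    {
        "role": "system",
        "content": "You are a retail data analyst. Never reveal internal IDs. Respond in JSON.",
    },
    {
        "role": "user",
        "content": "Context: <retrieved policy chunks>\n"
                   "Task: Summarise the returns policy.\n"
                   'Example output: {"summary": "...", "sources": []}',
    },
]
```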

Track 2 — Applied AI: RAG, Agents, and Fine-Tuning

2.1 Retrieval-Augmented Generation (RAG)

RAG solves the two biggest LLM problems: hallucination and knowledge cutoff.

Instead of relying on the model's baked-in knowledge, you retrieve relevant documents at query time and inject them into the context.

The RAG pipeline:

Query
→ [Optional] Query expansion / rewrite
→ Embed query (embedding model)
→ Vector search (cosine similarity against document chunks)
→ [Optional] Reranker (cross-encoder rescore top-k)
→ Inject top chunks into LLM context
→ LLM generates answer grounded in retrieved context
→ [Optional] Post-processing (confidence scoring, PII masking)
→ Response + source citations
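A toy end-to-end sketch of the embed-and-search core of this pipeline. `embed` here is a deliberately crude stand-in (a hashed bag-of-words vector) so the example runs without a model; in a real system it would call an embedding model and the search would hit a vector database.

```python
import zlib
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for an embedding model: hash each word into a 64-dim
    bag-of-words vector and L2-normalise."""
    v = np.zeros(64)
    for word in text.lower().split():
        v[zlib.crc32(word.encode()) % 64] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by cosine similarity to the query (vectors are unit-norm,
    so the dot product is the cosine) and return the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: float(q @ embed(c)), reverse=True)[:k]

chunks = [
    "Returns are accepted within 35 days with a receipt.",
    "Stores open at 8am on weekdays.",
    "Gift cards cannot be exchanged for cash.",
]
top = retrieve("how many days do I have to return an item?", chunks)
prompt = "Answer using only this context:\n" + "\n".join(top)
```

The final `prompt` is what gets sent to the LLM: retrieved chunks injected ahead of the question, exactly the "inject top chunks into LLM context" step above.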

Chunking strategies — this is where RAG lives or dies:

| Strategy | When to use |
|---|---|
| Fixed-size (chunk_size=512, overlap=64) | Simple baseline, good starting point |
| Sentence-aware splitting | Preserves sentence boundaries, better for factual Q&A |
| Semantic chunking | Splits on topic shifts — best quality, slower |
| Hierarchical (parent + child chunks) | Retrieve child, return parent for context |
| Document-level metadata enrichment | Add title/section/date to each chunk |
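The fixed-size baseline from the table is only a few lines. This sketch counts characters for simplicity, where production code would count tokens with the model's tokenizer.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Fixed-size chunking with overlap: each chunk repeats the tail of the
    previous one, so facts that span a boundary are not lost."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 1000
chunks = chunk_text(doc)
print(len(chunks))  # 3 chunks: [0:512], [448:960], [896:1000]
```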

Key RAG metrics to know:

| Metric | Measures |
|---|---|
| Faithfulness | Is the answer grounded in the retrieved context? |
| Answer relevancy | Does the answer address the question? |
| Context precision | Are the retrieved chunks actually relevant? |
| Context recall | Did we retrieve all the chunks needed? |

Tools: RAGAS for evaluation, LangSmith for tracing.

Interview answer for "How do you improve RAG quality?":

I start by evaluating each component separately: retrieval quality (precision/recall of chunks), then generation quality (faithfulness, relevancy). Common fixes: better chunking strategy, hybrid search (dense + sparse/BM25), a reranker, and query rewriting. I instrument everything with LangSmith so I can trace failures to the exact step.

2.2 Vector Databases

A vector database stores embeddings and enables fast approximate nearest-neighbour (ANN) search.

Options comparison:

| Tool | Best for | Notes |
|---|---|---|
| ChromaDB | Local dev, POCs | Easiest to get started |
| Pinecone | Managed production | Fully hosted, simple API |
| Weaviate | Hybrid search | Dense + sparse + filters |
| pgvector | Existing PostgreSQL | Zero new infra if you're already on Postgres |
| Azure AI Search | Azure-native | Good integration with Azure OpenAI |
| Databricks Vector Search | Databricks-native | Delta table-backed, auto-sync |

What "cosine similarity" means in plain English:

Two vectors are similar if they point in the same direction, regardless of magnitude. A score of 1.0 = identical meaning. 0.0 = unrelated. Used because it captures semantic similarity between embeddings.
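The definition is one line of NumPy, and the examples confirm the "direction, not magnitude" intuition:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: direction matters, magnitude does not."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
print(round(cosine_similarity(a, 10 * a), 6))  # 1.0: same direction, different magnitude
print(round(cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0])), 6))  # 0.0: orthogonal
```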

2.3 AI Agents

An agent is an LLM that can take actions — calling tools, running code, browsing the web — in a loop until a goal is achieved.

The agent loop (ReAct pattern):

[Thought] I need to find the current price of X
[Action] call_tool("web_search", "current price of X")
[Observation] "X costs £Y as of today"
[Thought] Now I have the data, I can answer
[Final Answer] The price is £Y
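The loop itself is plain control flow. This sketch uses a scripted stand-in for the LLM and one canned tool so the shape is visible; `scripted_llm`, the decision dict, and the tool registry are illustrative, not any framework's API.

```python
def web_search(query: str) -> str:
    return "X costs £9.99 as of today"   # canned observation for the sketch

TOOLS = {"web_search": web_search}
MAX_STEPS = 5                            # guard against runaway loops

def run_agent(goal: str, llm) -> str:
    """ReAct-style loop: ask the model to act or answer, execute tools, feed
    observations back, repeat until a final answer or the step cap."""
    transcript = f"Goal: {goal}"
    for _ in range(MAX_STEPS):
        decision = llm(transcript)       # model decides: call a tool, or answer
        if decision["type"] == "final_answer":
            return decision["content"]
        observation = TOOLS[decision["tool"]](decision["input"])  # Action → Observation
        transcript += f"\n[Action] {decision['tool']}\n[Observation] {observation}"
    return "Stopped: hit MAX_STEPS without an answer"

def scripted_llm(transcript: str) -> dict:
    """Stand-in for a real LLM call: first act, then answer once an observation exists."""
    if "[Observation]" not in transcript:
        return {"type": "tool_call", "tool": "web_search", "input": "current price of X"}
    return {"type": "final_answer", "content": "The price is £9.99"}

print(run_agent("find the current price of X", scripted_llm))  # The price is £9.99
```

Note the `MAX_STEPS` guard: production agent loops always need a hard iteration cap, timeouts, and approval gates for irreversible actions.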

Key agent concepts:

| Concept | What it is |
|---|---|
| Tool / Function calling | LLM decides which function to call and with what args |
| Memory | Short-term (conversation), long-term (vector store), episodic |
| Planning | Breaking goals into sub-tasks (ReAct, Plan-and-Execute) |
| Multi-agent | Orchestrator delegates to specialist sub-agents |

Frameworks to know:

  • LangChain Agents — general purpose, most tutorials
  • LangGraph — stateful, graph-based agent workflows (production-grade)
  • Databricks AI Agents — notebook-native, integrates with Unity Catalog
  • AutoGen (Microsoft) — multi-agent conversations
  • CrewAI — role-based multi-agent teams

Sparky AI context: Databricks AI Agents with a structured tool set for audience selection logic — the agent calls audience-filter tools, validates outputs, and surfaces results to business users. This is a real-world multi-turn agent with domain-specific tooling.

2.4 Fine-Tuning vs RAG vs Prompt Engineering

This is a classic interview question. Know the decision framework:

Start with prompt engineering
→ Still not good enough?
→ Add RAG (for knowledge grounding)
→ Still not good enough?
→ Fine-tune (for style/format/domain behaviour)
→ Need maximum control?
→ Train from scratch (almost never)

| Approach | Cost | Latency | Knowledge updates | Best for |
|---|---|---|---|---|
| Prompt engineering | Low | None | Instant | Style, format, simple tasks |
| RAG | Medium | +50–200ms | Instant (update index) | Knowledge-intensive Q&A |
| Fine-tuning | High | Slightly lower | Requires retraining | Domain tone, consistent format |
| Pre-training | Very high | N/A | Full control | Proprietary domain (rare) |

Track 3 — AI Engineering: Production-Grade Systems

3.1 MLOps Fundamentals

MLOps = DevOps for ML models. The goal is to take a model from notebook to production reliably.

The MLOps lifecycle:

Data → Feature engineering → Training → Evaluation
→ Model registry → Deployment → Monitoring → Retraining

Tools to know by category:

| Category | Tools |
|---|---|
| Experiment tracking | MLflow, Weights & Biases, Comet |
| Model registry | MLflow Registry, Databricks Model Registry, HuggingFace Hub |
| Feature store | Databricks Feature Store, Feast, Tecton |
| Model serving | MLflow serving, FastAPI, BentoML, Ray Serve, Azure ML |
| Monitoring | Evidently, WhyLogs, Arize, Azure Monitor |
| Orchestration | Airflow, Databricks Workflows, Prefect |

The three things that go wrong in production:

  1. Data drift — input distribution shifts (e.g., seasonal patterns change)
  2. Concept drift — the relationship between input and output changes
  3. Model decay — performance degrades over time as the world changes

Interview answer for "How do you monitor an ML model in production?":

I track three layers: data (input distribution — schema, null rates, feature drift using PSI/KL divergence), model (prediction distribution, confidence scores), and business metrics (downstream KPIs). I use Evidently for statistical drift detection and alert when PSI > 0.2. For LLMs, I also track faithfulness and hallucination rate with a lightweight LLM-as-judge setup.
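The PSI check mentioned in that answer can be sketched directly; the `> 0.2` threshold is a common industry rule of thumb, not a hard standard.

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index of one feature: baseline sample vs live sample."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    eps = 1e-6                               # guard against log(0) in empty bins
    e_pct, a_pct = e_pct + eps, a_pct + eps
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(42)
baseline = rng.normal(0, 1, 5000)
same = rng.normal(0, 1, 5000)
shifted = rng.normal(1, 1, 5000)
print(psi(baseline, same) < 0.1)     # True: same distribution, negligible drift
print(psi(baseline, shifted) > 0.2)  # True: a one-sigma mean shift trips the alert
```

One simplification to flag: `np.histogram` with fixed edges drops live values outside the baseline range; a production implementation would add open-ended outer bins.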

3.2 LLMOps — MLOps for Language Models

LLMs have unique operational concerns beyond traditional ML.

Key LLMOps concerns:

| Concern | What to track | Tool |
|---|---|---|
| Latency | Time-to-first-token, total response time | LangSmith, Prometheus |
| Cost | Tokens in + tokens out per request | LLM provider dashboards |
| Hallucination rate | % responses not grounded in context | LLM-as-judge, RAGAS |
| Prompt injection | Malicious user inputs hijacking the prompt | Input validation, firewalls |
| PII leakage | Sensitive data in outputs | Presidio, regex gates |
| Throughput | Requests/second under load | Load testing |

Caching strategies for cost reduction:

  • Exact cache: return stored response for identical prompts (Redis)
  • Semantic cache: cache similar prompts using embedding similarity (GPTCache)
  • Prompt compression: reduce input tokens with LLMLingua before sending

3.3 Building with Azure OpenAI (Enterprise Pattern)

Since you're building in M&S's Azure environment, this is critical.

Azure OpenAI vs OpenAI API:

  • Data stays within your Azure tenant (compliance)
  • Private network endpoints
  • Content filtering built-in
  • Microsoft responsible AI layer
  • Same models, different endpoint format

Enterprise architecture pattern:

User request
→ API Gateway (rate limiting, auth)
→ Azure API Management
→ Azure OpenAI Service
→ Content filter (input)
→ Model (GPT-4o / embedding)
→ Content filter (output)
→ Application layer (RAG, agent logic)
→ Audit log → Azure Monitor
→ Response to user

Key services to know:

  • Azure OpenAI — LLM hosting
  • Azure AI Search — vector + hybrid search (replaces Pinecone in Azure)
  • Azure AI Studio — prompt flow, model evaluation
  • Azure AI Content Safety — content moderation API
  • Databricks on Azure — model training, feature engineering, agent hosting

Track 4 — AI Governance: Responsible AI in Practice

This is what separates a junior AI engineer from an AI Tech Lead. Anyone can get a chatbot working. Governing it is the hard part.

4.1 The Governance Framework

Six pillars of responsible AI (Microsoft / industry standard):

| Pillar | What it means | How to implement |
|---|---|---|
| Fairness | Model doesn't discriminate | Bias audits, fairness metrics (demographic parity, equal opportunity) |
| Reliability & Safety | Model behaves correctly under stress | Red-teaming, guardrails, fallback behaviour |
| Privacy & Security | Data is protected | PII masking, data minimisation, access controls |
| Inclusiveness | Works for all users | Multilingual testing, accessibility review |
| Transparency | Users know they're interacting with AI | AI disclosure, model cards |
| Accountability | Humans are in the loop for high-stakes decisions | Audit logs, human review workflows |

4.2 Model Cards

A model card is a one-page document describing a model's intended use, limitations, evaluation results, and ethical considerations.

Model card sections:

1. Model description (what it is, who built it, when)
2. Intended use (primary use cases, out-of-scope uses)
3. Training data (source, size, known biases)
4. Evaluation results (benchmarks, failure modes)
5. Ethical considerations (risks, mitigations)
6. Limitations (what it cannot do)
7. Recommendations (when to use, when not to)

Why this matters for interviews:

A model card forces the team to articulate who the model is for, what it can fail at, and who is accountable. It's the difference between "we shipped a model" and "we shipped a governed AI product."

4.3 EU AI Act Awareness

As of August 2024, the EU AI Act is in force. High-risk AI systems (employment, credit, healthcare) face mandatory requirements.

Key risk tiers:

| Tier | Examples | Requirements |
|---|---|---|
| Unacceptable | Social scoring, biometric surveillance | Banned |
| High-risk | CV screening, credit scoring, medical | Conformity assessment, human oversight, audit logs |
| Limited risk | Chatbots | Transparency (must disclose AI) |
| Minimal risk | Spam filters | No specific requirements |

Interview answer for "How do you approach AI governance?":

I start by classifying the use case risk level — is this a recommendation, a decision-support tool, or an autonomous decision? High-stakes decisions (credit, employment, health) need human oversight. I implement audit logging on every inference, PII masking on inputs and outputs, confidence thresholds with graceful fallbacks, and model cards. I also run quarterly bias audits and red-team exercises.

4.4 Practical Governance Implementation

Code patterns you should be able to demonstrate:

PII Masking (before query hits LLM):

import re

PII_PATTERNS = {
    "EMAIL": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "UK_PHONE": r"(\+44|0)[0-9\s\-]{9,13}",
    "NI_NUMBER": r"[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]",
}

def mask_pii(text: str) -> tuple[str, bool]:
    masked = text
    found = False
    for label, pattern in PII_PATTERNS.items():
        if re.search(pattern, masked, re.IGNORECASE):
            masked = re.sub(pattern, f"[{label}]", masked, flags=re.IGNORECASE)
            found = True
    return masked, found

Audit Logging (every inference):

import json
import uuid
from datetime import datetime, timezone

def audit_log(query: str, response: str, confidence: float, pii_detected: bool) -> None:
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query_masked": mask_pii(query)[0],  # never log the raw query
        "response_length": len(response),
        "confidence": confidence,
        "pii_detected": pii_detected,
        "model": "claude-sonnet-4-6",
    }
    with open("audit.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

Confidence Scoring (simple heuristic):

UNCERTAINTY_SIGNALS = ["i don't know", "i'm not sure", "unclear", "uncertain", "cannot confirm"]

def score_confidence(response: str, chunks: list[str]) -> float:
    score = 1.0
    resp_lower = response.lower()
    # Penalise uncertainty language
    for signal in UNCERTAINTY_SIGNALS:
        if signal in resp_lower:
            score -= 0.2
    # Reward grounding in source material
    if chunks:
        overlap = sum(
            1 for chunk in chunks
            if any(word in resp_lower for word in chunk.lower().split()[:10])
        )
        score = min(1.0, score + (overlap / len(chunks)) * 0.3)
    return max(0.0, score)

Track 5 — AI Leadership: Running AI Teams

5.1 The AI Tech Lead Role

An AI Tech Lead is not just a senior engineer who reviews code. They:

  1. Define technical direction — which models, which stack, which governance standards
  2. Translate business problems into AI solutions — stakeholder → product → technical spec
  3. Manage technical risk — hallucination, bias, data privacy, model failure modes
  4. Coach the team — upskilling engineers on LLMs, prompt engineering, MLOps
  5. Own quality gates — evaluation frameworks, not just unit tests
  6. Communicate uncertainty — business stakeholders need confidence intervals, not just "it works"

5.2 The Stakeholder-to-Ship Framework

The pattern that works for translating pain points into AI products:

1. Pain point interview (what problem, who has it, what does failure look like?)
2. Feasibility spike (2-day timebox: can AI solve this? What's the baseline?)
3. Prototype (1 week: working demo, not polished)
4. Stakeholder demo (show failure cases too, not just successes)
5. Governance review (risk tier, PII, audit requirements)
6. Pilot (small user group, monitored closely)
7. Production (with monitoring, fallback, human escalation path)

Mission Desk followed this pattern: Pain = manual mission configuration (7 days). Spike = can Streamlit + Databricks replace the spreadsheet? Prototype = 2-week working app. Result = £500K+ annual value, 85% deployment time reduction.

5.3 Running an AI Squad

Ceremonies for AI teams (adapted from Agile):

| Ceremony | Cadence | Purpose |
|---|---|---|
| Model review | Weekly | Review evaluation metrics, flag regressions |
| ADR (Architecture Decision Record) | Per decision | Document why we chose this approach |
| Spike timebox | 2-day max | Time-bounded research on unknowns |
| Red-team session | Monthly | Try to break the model — adversarial inputs |
| Bias audit | Quarterly | Run fairness metrics across demographic slices |
| Stakeholder demo | Bi-weekly | Show progress, collect feedback, reset expectations |

Writing an ADR (Architecture Decision Record):

# ADR-001: Use RAG over fine-tuning for internal FAQ chatbot
## Status: Accepted
## Context
We need an internal FAQ chatbot that stays up-to-date with policy changes.
## Decision
We will use RAG with Azure AI Search over fine-tuning.
## Rationale
- Policy documents change monthly — fine-tuning requires retraining (expensive)
- RAG allows instant knowledge updates by re-indexing documents
- Audit trail: we can show which document chunk grounded the answer
- Cost: no GPU training budget needed
## Trade-offs
- RAG adds 100–200ms latency vs direct LLM call
- Quality depends on document chunking quality
## Consequences
Must maintain document indexing pipeline and monitor retrieval quality

5.4 Communicating AI to Non-Technical Stakeholders

This is the skill that differentiates an AI Tech Lead from an AI engineer.

Translate technical concepts to business language:

| Technical | Business language |
|---|---|
| Hallucination | "The model sometimes makes things up — we have confidence scoring and human review for high-stakes outputs" |
| Context window | "The model can only read X pages of information at once — our RAG system handles larger document sets" |
| Model drift | "The model's performance can degrade over time as data patterns change — we monitor this monthly" |
| Fine-tuning | "We teach the model your company's specific language and formats — takes 2–4 weeks" |
| Embeddings | "We convert text into numbers that capture meaning — allows us to find related content without keyword matching" |

The three things a stakeholder always wants to know:

  1. Will it be accurate? → Show confidence scores, failure modes, and human escalation path
  2. Will my data be safe? → Explain PII masking, data boundaries, and audit logs
  3. What happens when it's wrong? → Show the fallback, the alert, and the human review process

Track 6 — Interview Playbook for AI Tech Lead Roles

6.1 Technical Questions

Q: Explain RAG to me like I'm a senior engineer.

RAG is a pattern where instead of relying on an LLM's baked-in knowledge, you retrieve relevant documents at query time from a vector database, inject them into the context window, and ask the LLM to answer using only those documents. This grounds the answer in your data, eliminates knowledge cutoff issues, and makes hallucinations auditable — you can show which chunk the answer came from.

Q: How would you design an enterprise chatbot for internal documents?

I'd use a RAG architecture: ingest documents into Azure AI Search with semantic chunking (512 tokens, 64 overlap), use Azure OpenAI embeddings for vectorisation, add a reranker for top-k results, then pipe into GPT-4o with a strict system prompt. Governance layer: PII mask all inputs, JSONL audit log every inference, confidence threshold with "I don't know" fallback below 0.6. Monitoring: track faithfulness weekly with RAGAS, alert on hallucination rate above 5%.

Q: What's the difference between an LLM agent and a simple LLM call?

A single LLM call is stateless — you send a prompt, get a response. An agent has a loop: it can call tools (APIs, databases, code executors), observe the results, reason about what to do next, and iterate until a goal is reached. Agents are better for multi-step tasks but introduce new risks (tool misuse, runaway loops) — you need timeouts, max iterations, and human approval for irreversible actions.

Q: How do you handle hallucinations in production?

Three layers: prevention (RAG grounds the answer in real sources, strict system prompt with "only use the provided context"), detection (confidence scoring, faithfulness check with LLM-as-judge), and mitigation (low-confidence responses trigger a fallback: "I'm not confident enough to answer — here are the relevant documents"). Critical: audit log every response so you can investigate failures.

Q: How do you evaluate an LLM in production?

I separate evaluation into three levels: component (retrieval quality — precision/recall of chunks), end-to-end (RAGAS metrics: faithfulness, answer relevancy, context precision), and business (downstream KPI — e.g., did the agent's recommendation lead to the right action?). I also run regression tests on a golden dataset of 50–100 question/answer pairs before every deployment. For LLM-as-judge, I use a separate, stronger model to score responses.

Q: When would you fine-tune vs use RAG?

RAG for knowledge (factual information that changes or is proprietary). Fine-tuning for behaviour (consistent tone, specific output formats, domain terminology the base model doesn't know well). In practice: start with prompt engineering, add RAG when you need fresh knowledge, fine-tune only if you've proven RAG isn't enough — it's 10x more expensive and complex to maintain.

6.2 Architecture & Design Questions

Q: Design an AI system for retail product recommendations.

I'd build a hybrid system: collaborative filtering (user behaviour) + LLM-enhanced descriptions. The LLM layer enriches product embeddings with semantic understanding — "this jumper is warm, casual, suitable for winter weekends." At inference: retrieve top-20 by collaborative score, rerank using semantic similarity to user's stated intent, apply business rules (stock availability, margin targets), return top-5 with explanation. Governance: log all recommendations, A/B test against baseline, monitor click-through and conversion weekly.

Q: How would you make an AI system compliant with the EU AI Act?

First, classify the risk tier — employment screening and credit scoring are high-risk. For high-risk: document the system with a model card, implement human oversight (no fully autonomous decisions), retain technical documentation for 10 years and keep automatically generated logs, conduct a conformity assessment, register in the EU AI database. For the technical side: bias testing across protected characteristics, explainability layer (SHAP values or attention visualisation), incident reporting process.

6.3 Leadership & Process Questions

Q: How do you build trust with business stakeholders for AI projects?

I involve them before writing a line of code — the pain point interview is non-negotiable. Then I show failure cases in demos, not just successes. Stakeholders trust you more when you say "here's what it gets wrong and here's how we handle it" than when you only show the good outputs. I set accuracy expectations with ranges, not single numbers. And I deliver something working in 2 weeks rather than promising something perfect in 6 months.

Q: How do you keep an AI team moving fast while maintaining quality?

Two-day spike timeboxes for unknowns — no open-ended research. Golden dataset regression tests before every deploy — if you break something, you know immediately. ADRs for major decisions — stops re-litigating the same choices. Weekly model review — catch drift before stakeholders do. Bi-weekly demos — keeps the team accountable and stakeholders aligned.

Q: How do you communicate AI risk to a non-technical director?

I use a traffic light risk tier system and avoid technical jargon. "This model is a decision-support tool — it surfaces the best options, but a human makes the final call. We've tested it on 500 examples and it's right 94% of the time. When it's wrong, it fails in these specific ways, and here's the escalation path." I always answer: what's the worst case if it goes wrong, and what have we done to prevent it?

Q: Tell me about a time you delivered AI value to the business.

Mission Desk at M&S: business users were manually configuring loyalty missions in spreadsheets — 7 days per deployment, 12% error rate. I built a full-stack platform (Streamlit, PostgreSQL, Databricks) solo using AI-assisted development. Result: deployment time from 7 days to 1 day (85% reduction), error rate from 12% to under 2%, managing 1M+ loyalty records, estimated £500K+ annual business value. Nominated at the Sparks All Hands for company-wide recognition.


Your Learning Sequence (30-Day Sprint)

Week 1 — Foundations
Day 1–2: LLMs, tokenisation, embeddings, attention (Track 1)
Day 3–4: Prompt engineering — CoT, few-shot, structured output
Day 5–7: Build a simple chatbot with Claude API or OpenAI
Week 2 — Applied AI
Day 8–10: RAG — build end-to-end with LangChain + ChromaDB
Day 11–12: Vector databases — compare ChromaDB, pgvector, Azure AI Search
Day 13–14: Agents — ReAct pattern, tool calling, LangGraph basics
Week 3 — Engineering & Governance
Day 15–16: MLOps — MLflow experiment tracking + model registry
Day 17–18: LLMOps — LangSmith tracing, RAGAS evaluation
Day 19–20: Governance — PII masking, audit logging, confidence scoring, model cards
Day 21: EU AI Act — read the official summary, classify your current projects
Week 4 — Leadership & Interview Prep
Day 22–23: Write one ADR for a real decision you've made
Day 24–25: Mock answer all 10 interview questions above out loud
Day 26–27: Build one thing: add a reranker to your RAG POC or an agent tool
Day 28–30: Blog post or LinkedIn post sharing what you learned

Resources

Papers (read the abstracts + introduction, not the full paper):

  • Attention Is All You Need (2017) — the original Transformer paper
  • RAG: Retrieval-Augmented Generation (2020) — Facebook AI
  • ReAct: Synergizing Reasoning and Acting in Language Models (2022)

Courses:

  • DeepLearning.AI Short Courses (free) — LangChain, RAG, agents, fine-tuning
  • Fast.ai Practical Deep Learning — if you want to go deeper on fundamentals
  • Databricks Generative AI Engineer Associate exam prep

Tools to have set up:

  • LangChain + LangSmith (free tier)
  • Anthropic API (Claude) or OpenAI API
  • ChromaDB or pgvector locally
  • MLflow locally (pip install mlflow)

Written by Vijay Anand Pandian — AI Tech Lead & Senior Data Engineer at M&S Sparks. Building governed AI systems that bridge business and engineering.