The Complete AI Roadmap for Engineers & Tech Leads
- 20 mins read
- Author: Vijay Anand Pandian (@vijayanandrp)
Why This Roadmap Exists
AI is moving fast. Knowing how to run a Jupyter notebook isn't enough anymore. If you want to lead AI teams — or just build production-grade AI systems — you need a mental model that connects the dots between theory, engineering, governance, and leadership.
This roadmap is structured in 6 tracks. Work through them in order, or jump to the track you need most.
Track 1 — AI Foundations: The Mental Models That Matter
Before you build anything, you need to understand how these systems actually work.
1.1 How Large Language Models Work
An LLM is a probability machine. Given a sequence of tokens, it predicts the next most likely token — over and over — until it produces a response.
Key concepts to deeply understand:
| Concept | What it means | Why it matters |
|---|---|---|
| Tokenisation | Text is split into sub-word tokens (not words) | Affects cost, context length, and non-English performance |
| Embeddings | Tokens become vectors in high-dimensional space | Foundation for semantic search and RAG |
| Attention | Each token "attends" to other tokens to understand context | Why LLMs understand long sentences |
| Context window | Max tokens the model can process at once | Determines how much input you can give it |
| Temperature | Controls randomness in output | 0 = deterministic, 1+ = creative/risky |
| Top-p / Top-k | Additional sampling controls | Used alongside temperature to shape outputs |
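To make temperature concrete, here is a toy next-token sampler in pure Python (the logits are invented for illustration). It shows why temperature 0 is deterministic and higher values spread probability across more tokens:

```python
import math
import random

def sample_next_token(logits: dict[str, float], temperature: float = 1.0) -> str:
    """Softmax over logits with temperature scaling, then sample one token."""
    if temperature <= 0:
        # Temperature 0: greedy decoding, always pick the highest logit.
        return max(logits, key=logits.get)
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())  # subtract max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Hypothetical logits for the next token after "The cat sat on the"
logits = {"mat": 4.0, "sofa": 3.0, "moon": 1.0}
print(sample_next_token(logits, temperature=0))  # greedy: always "mat"
```

At temperature 1+ the same call can return "sofa" or even "moon" on some runs, which is exactly the creative/risky behaviour the table describes.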
The core loop — autoregressive generation:
Input prompt → Tokenise → Embed → Transformer layers (attention + FFN) → Logits → Sample → Token → Repeat

1.2 Transformer Architecture (Enough to Be Dangerous)
You don't need to implement attention from scratch, but you need to explain it in an interview.
Key components:
- Self-attention: every token in the input compares itself to every other token, computing a weighted sum of "how relevant is this other token to me?"
- Multi-head attention: run attention multiple times in parallel (each "head" learns different patterns)
- Feed-forward network: after attention, each token goes through a 2-layer MLP — this is where "knowledge" is stored
- Layer norm + residual connections: stabilise training, allow gradients to flow through deep networks
- Positional encoding: since attention has no inherent order, position is added explicitly
Interview answer for "How does an LLM know things?":
The factual knowledge is baked into the weights of the feed-forward layers during pre-training on internet-scale text. It doesn't "look things up" — it recalls patterns compressed into billions of parameters.
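To make self-attention concrete, here is a toy scaled dot-product attention in pure Python: single head, no masking, made-up 2-D vectors. Each output row is a weighted mix of all value rows, which is the "every token attends to every other token" idea:

```python
import math

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q: list[list[float]], K: list[list[float]], V: list[list[float]]) -> list[list[float]]:
    """Scaled dot-product attention over toy 2-D lists (no batching, no heads)."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Score this query against every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # weights sum to 1
        # Weighted sum of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# Three tokens, 2-dimensional vectors; Q = K = V as in self-attention
Q = K = V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(attention(Q, K, V))
```

In a real transformer, Q, K, and V are learned linear projections of the token embeddings, and this runs once per head per layer.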
1.3 Types of AI Models You Need to Know
| Model type | Examples | Use case |
|---|---|---|
| Foundation / Base | GPT-4, Claude, Llama 3 | Starting point for everything |
| Instruction-tuned | Claude Sonnet, GPT-4o | Follow user instructions well |
| Embedding models | text-embedding-3, all-MiniLM | Convert text to vectors |
| Multimodal | GPT-4 Vision, Gemini | Text + image understanding |
| Code models | DeepSeek Coder, CodeLlama | Code generation |
| Rerankers | cross-encoders (MS-MARCO) | Re-score RAG retrieval results |
1.4 Prompt Engineering — The Practical Layer
This is where most engineers spend 80% of their initial AI time.
Techniques in order of power:
- Zero-shot: just ask. Works for simple tasks.
- Few-shot: give 2–5 examples before your question. Dramatically improves structured output.
- Chain-of-thought (CoT): ask the model to "think step by step". Improves reasoning.
- Structured output: ask for JSON/XML. Pair with output parsers in LangChain.
- System prompts: set role, tone, constraints. Critical for production systems.
- Prompt chaining: break complex tasks into sub-prompts, pipe outputs together.
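Few-shot plus structured output in practice: a minimal sketch in the OpenAI-style messages format. The classifier task, labels, and JSON schema here are invented for illustration:

```python
import json

SYSTEM = (
    "You are a product-feedback classifier. Respond ONLY with JSON: "
    '{"sentiment": "positive|negative|neutral", "topic": "..."}'
)

# Few-shot examples: each pair shows the model the exact output shape we want.
FEW_SHOT = [
    ("Delivery was quick and the jumper fits perfectly.",
     {"sentiment": "positive", "topic": "delivery"}),
    ("The checkout page kept crashing on my phone.",
     {"sentiment": "negative", "topic": "website"}),
]

def build_messages(user_text: str) -> list[dict]:
    messages = [{"role": "system", "content": SYSTEM}]
    for text, label in FEW_SHOT:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": json.dumps(label)})
    messages.append({"role": "user", "content": user_text})
    return messages

msgs = build_messages("Sizing guide was confusing but staff were helpful.")
print(len(msgs))  # 1 system + 2 example pairs + 1 user = 6
```

The assistant-role examples do the heavy lifting: the model imitates their exact JSON shape far more reliably than it follows a prose description of the schema.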
The prompt anatomy for production:
System: [Role] [Constraints] [Output format]
User: [Context] [Task] [Examples]
Assistant: [Seed if needed]

Track 2 — Applied AI: RAG, Agents, and Fine-Tuning
2.1 Retrieval-Augmented Generation (RAG)
RAG solves the two biggest LLM problems: hallucination and knowledge cutoff.
Instead of relying on the model's baked-in knowledge, you retrieve relevant documents at query time and inject them into the context.
The RAG pipeline:
Query
→ [Optional] Query expansion / rewrite
→ Embed query (embedding model)
→ Vector search (cosine similarity against document chunks)
→ [Optional] Reranker (cross-encoder rescore top-k)
→ Inject top chunks into LLM context
→ LLM generates answer grounded in retrieved context
→ [Optional] Post-processing (confidence scoring, PII masking)
→ Response + source citations

Chunking strategies — this is where RAG lives or dies:
| Strategy | When to use |
|---|---|
| Fixed-size (chunk_size=512, overlap=64) | Simple baseline, good starting point |
| Sentence-aware splitting | Preserves sentence boundaries, better for factual Q&A |
| Semantic chunking | Splits on topic shifts — best quality, slower |
| Hierarchical (parent + child chunks) | Retrieve child, return parent for context |
| Document-level metadata enrichment | Add title/section/date to each chunk |
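The fixed-size baseline from the table is a few lines of Python. This sketch works on a pre-tokenised list; the overlap repeats the tail of each chunk so facts spanning a boundary are retrievable from both sides:

```python
def chunk_fixed(tokens: list[str], chunk_size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Fixed-size chunking with overlap between consecutive chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each new chunk advances
    return [tokens[i:i + chunk_size] for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_fixed(tokens)
print(len(chunks))     # 1200 tokens at step 448 -> 3 chunks
print(chunks[1][0])    # second chunk starts 448 tokens in: t448
```

Note the last 64 tokens of chunk 0 are also the first 64 tokens of chunk 1; that duplication is the point of the overlap.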
Key RAG metrics to know:
| Metric | Measures |
|---|---|
| Faithfulness | Is the answer grounded in the retrieved context? |
| Answer relevancy | Does the answer address the question? |
| Context precision | Are the retrieved chunks actually relevant? |
| Context recall | Did we retrieve all the chunks needed? |
Tools: RAGAS for evaluation, LangSmith for tracing.
Interview answer for "How do you improve RAG quality?":
I start by evaluating each component separately: retrieval quality (precision/recall of chunks), then generation quality (faithfulness, relevancy). Common fixes: better chunking strategy, hybrid search (dense + sparse/BM25), a reranker, and query rewriting. I instrument everything with LangSmith so I can trace failures to the exact step.
2.2 Vector Databases
A vector database stores embeddings and enables fast approximate nearest-neighbour (ANN) search.
Options comparison:
| Tool | Best for | Notes |
|---|---|---|
| ChromaDB | Local dev, POCs | Easiest to get started |
| Pinecone | Managed production | Fully hosted, simple API |
| Weaviate | Hybrid search | Dense + sparse + filters |
| pgvector | Existing PostgreSQL | Zero new infra if you're already on Postgres |
| Azure AI Search | Azure-native | Good integration with Azure OpenAI |
| Databricks Vector Search | Databricks-native | Delta table-backed, auto-sync |
What "cosine similarity" means in plain English:
Two vectors are similar if they point in the same direction, regardless of magnitude. A score of 1.0 = identical meaning. 0.0 = unrelated. Used because it captures semantic similarity between embeddings.
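The same idea as a formula: cosine similarity is the dot product divided by the product of the vector magnitudes. A toy pure-Python version (real systems use NumPy or the vector DB's ANN index):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b, normalised by their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Same direction, different magnitude: still similarity ~1.0
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
# Orthogonal vectors: similarity 0.0 (unrelated)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

This is why embedding magnitude doesn't matter for retrieval: only the direction of the vector carries the meaning.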
2.3 AI Agents
An agent is an LLM that can take actions — calling tools, running code, browsing the web — in a loop until a goal is achieved.
The agent loop (ReAct pattern):
[Thought] I need to find the current price of X
[Action] call_tool("web_search", "current price of X")
[Observation] "X costs £Y as of today"
[Thought] Now I have the data, I can answer
[Final Answer] The price is £Y

Key agent concepts:
| Concept | What it is |
|---|---|
| Tool / Function calling | LLM decides which function to call and with what args |
| Memory | Short-term (conversation), long-term (vector store), episodic |
| Planning | Breaking goals into sub-tasks (ReAct, Plan-and-Execute) |
| Multi-agent | Orchestrator delegates to specialist sub-agents |
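The ReAct loop above can be sketched in plain Python. Everything here is a stand-in: `fake_llm` replaces a real model call, and the tool registry and `[Action] tool: arg` response format are invented for illustration. What matters is the shape — act, observe, iterate, with a hard iteration cap against runaway loops:

```python
TOOLS = {
    "web_search": lambda query: "X costs £9.99 as of today",  # stubbed tool
}

def fake_llm(history: list[str]) -> str:
    """Stand-in for a real LLM call: acts once, then answers from the observation."""
    if not any(line.startswith("[Observation]") for line in history):
        return "[Action] web_search: current price of X"
    return "[Final Answer] The price is £9.99"

def run_agent(goal: str, max_iterations: int = 5) -> str:
    history = [f"[Goal] {goal}"]
    for _ in range(max_iterations):  # hard cap prevents runaway loops
        step = fake_llm(history)
        history.append(step)
        if step.startswith("[Final Answer]"):
            return step
        if step.startswith("[Action]"):
            tool_name, arg = step.removeprefix("[Action] ").split(": ", 1)
            history.append(f"[Observation] {TOOLS[tool_name](arg)}")
    return "[Final Answer] Stopped: iteration limit reached"

print(run_agent("find the current price of X"))
```

Frameworks like LangGraph give you this loop plus state persistence, tool schemas, and human-approval checkpoints, but the control flow is the same.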
Frameworks to know:
- LangChain Agents — general purpose, most tutorials
- LangGraph — stateful, graph-based agent workflows (production-grade)
- Databricks AI Agents — notebook-native, integrates with Unity Catalog
- AutoGen (Microsoft) — multi-agent conversations
- CrewAI — role-based multi-agent teams
Sparky AI context: Databricks AI Agents with a structured tool set for audience selection logic — the agent calls audience-filter tools, validates outputs, and surfaces results to business users. This is a real-world multi-turn agent with domain-specific tooling.
2.4 Fine-Tuning vs RAG vs Prompt Engineering
This is a classic interview question. Know the decision framework:
Start with prompt engineering
→ Still not good enough? Add RAG (for knowledge grounding)
→ Still not good enough? Fine-tune (for style/format/domain behaviour)
→ Need maximum control? Train from scratch (almost never)

| Approach | Cost | Latency | Knowledge updates | Best for |
|---|---|---|---|---|
| Prompt engineering | Low | None | Instant | Style, format, simple tasks |
| RAG | Medium | +50–200ms | Instant (update index) | Knowledge-intensive Q&A |
| Fine-tuning | High | Slightly lower | Requires retraining | Domain tone, consistent format |
| Pre-training | Very high | — | Full control | Proprietary domain (rare) |
Track 3 — AI Engineering: Production-Grade Systems
3.1 MLOps Fundamentals
MLOps = DevOps for ML models. The goal is to take a model from notebook to production reliably.
The MLOps lifecycle:
Data → Feature engineering → Training → Evaluation → Model registry → Deployment → Monitoring → Retraining

Tools to know by category:
| Category | Tools |
|---|---|
| Experiment tracking | MLflow, Weights & Biases, Comet |
| Model registry | MLflow Registry, Databricks Model Registry, HuggingFace Hub |
| Feature store | Databricks Feature Store, Feast, Tecton |
| Model serving | MLflow serving, FastAPI, BentoML, Ray Serve, Azure ML |
| Monitoring | Evidently, WhyLogs, Arize, Azure Monitor |
| Orchestration | Airflow, Databricks Workflows, Prefect |
The three things that go wrong in production:
- Data drift — input distribution shifts (e.g., seasonal patterns change)
- Concept drift — the relationship between input and output changes
- Model decay — performance degrades over time as the world changes
Interview answer for "How do you monitor an ML model in production?":
I track three layers: data (input distribution — schema, null rates, feature drift using PSI/KL divergence), model (prediction distribution, confidence scores), and business metrics (downstream KPIs). I use Evidently for statistical drift detection and alert when PSI > 0.2. For LLMs, I also track faithfulness and hallucination rate with a lightweight LLM-as-judge setup.
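The PSI check mentioned above can be sketched in pure Python. Bin edges come from the baseline distribution; the 0.2 alert threshold is the common rule of thumb (under 0.1 stable, 0.1 to 0.2 moderate shift, over 0.2 significant drift):

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[max(idx, 0)] += 1  # clamp values outside the baseline range
        # Small floor avoids log(0) for empty bins
        return [max(c / len(values), 1e-4) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]    # uniform on [0, 1)
print(round(psi(baseline, baseline), 4))    # identical distributions: 0.0
shifted = [v + 0.5 for v in baseline]       # everything shifted up
print(psi(baseline, shifted) > 0.2)         # True: alert-worthy drift
```

In production you'd compute this per feature on a schedule (Evidently does this for you) and page someone when any feature crosses the threshold.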
3.2 LLMOps — MLOps for Language Models
LLMs have unique operational concerns beyond traditional ML.
Key LLMOps concerns:
| Concern | What to track | Tool |
|---|---|---|
| Latency | Time-to-first-token, total response time | LangSmith, Prometheus |
| Cost | Tokens in + tokens out per request | LLM provider dashboards |
| Hallucination rate | % responses not grounded in context | LLM-as-judge, RAGAS |
| Prompt injection | Malicious user inputs hijacking the prompt | Input validation, firewalls |
| PII leakage | Sensitive data in outputs | Presidio, regex gates |
| Throughput | Requests/second under load | Load testing |
Caching strategies for cost reduction:
- Exact cache: return stored response for identical prompts (Redis)
- Semantic cache: cache similar prompts using embedding similarity (GPTCache)
- Prompt compression: reduce input tokens with LLMLingua before sending
3.3 Building with Azure OpenAI (Enterprise Pattern)
If you're building in an enterprise Azure environment (as at M&S), this is critical.
Azure OpenAI vs OpenAI API:
- Data stays within your Azure tenant (compliance)
- Private network endpoints
- Content filtering built-in
- Microsoft responsible AI layer
- Same models, different endpoint format
Enterprise architecture pattern:
User request
→ API Gateway (rate limiting, auth)
→ Azure API Management
→ Azure OpenAI Service
→ Content filter (input)
→ Model (GPT-4o / embedding)
→ Content filter (output)
→ Application layer (RAG, agent logic)
→ Audit log → Azure Monitor
→ Response to user

Key services to know:
- Azure OpenAI — LLM hosting
- Azure AI Search — vector + hybrid search (replaces Pinecone in Azure)
- Azure AI Studio — prompt flow, model evaluation
- Azure AI Content Safety — content moderation API
- Databricks on Azure — model training, feature engineering, agent hosting
Track 4 — AI Governance: Responsible AI in Practice
This is what separates a junior AI engineer from an AI Tech Lead. Anyone can get a chatbot working. Governing it is the hard part.
4.1 The Governance Framework
Six pillars of responsible AI (Microsoft / industry standard):
| Pillar | What it means | How to implement |
|---|---|---|
| Fairness | Model doesn't discriminate | Bias audits, fairness metrics (demographic parity, equal opportunity) |
| Reliability & Safety | Model behaves correctly under stress | Red-teaming, guardrails, fallback behaviour |
| Privacy & Security | Data is protected | PII masking, data minimisation, access controls |
| Inclusiveness | Works for all users | Multilingual testing, accessibility review |
| Transparency | Users know they're interacting with AI | AI disclosure, model cards |
| Accountability | Humans are in the loop for high-stakes decisions | Audit logs, human review workflows |
4.2 Model Cards
A model card is a one-page document describing a model's intended use, limitations, evaluation results, and ethical considerations.
Model card sections:
1. Model description (what it is, who built it, when)
2. Intended use (primary use cases, out-of-scope uses)
3. Training data (source, size, known biases)
4. Evaluation results (benchmarks, failure modes)
5. Ethical considerations (risks, mitigations)
6. Limitations (what it cannot do)
7. Recommendations (when to use, when not to)

Why this matters for interviews:
A model card forces the team to articulate who the model is for, what it can fail at, and who is accountable. It's the difference between "we shipped a model" and "we shipped a governed AI product."
4.3 EU AI Act Awareness
As of August 2024, the EU AI Act is in force. High-risk AI systems (employment, credit, healthcare) face mandatory requirements.
Key risk tiers:
| Tier | Examples | Requirements |
|---|---|---|
| Unacceptable | Social scoring, biometric surveillance | Banned |
| High-risk | CV screening, credit scoring, medical | Conformity assessment, human oversight, audit logs |
| Limited risk | Chatbots | Transparency (must disclose AI) |
| Minimal risk | Spam filters | No specific requirements |
Interview answer for "How do you approach AI governance?":
I start by classifying the use case risk level — is this a recommendation, a decision-support tool, or an autonomous decision? High-stakes decisions (credit, employment, health) need human oversight. I implement audit logging on every inference, PII masking on inputs and outputs, confidence thresholds with graceful fallbacks, and model cards. I also run quarterly bias audits and red-team exercises.
4.4 Practical Governance Implementation
Code patterns you should be able to demonstrate:
PII Masking (before query hits LLM):
```python
import re

PII_PATTERNS = {
    "EMAIL": r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}",
    "UK_PHONE": r"(\+44|0)[0-9\s\-]{9,13}",
    "NI_NUMBER": r"[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]",
}

def mask_pii(text: str) -> tuple[str, bool]:
    masked = text
    found = False
    for label, pattern in PII_PATTERNS.items():
        if re.search(pattern, masked, re.IGNORECASE):
            masked = re.sub(pattern, f"[{label}]", masked, flags=re.IGNORECASE)
            found = True
    return masked, found
```

Audit Logging (every inference):
```python
import json
import uuid
from datetime import datetime

def audit_log(query: str, response: str, confidence: float, pii_detected: bool):
    record = {
        "id": str(uuid.uuid4()),
        "timestamp": datetime.utcnow().isoformat(),
        "query_masked": mask_pii(query)[0],  # never log raw query
        "response_length": len(response),
        "confidence": confidence,
        "pii_detected": pii_detected,
        "model": "claude-sonnet-4-6",
    }
    with open("audit.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
```

Confidence Scoring (simple heuristic):
```python
UNCERTAINTY_SIGNALS = ["i don't know", "i'm not sure", "unclear", "uncertain", "cannot confirm"]

def score_confidence(response: str, chunks: list[str]) -> float:
    score = 1.0
    resp_lower = response.lower()

    # Penalise uncertainty language
    for signal in UNCERTAINTY_SIGNALS:
        if signal in resp_lower:
            score -= 0.2

    # Reward grounding in source material
    if chunks:
        overlap = sum(
            1 for chunk in chunks
            if any(word in resp_lower for word in chunk.lower().split()[:10])
        )
        score = min(1.0, score + (overlap / len(chunks)) * 0.3)

    return max(0.0, score)
```

Track 5 — AI Leadership: Running AI Teams
5.1 The AI Tech Lead Role
An AI Tech Lead is not just a senior engineer who reviews code. They:
- Define technical direction — which models, which stack, which governance standards
- Translate business problems into AI solutions — stakeholder → product → technical spec
- Manage technical risk — hallucination, bias, data privacy, model failure modes
- Coach the team — upskilling engineers on LLMs, prompt engineering, MLOps
- Own quality gates — evaluation frameworks, not just unit tests
- Communicate uncertainty — business stakeholders need confidence intervals, not just "it works"
5.2 The Stakeholder-to-Ship Framework
The pattern that works for translating pain points into AI products:
1. Pain point interview (what problem, who has it, what does failure look like?)
2. Feasibility spike (2-day timebox: can AI solve this? What's the baseline?)
3. Prototype (1 week: working demo, not polished)
4. Stakeholder demo (show failure cases too, not just successes)
5. Governance review (risk tier, PII, audit requirements)
6. Pilot (small user group, monitored closely)
7. Production (with monitoring, fallback, human escalation path)

Mission Desk followed this pattern: Pain = manual mission configuration (7 days). Spike = can Streamlit + Databricks replace the spreadsheet? Prototype = 2-week working app. Result = £500K+ annual value, 85% deployment time reduction.
5.3 Running an AI Squad
Ceremonies for AI teams (adapted from Agile):
| Ceremony | Cadence | Purpose |
|---|---|---|
| Model review | Weekly | Review evaluation metrics, flag regressions |
| ADR (Architecture Decision Record) | Per decision | Document why we chose this approach |
| Spike timebox | 2-day max | Time-bounded research on unknowns |
| Red-team session | Monthly | Try to break the model — adversarial inputs |
| Bias audit | Quarterly | Run fairness metrics across demographic slices |
| Stakeholder demo | Bi-weekly | Show progress, collect feedback, reset expectations |
Writing an ADR (Architecture Decision Record):
```markdown
# ADR-001: Use RAG over fine-tuning for internal FAQ chatbot

## Status: Accepted

## Context
We need an internal FAQ chatbot that stays up-to-date with policy changes.

## Decision
We will use RAG with Azure AI Search over fine-tuning.

## Rationale
- Policy documents change monthly — fine-tuning requires retraining (expensive)
- RAG allows instant knowledge updates by re-indexing documents
- Audit trail: we can show which document chunk grounded the answer
- Cost: no GPU training budget needed

## Trade-offs
- RAG adds 100–200ms latency vs direct LLM call
- Quality depends on document chunking quality

## Consequences
Must maintain document indexing pipeline and monitor retrieval quality
```

5.4 Communicating AI to Non-Technical Stakeholders
This is the skill that differentiates an AI Tech Lead from an AI engineer.
Translate technical concepts to business language:
| Technical | Business language |
|---|---|
| Hallucination | "The model sometimes makes things up — we have confidence scoring and human review for high-stakes outputs" |
| Context window | "The model can only read X pages of information at once — our RAG system handles larger document sets" |
| Model drift | "The model's performance can degrade over time as data patterns change — we monitor this monthly" |
| Fine-tuning | "We teach the model your company's specific language and formats — takes 2–4 weeks" |
| Embeddings | "We convert text into numbers that capture meaning — allows us to find related content without keyword matching" |
The three things a stakeholder always wants to know:
- Will it be accurate? → Show confidence scores, failure modes, and human escalation path
- Will my data be safe? → Explain PII masking, data boundaries, and audit logs
- What happens when it's wrong? → Show the fallback, the alert, and the human review process
Track 6 — Interview Playbook for AI Tech Lead Roles
6.1 Technical Questions
Q: Explain RAG to me like I'm a senior engineer.
RAG is a pattern where instead of relying on an LLM's baked-in knowledge, you retrieve relevant documents at query time from a vector database, inject them into the context window, and ask the LLM to answer using only those documents. This grounds the answer in your data, eliminates knowledge cutoff issues, and makes hallucinations auditable — you can show which chunk the answer came from.
Q: How would you design an enterprise chatbot for internal documents?
I'd use a RAG architecture: ingest documents into Azure AI Search with semantic chunking (512 tokens, 64 overlap), use Azure OpenAI embeddings for vectorisation, add a reranker for top-k results, then pipe into GPT-4o with a strict system prompt. Governance layer: PII mask all inputs, JSONL audit log every inference, confidence threshold with "I don't know" fallback below 0.6. Monitoring: track faithfulness weekly with RAGAS, alert on hallucination rate above 5%.
Q: What's the difference between an LLM agent and a simple LLM call?
A single LLM call is stateless — you send a prompt, get a response. An agent has a loop: it can call tools (APIs, databases, code executors), observe the results, reason about what to do next, and iterate until a goal is reached. Agents are better for multi-step tasks but introduce new risks (tool misuse, runaway loops) — you need timeouts, max iterations, and human approval for irreversible actions.
Q: How do you handle hallucinations in production?
Three layers: prevention (RAG grounds the answer in real sources, strict system prompt with "only use the provided context"), detection (confidence scoring, faithfulness check with LLM-as-judge), and mitigation (low-confidence responses trigger a fallback: "I'm not confident enough to answer — here are the relevant documents"). Critical: audit log every response so you can investigate failures.
Q: How do you evaluate an LLM in production?
I separate evaluation into three levels: component (retrieval quality — precision/recall of chunks), end-to-end (RAGAS metrics: faithfulness, answer relevancy, context precision), and business (downstream KPI — e.g., did the agent's recommendation lead to the right action?). I also run regression tests on a golden dataset of 50–100 question/answer pairs before every deployment. For LLM-as-judge, I use a separate, stronger model to score responses.
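The golden-dataset regression gate from that answer can be sketched in a few lines. The questions, the `fake_chatbot` stand-in, and the substring-match check are all illustrative; real setups use RAGAS or an LLM-as-judge instead of `must_contain`:

```python
GOLDEN_SET = [
    {"question": "What is our refund window?", "must_contain": "30 days"},
    {"question": "Who approves expense claims?", "must_contain": "line manager"},
]

def fake_chatbot(question: str) -> str:
    """Stand-in for the deployed RAG pipeline."""
    answers = {
        "What is our refund window?": "Refunds are accepted within 30 days.",
        "Who approves expense claims?": "Your line manager approves claims.",
    }
    return answers.get(question, "I don't know.")

def run_regression(pipeline, golden: list[dict], pass_threshold: float = 0.95) -> bool:
    passed = sum(
        1 for case in golden
        if case["must_contain"].lower() in pipeline(case["question"]).lower()
    )
    rate = passed / len(golden)
    print(f"{passed}/{len(golden)} golden cases passed ({rate:.0%})")
    return rate >= pass_threshold  # gate the deployment on this result

print(run_regression(fake_chatbot, GOLDEN_SET))
```

Wire this into CI so a prompt tweak or model upgrade that regresses known-good answers blocks the deploy instead of reaching users.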
Q: When would you fine-tune vs use RAG?
RAG for knowledge (factual information that changes or is proprietary). Fine-tuning for behaviour (consistent tone, specific output formats, domain terminology the base model doesn't know well). In practice: start with prompt engineering, add RAG when you need fresh knowledge, fine-tune only if you've proven RAG isn't enough — it's 10x more expensive and complex to maintain.
6.2 Architecture & Design Questions
Q: Design an AI system for retail product recommendations.
I'd build a hybrid system: collaborative filtering (user behaviour) + LLM-enhanced descriptions. The LLM layer enriches product embeddings with semantic understanding — "this jumper is warm, casual, suitable for winter weekends." At inference: retrieve top-20 by collaborative score, rerank using semantic similarity to user's stated intent, apply business rules (stock availability, margin targets), return top-5 with explanation. Governance: log all recommendations, A/B test against baseline, monitor click-through and conversion weekly.
Q: How would you make an AI system compliant with the EU AI Act?
First, classify the risk tier — employment screening and credit scoring are high-risk. For high-risk: document the system with a model card, implement human oversight (no fully autonomous decisions), maintain audit logs for 10 years, conduct conformity assessment, register in the EU AI database. For the technical side: bias testing across protected characteristics, explainability layer (SHAP values or attention visualisation), incident reporting process.
6.3 Leadership & Process Questions
Q: How do you build trust with business stakeholders for AI projects?
I involve them before writing a line of code — the pain point interview is non-negotiable. Then I show failure cases in demos, not just successes. Stakeholders trust you more when you say "here's what it gets wrong and here's how we handle it" than when you only show the good outputs. I set accuracy expectations with ranges, not single numbers. And I deliver something working in 2 weeks rather than promising something perfect in 6 months.
Q: How do you keep an AI team moving fast while maintaining quality?
Two-day spike timeboxes for unknowns — no open-ended research. Golden dataset regression tests before every deploy — if you break something, you know immediately. ADRs for major decisions — stops re-litigating the same choices. Weekly model review — catch drift before stakeholders do. Bi-weekly demos — keeps the team accountable and stakeholders aligned.
Q: How do you communicate AI risk to a non-technical director?
I use a traffic light risk tier system and avoid technical jargon. "This model is a decision-support tool — it surfaces the best options, but a human makes the final call. We've tested it on 500 examples and it's right 94% of the time. When it's wrong, it fails in these specific ways, and here's the escalation path." I always answer: what's the worst case if it goes wrong, and what have we done to prevent it?
Q: Tell me about a time you delivered AI value to the business.
Mission Desk at M&S: business users were manually configuring loyalty missions in spreadsheets — 7 days per deployment, 12% error rate. I built a full-stack platform (Streamlit, PostgreSQL, Databricks) solo using AI-assisted development. Result: deployment time from 7 days to 1 day (85% reduction), error rate from 12% to under 2%, managing 1M+ loyalty records, estimated £500K+ annual business value. Nominated at the Sparks All Hands for company-wide recognition.
Your Learning Sequence (30-Day Sprint)
Week 1 — Foundations
- Day 1–2: LLMs, tokenisation, embeddings, attention (Track 1)
- Day 3–4: Prompt engineering — CoT, few-shot, structured output
- Day 5–7: Build a simple chatbot with the Claude API or OpenAI

Week 2 — Applied AI
- Day 8–10: RAG — build end-to-end with LangChain + ChromaDB
- Day 11–12: Vector databases — compare ChromaDB, pgvector, Azure AI Search
- Day 13–14: Agents — ReAct pattern, tool calling, LangGraph basics

Week 3 — Engineering & Governance
- Day 15–16: MLOps — MLflow experiment tracking + model registry
- Day 17–18: LLMOps — LangSmith tracing, RAGAS evaluation
- Day 19–20: Governance — PII masking, audit logging, confidence scoring, model cards
- Day 21: EU AI Act — read the official summary, classify your current projects

Week 4 — Leadership & Interview Prep
- Day 22–23: Write one ADR for a real decision you've made
- Day 24–25: Mock answer all 12 interview questions above out loud
- Day 26–27: Build one thing: add a reranker to your RAG POC or an agent tool
- Day 28–30: Blog post or LinkedIn post sharing what you learned

Resources
Papers (read the abstracts + introduction, not the full paper):
- Attention Is All You Need (2017) — the original Transformer paper
- RAG: Retrieval-Augmented Generation (2020) — Facebook AI
- ReAct: Synergizing Reasoning and Acting in Language Models (2022)
Courses:
- DeepLearning.AI Short Courses (free) — LangChain, RAG, agents, fine-tuning
- Fast.ai Practical Deep Learning — if you want to go deeper on fundamentals
- Databricks Generative AI Engineer Associate exam prep
Tools to have set up:
- LangChain + LangSmith (free tier)
- Anthropic API (Claude) or OpenAI API
- ChromaDB or pgvector locally
- MLflow locally (pip install mlflow)
Written by Vijay Anand Pandian — AI Tech Lead & Senior Data Engineer at M&S Sparks. Building governed AI systems that bridge business and engineering.