From Data Engineer to AI Engineer — Part 5: AI Agents — Making AI Take Actions

Series: From Data/Software Engineer to AI Engineer — Part 5 of 7 (← Part 4: RAG)


The Limitation of RAG

RAG answers questions. It is brilliant at "what does our policy say about X?"

But what if the question is: "Check our inventory system, find products below reorder threshold, and draft purchase orders for the three most critical ones"?

That requires:

  1. Calling an inventory API
  2. Analysing the results
  3. Making a decision (which are most critical?)
  4. Generating documents

A single LLM call cannot do this. You need an agent.


What an Agent Actually Is

An agent is just an LLM in a loop.

Goal: "Find products below reorder threshold and draft purchase orders"
Loop:
[Think] What do I need to do first?
→ I need to check the inventory system
[Act] call_tool("get_inventory", {"threshold": "reorder"})
[Observe] {"items": [{"sku": "A123", "stock": 5, "reorder_at": 10}, ...]}
[Think] I have the inventory data. Which are most critical?
→ Sort by (reorder_at - stock) descending: SKU A123, B456, C789
[Act] call_tool("get_supplier_info", {"skus": ["A123", "B456", "C789"]})
[Observe] {"A123": {"supplier": "Acme", "lead_time": "3 days"}, ...}
[Think] Now I have all information to draft the orders
[Act] draft_purchase_orders(...)
[Observe] "3 purchase orders created"
[Think] Goal is complete
[Final Answer] "I've identified 3 products below reorder threshold and created purchase orders..."

This Think → Act → Observe loop is called ReAct (Reasoning + Acting). It is the foundation of every production agent.
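The "sort by shortfall" step in the trace is simple enough to pin down in code. A minimal sketch of that heuristic (the helper name is illustrative; the fields mirror the trace):

```python
def rank_by_criticality(items: list[dict]) -> list[dict]:
    """Rank low-stock items by shortfall (reorder_at - stock), largest first."""
    return sorted(items, key=lambda i: i["reorder_at"] - i["stock"], reverse=True)

items = [
    {"sku": "C789", "stock": 9, "reorder_at": 12},   # shortfall 3
    {"sku": "A123", "stock": 5, "reorder_at": 10},   # shortfall 5
]
print([i["sku"] for i in rank_by_criticality(items)])  # → ['A123', 'C789']
```

The agent does not need to "reason" about arithmetic like this; giving it a deterministic tool for the ranking is usually more reliable than asking the model to sort.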


Tool Calling: The Key Mechanism

Tools are functions that the agent can call. The LLM decides:

  • Which tool to call
  • With what arguments
  • Whether to call another tool based on the result

import anthropic
import json

client = anthropic.Anthropic()

# ── Define your tools ─────────────────────────────────────
# Claude uses a JSON schema to understand what each tool does and expects
tools = [
    {
        "name": "get_stock_level",
        "description": "Get the current stock level for a product SKU from the inventory system",
        "input_schema": {
            "type": "object",
            "properties": {
                "sku": {
                    "type": "string",
                    "description": "The product SKU code, e.g. 'MNS-SHIRT-001'"
                }
            },
            "required": ["sku"]
        }
    },
    {
        "name": "search_products",
        "description": "Search for products by name or category",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                "category": {"type": "string", "description": "Optional category filter"}
            },
            "required": ["query"]
        }
    }
]

# ── Implement your tools ──────────────────────────────────
# These are just normal Python functions — the agent calls them via JSON
def get_stock_level(sku: str) -> dict:
    # In production: call your actual inventory API
    fake_inventory = {
        "MNS-SHIRT-001": {"stock": 45, "reorder_at": 20, "location": "Warehouse A"},
        "MNS-JEANS-002": {"stock": 8, "reorder_at": 25, "location": "Warehouse B"},
    }
    return fake_inventory.get(sku, {"error": f"SKU {sku} not found"})

def search_products(query: str, category: str | None = None) -> list:
    # In production: query your product database
    return [
        {"sku": "MNS-SHIRT-001", "name": "Oxford Cotton Shirt", "category": "menswear"},
        {"sku": "MNS-JEANS-002", "name": "Slim Fit Jeans", "category": "menswear"},
    ]

# Tool dispatcher — maps tool names to functions
TOOL_MAP = {
    "get_stock_level": get_stock_level,
    "search_products": search_products,
}

def execute_tool(tool_name: str, tool_input: dict) -> str:
    if tool_name not in TOOL_MAP:
        return json.dumps({"error": f"Unknown tool: {tool_name}"})
    result = TOOL_MAP[tool_name](**tool_input)
    return json.dumps(result)

Building the Agent Loop

def run_agent(user_message: str, max_iterations: int = 10) -> str:
    """
    The core agent loop: Think → Act → Observe → repeat until done.
    max_iterations prevents infinite loops.
    """
    messages = [{"role": "user", "content": user_message}]

    for iteration in range(max_iterations):
        print(f"\n── Iteration {iteration + 1} ──")

        # ── Think: ask the LLM what to do next ──────────
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=tools,
            messages=messages,
            system="""
            You are a helpful retail operations assistant.
            Use the available tools to answer questions accurately.
            Only use information from tool results — do not guess.
            """
        )
        print(f"Stop reason: {response.stop_reason}")

        # ── Check if agent is done ────────────────────────
        if response.stop_reason == "end_turn":
            # Agent has finished — return its final answer
            final_text = next(
                block.text for block in response.content
                if hasattr(block, "text")
            )
            print(f"Final answer: {final_text}")
            return final_text

        # ── Act: agent wants to use tools ─────────────────
        if response.stop_reason == "tool_use":
            # Add assistant's response (including tool calls) to history
            messages.append({"role": "assistant", "content": response.content})

            # Execute each tool the agent called
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    print(f"Calling tool: {block.name}({block.input})")
                    # ── Observe: run the tool and capture result ──
                    result = execute_tool(block.name, block.input)
                    print(f"Tool result: {result}")
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })

            # Feed tool results back to the agent
            messages.append({"role": "user", "content": tool_results})

    return "Agent reached maximum iterations without completing the task."

# ── Run it ────────────────────────────────────────────────
result = run_agent("What is the stock level for MNS-JEANS-002 and is it below reorder threshold?")

Output:

── Iteration 1 ──
Stop reason: tool_use
Calling tool: get_stock_level({'sku': 'MNS-JEANS-002'})
Tool result: {"stock": 8, "reorder_at": 25, "location": "Warehouse B"}
── Iteration 2 ──
Stop reason: end_turn
Final answer: MNS-JEANS-002 (Slim Fit Jeans) currently has 8 units in stock at
Warehouse B. This is below the reorder threshold of 25 units — it should be
reordered as soon as possible.

Real-World Tool Patterns

Pattern 1: API Call Tool

import httpx

def get_weather(city: str, country_code: str = "GB") -> dict:
    """Call a real weather API."""
    response = httpx.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={"q": f"{city},{country_code}", "appid": "YOUR_KEY", "units": "metric"}
    )
    response.raise_for_status()  # fail loudly on HTTP errors instead of KeyError-ing below
    data = response.json()
    return {
        "city": city,
        "temperature": data["main"]["temp"],
        "description": data["weather"][0]["description"]
    }

Pattern 2: Database Query Tool

import psycopg2

def query_database(sql: str) -> list[dict]:
    """
    Allow the agent to query a database.
    CRITICAL: Never pass raw user input as SQL — always use parameterised queries
    or restrict to a whitelist of allowed queries.
    """
    # Safety: only allow SELECT statements
    if not sql.strip().upper().startswith("SELECT"):
        return [{"error": "Only SELECT queries are permitted"}]
    conn = psycopg2.connect("your_connection_string")
    try:
        with conn.cursor() as cursor:
            cursor.execute(sql)
            columns = [desc[0] for desc in cursor.description]
            rows = cursor.fetchall()
        return [dict(zip(columns, row)) for row in rows]
    finally:
        conn.close()  # always release the connection, even if the query fails
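The docstring's stricter option, a whitelist of allowed queries, can be sketched as a map of named, parameterised statements: the agent picks a name and supplies values, and never writes SQL itself (the query names and SQL below are illustrative):

```python
# Whitelist of named queries. The agent chooses a name; it never writes SQL.
ALLOWED_QUERIES = {
    "low_stock": "SELECT sku, stock FROM inventory WHERE stock < reorder_at",
    "stock_by_sku": "SELECT sku, stock FROM inventory WHERE sku = %s",
}

def build_query(name: str, params: tuple = ()) -> tuple[str, tuple]:
    """Resolve a query name to (sql, params), rejecting anything off the whitelist."""
    if name not in ALLOWED_QUERIES:
        raise ValueError(f"Query '{name}' is not on the whitelist")
    return ALLOWED_QUERIES[name], params
```

Usage would be `cursor.execute(*build_query("stock_by_sku", ("MNS-JEANS-002",)))`, which keeps user input confined to the parameter slot.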

Pattern 3: RAG Tool (Agent + RAG Combined)

def search_knowledge_base(question: str) -> str:
    """
    Allow the agent to query your RAG system as a tool.
    This combines agents with RAG — the agent decides when to search.
    """
    result = rag_query(question)  # from Part 4
    return f"Answer: {result['answer']}\nSources: {', '.join(result['sources'])}"

Pattern 4: Code Execution Tool

import subprocess

def run_python(code: str) -> str:
    """
    Allow the agent to run Python code.
    DANGER: Sandbox this properly in production — never run arbitrary code unsandboxed.
    """
    try:
        result = subprocess.run(
            ["python", "-c", code],
            capture_output=True,
            text=True,
            timeout=10  # prevent runaway execution
        )
    except subprocess.TimeoutExpired:
        return "Error: code execution timed out after 10 seconds"
    return result.stdout or result.stderr

Multi-Agent Systems

For complex tasks, you split work across specialist agents coordinated by an orchestrator agent.

# Orchestrator: decides which specialist to use
# Specialist agents: each has specific tools and expertise
orchestrator_system = """
You are an operations orchestrator. Based on the user's request,
determine which specialist to route to:
- "inventory_agent": for stock levels, orders, warehouse questions
- "customer_agent": for customer accounts, orders, complaints
- "analytics_agent": for reports, trends, performance data
Respond with JSON: {"route_to": "agent_name", "task": "refined task description"}
"""

def route_request(user_request: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=100,
        system=orchestrator_system,
        messages=[{"role": "user", "content": user_request}]
    )
    routing = json.loads(response.content[0].text)

    # Route to the appropriate specialist agent
    specialist = routing["route_to"]
    task = routing["task"]
    if specialist == "inventory_agent":
        return run_agent(task)  # with inventory tools
    elif specialist == "customer_agent":
        return run_customer_agent(task)  # with customer tools
    # ...
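One practical wrinkle with this router: `json.loads(response.content[0].text)` throws if the model wraps its JSON in prose. A small defensive parser (a sketch, not part of any SDK) is cheap insurance:

```python
import json
import re

def parse_routing(text: str) -> dict:
    """Extract the first {...} JSON object from model output, tolerating surrounding prose."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        raise ValueError(f"No JSON object found in: {text!r}")
    return json.loads(match.group(0))
```

Constraining the orchestrator's output with a tool schema (as in the agent loop above) is another way to get the same guarantee.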

This is the pattern behind Sparky AI — an orchestrator understands the business user's intent (who should receive this promotion?) and routes to specialist agents that query audience data, apply business rules, and validate outputs.


Guardrails: The Non-Negotiable

Agents are powerful. That power requires guardrails.

class SafeAgent:
    def __init__(self, max_iterations=10, max_cost_usd=0.50):
        self.max_iterations = max_iterations
        self.max_cost_usd = max_cost_usd
        self.total_cost = 0.0  # running spend in USD, checked on every call

    def run(self, user_message: str) -> str:
        # Input validation
        if len(user_message) > 2000:
            return "Request too long. Please be more concise."

        # Dangerous action detection
        dangerous_keywords = ["delete all", "drop table", "rm -rf", "format"]
        if any(kw in user_message.lower() for kw in dangerous_keywords):
            return "This request requires human approval before proceeding."

        return self._run_loop(user_message)

    def _check_budget(self, input_tokens: int, output_tokens: int):
        cost = (input_tokens * 0.000003) + (output_tokens * 0.000015)
        self.total_cost += cost
        if self.total_cost > self.max_cost_usd:
            raise Exception(f"Cost limit exceeded: ${self.total_cost:.4f}")

# Require human approval for irreversible actions
IRREVERSIBLE_TOOLS = {"delete_order", "cancel_shipment", "send_email"}

def execute_tool_safe(tool_name: str, tool_input: dict) -> str:
    if tool_name in IRREVERSIBLE_TOOLS:
        # In production: send approval request to Slack/email
        print(f"⚠️ APPROVAL REQUIRED: {tool_name}({tool_input})")
        approval = input("Approve? (yes/no): ")
        if approval.lower() != "yes":
            return json.dumps({"status": "cancelled", "reason": "user declined"})
    return execute_tool(tool_name, tool_input)
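The `input()` prompt above only works at a terminal. For a service, a variant that accepts an approver callback is easier to wire to Slack or a ticket queue, and to unit test (the callback signature is an assumption, and the executed branch stands in for the real `execute_tool` call):

```python
import json
from typing import Callable

IRREVERSIBLE_TOOLS = {"delete_order", "cancel_shipment", "send_email"}

def execute_tool_gated(tool_name: str, tool_input: dict,
                       approve: Callable[[str, dict], bool]) -> str:
    """Run a tool, routing irreversible ones through an approval callback first."""
    if tool_name in IRREVERSIBLE_TOOLS and not approve(tool_name, tool_input):
        return json.dumps({"status": "cancelled", "reason": "approval denied"})
    # Stand-in for execute_tool(tool_name, tool_input) from earlier
    return json.dumps({"status": "executed", "tool": tool_name})
```

Non-irreversible tools skip the gate entirely, so the happy path stays fast.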

When to Use Agents (and When Not To)

This is the judgement call that separates senior AI engineers from the rest.

Use an agent when:

  • The task requires multiple steps with decision-making between them
  • You need to call external APIs or databases
  • The path to the answer is not known in advance
  • Different inputs require different tool sequences

Do NOT use an agent when:

  • A single LLM call or RAG query is sufficient (YAGNI)
  • The task is deterministic — write a Python function instead
  • Latency matters — agents are 3–10x slower than single calls
  • The action is irreversible and high-stakes — agents can make mistakes

Simple question → Single LLM call (Part 3)
Knowledge question → RAG (Part 4)
Multi-step action → Agent (Part 5)

The most common mistake is reaching for agents when a structured prompt chain (Part 3) would do the job with 10x less complexity and cost.
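For contrast, a structured prompt chain hard-codes the sequence of steps: no loop, no tool selection, predictable cost. A sketch with a stand-in `llm` callable (an assumption; swap in a real model call):

```python
def summarise_then_classify(text: str, llm) -> dict:
    """A fixed two-step chain: the control flow is plain Python, not model-decided."""
    summary = llm(f"Summarise in one sentence: {text}")
    label = llm(f"Classify as 'inventory', 'customer' or 'other': {summary}")
    return {"summary": summary, "label": label}
```

Because the sequence is fixed, you pay for exactly two calls and can reason about latency up front, which is the trade an agent gives away.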


Frameworks at a Glance

Framework                        Best for                           Maturity
Raw API loop (what we built)     Learning, maximum control          Stable
LangChain Agents                 General purpose, huge ecosystem    Stable
LangGraph                        Complex stateful workflows         Growing
Databricks AI Agents             Databricks-native, Unity Catalog   Databricks
AutoGen                          Multi-agent conversations          Microsoft
CrewAI                           Role-based agent teams             Growing

Start with the raw loop. Once you understand it, any framework is just an abstraction on top of what you already know.


Summary

Concept        What it is
Agent          LLM + tools + loop until goal achieved
Tool           A Python function the agent can call
ReAct          Think → Act → Observe loop pattern
Multi-agent    Orchestrator + specialist agents
Guardrails     Budget limits, input validation, human approval

The agent pattern unlocks the full power of AI for operational tasks. But with that power comes responsibility: every tool call can have real-world consequences. Build in guardrails from day one.


Next: Part 6 — Production AI: From Notebook to Live System

In Part 6, we take your AI system into production — deployment, monitoring, drift detection, and the operational patterns that keep it reliable.