Build an AI Agent from Scratch with Python (No Framework Needed)

The word "agent" is everywhere in AI right now — but most explanations either wave their hands or throw you into a 400-line LangGraph workflow. Neither helps you actually understand what an agent is. So let's skip the abstractions. By the end of this post you'll have a working Python agent — one that reasons, picks tools, runs them, and loops until it has a final answer — built entirely from scratch.

No LangChain. No LlamaIndex. No magic. Just the OpenAI SDK, the inspect module, and roughly 80 lines of Python.

What Makes Something an "Agent"?

An LLM is a fancy text-completion function. You give it tokens, it gives you tokens. An agent is that same LLM wrapped in a loop that can observe its environment, make decisions, take actions, and remember what it has already done. Four properties define an agent:

👀

Perceive

Reads the current state — user query, tool results, conversation history, any external context provided to it.

🧠

Plan

Reasons about what action to take next: call a tool, ask a clarifying question, or return the final answer.

⚡

Act

Executes the chosen action — calls a Python function, hits an API, runs code, or queries a database.

💾

Remember

Maintains state across steps — the full conversation history so every iteration builds on the last.

💡

The Loop is the Agent

The difference between an LLM and an agent is the loop. A raw LLM call is a single shot: in → out. An agent keeps calling the LLM, running tools, and feeding results back until the model decides the task is complete. That loop is the whole secret.

The ReAct pattern (Reason + Act) formalises this: at each step the model writes a thought (explicit reasoning) and an action (a tool call), the environment returns an observation (the tool result), and the cycle repeats until the model emits a final answer. OpenAI's native tool-calling API supports the same act-and-observe cycle, with structured JSON tool calls in place of a free-text scratchpad.
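To make the thought/action/observation cycle concrete, here is a minimal sketch of the text-based ReAct format. It assumes a scripted stand-in (scripted_llm, hypothetical) in place of a real model and an "Action: tool[input]" line syntax loosely modelled on the ReAct paper's transcripts; neither is part of the code later in this post.

```python
import re
from typing import Callable

# Hypothetical tool for the demo: takes a string argument, returns a string observation.
TOOLS: dict = {
    "weather": lambda city: {"london": "12°C, overcast"}.get(city.lower(), "unknown city"),
}

# "Action: weather[London]" -> tool name "weather", argument "London"
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")


def scripted_llm(transcript: str) -> str:
    """Stand-in for a model: canned text that depends on whether it has seen an observation."""
    if "Observation:" not in transcript:
        return "Thought: I need London's weather.\nAction: weather[London]"
    return "Thought: I have what I need.\nFinal Answer: London is 12°C, overcast."


def react_loop(question: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)                # Reason: the model writes a thought (and maybe an action)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)        # Act: parse the requested tool call
        if match is None:
            transcript += "Observation: no valid action found.\n"
            continue
        name, arg = match.groups()
        tool = TOOLS.get(name)
        result = tool(arg) if tool else f"unknown tool {name!r}"
        transcript += f"Observation: {result}\n"   # Observe: feed the result back in
    return "Stopped: step limit reached."


print(react_loop("What's the weather in London?", scripted_llm))
```

The tool-calling API replaces the fragile regex parsing here with structured tool_calls, but the loop has the same shape.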

Step 1: Define Tools

Tools are plain Python functions that return a string, since tool results go back to the model as text. The only other requirements are type-annotated parameters and a Google-style docstring with an Args: section; we'll use both to auto-generate the JSON schema the model expects.

Python
# tools.py  — three plain functions, zero framework
import math, urllib.parse

def get_weather(city: str) -> str:
    """Return the current weather for a given city.

    Args:
        city: The name of the city, e.g. 'London' or 'New York'.
    """
    # In production, call a real weather API here.
    # For the demo we return canned data.
    weather_db = {
        "london":    "12°C, overcast",
        "new york":  "24°C, sunny",
        "tokyo":     "18°C, partly cloudy",
        "dubai":     "38°C, clear",
    }
    key = city.lower().strip()
    return weather_db.get(key, f"Weather data not available for {city}.")


def calculate(expression: str) -> str:
    """Evaluate a safe mathematical expression and return the result.

    Args:
        expression: A mathematical expression string, e.g. '2 ** 10' or 'sqrt(144)'.
    """
    # Restricting globals to math names limits what eval() can reach, but it is
    # NOT a sandbox (e.g. '9 ** 9 ** 9' can hang the process). Never ship bare eval().
    allowed = {k: getattr(math, k) for k in dir(math) if not k.startswith("_")}
    allowed["__builtins__"] = {}
    try:
        result = eval(expression, allowed)  # noqa: S307
        return str(result)
    except Exception as e:
        return f"Error evaluating expression: {e}"


def search_web(query: str) -> str:
    """Search the web for up-to-date information on a topic.

    Args:
        query: The search query string, e.g. 'latest GPT-4o benchmarks'.
    """
    # Stub — swap in SerpAPI / Brave Search API / Tavily in production.
    encoded = urllib.parse.quote_plus(query)
    return (
        f"[Simulated search results for: {query}]\n"
        "1. Researchers at MIT publish new benchmark showing 40% improvement...\n"
        "2. Industry report: adoption of agentic AI up 3x in 2025...\n"
        "3. OpenAI releases o3-mini with improved reasoning at lower cost..."
    )


# Collect all tools in one list for easy import
ALL_TOOLS = [get_weather, calculate, search_web]
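As the comment in calculate notes, eval() with trimmed globals is not a sandbox. One safer option, sketched below (not the post's original tool), is to parse the expression with Python's ast module and evaluate only a whitelist of node types. The whitelist, the math functions exposed, and the MAX_EXPONENT cap are illustrative choices; the function keeps the same name, signature, and docstring, so the schema generator and dispatcher treat it exactly like the original.

```python
import ast
import math
import operator

# Whitelisted operators, functions, and constants; any other syntax is rejected.
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt, "log": math.log, "exp": math.exp,
          "sin": math.sin, "cos": math.cos, "abs": abs, "round": round}
_CONSTS = {"pi": math.pi, "e": math.e}
MAX_EXPONENT = 1000  # illustrative cap that blocks '9 ** 9 ** 9'-style blowups


def _eval_node(node: ast.AST):
    if isinstance(node, ast.Expression):
        return _eval_node(node.body)
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.Name) and node.id in _CONSTS:
        return _CONSTS[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        left, right = _eval_node(node.left), _eval_node(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > MAX_EXPONENT:
            raise ValueError("exponent too large")
        return _BIN_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS and not node.keywords):
        return _FUNCS[node.func.id](*[_eval_node(arg) for arg in node.args])
    # Attribute access, names like __import__, comprehensions, etc. all land here
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def calculate(expression: str) -> str:
    """Evaluate a mathematical expression and return the result.

    Args:
        expression: A mathematical expression string, e.g. '2 ** 10' or 'sqrt(144)'.
    """
    try:
        return str(_eval_node(ast.parse(expression, mode="eval")))
    except (SyntaxError, ValueError, TypeError, ZeroDivisionError, OverflowError) as e:
        return f"Error evaluating expression: {e}"
```

Errors come back as strings, so the model sees "Error evaluating expression: ..." and can retry with a corrected expression.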

Now let's auto-generate the JSON schema OpenAI needs. Instead of writing it by hand (error-prone and tedious), we'll introspect each function with the inspect module:

Python
# schema.py  — build OpenAI tool schemas from function signatures
import inspect, re
from typing import Callable, get_type_hints

# Map Python types → JSON Schema types
PYTHON_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def _parse_arg_docs(docstring: str) -> dict:
    """Extract per-argument descriptions from a Google-style docstring."""
    arg_docs = {}
    if not docstring:
        return arg_docs
    # Match lines under the 'Args:' section
    args_section = re.search(r"Args:\n(.*?)(?:\n\n|\Z)", docstring, re.DOTALL)
    if not args_section:
        return arg_docs
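    # inspect.getdoc() dedents the docstring, so argument lines are only indented
    # relative to "Args:", not by a fixed 8 spaces; match any indentation instead.
    for match in re.finditer(r"^[ \t]+(\w+):\s*(.+)$", args_section.group(1), re.MULTILINE):
        arg_docs[match.group(1)] = match.group(2).strip()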
    return arg_docs


def function_to_schema(func: Callable) -> dict:
    """Convert a Python function into an OpenAI tool schema dict."""
    hints    = get_type_hints(func)
    sig      = inspect.signature(func)
    doc      = inspect.getdoc(func) or ""
    arg_docs = _parse_arg_docs(doc)

    # First paragraph of the docstring = tool description
    description = doc.split("\n\n")[0].strip()

    properties, required = {}, []
    for name, param in sig.parameters.items():
        json_type = PYTHON_TO_JSON.get(hints.get(name), "string")
        properties[name] = {
            "type":        json_type,
            "description": arg_docs.get(name, ""),
        }
        if param.default is inspect.Parameter.empty:
            required.append(name)

    return {
        "type": "function",
        "function": {
            "name":        func.__name__,
            "description": description,
            "parameters": {
                "type":                 "object",
                "properties":          properties,
                "required":            required,
                "additionalProperties": False,
            },
        },
    }


def build_tool_schemas(functions: list) -> list:
    """Build a list of OpenAI-compatible tool schemas from a list of functions."""
    return [function_to_schema(f) for f in functions]

💡

Why auto-generate schemas?

Hand-writing JSON schemas for every function is tedious, error-prone, and means you have two things to keep in sync. Generating from the signature means adding a new tool is just writing a typed Python function — no schema boilerplate needed.
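The PYTHON_TO_JSON table only covers scalar types, so a list parameter or a fixed set of choices silently becomes "string". Here is one way to extend it, as a sketch: json_type_for is a hypothetical helper that maps list[...] to a JSON Schema array and Literal[...] to an enum, and still falls back to "string" for anything else (including Optional).

```python
from typing import Literal, Optional, get_args, get_origin

PYTHON_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def json_type_for(tp) -> dict:
    """Map a type hint to a JSON Schema fragment, falling back to "string"."""
    origin = get_origin(tp)
    if origin is Literal:
        # Literal["celsius", "fahrenheit"] -> an enum of the allowed values
        values = list(get_args(tp))
        return {"type": PYTHON_TO_JSON.get(type(values[0]), "string"), "enum": values}
    if origin is list:
        # list[int] (or typing.List[int]) -> an array whose items use the inner type
        (item_type,) = get_args(tp)
        return {"type": "array", "items": json_type_for(item_type)}
    return {"type": PYTHON_TO_JSON.get(tp, "string")}


print(json_type_for(list[int]))
print(json_type_for(Literal["celsius", "fahrenheit"]))
print(json_type_for(Optional[int]))  # Optional/Union is not handled: still "string"
```

To wire it in, the "type": json_type entry in function_to_schema could become **json_type_for(hints.get(name)).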

Step 2: Implement Tool Calling

OpenAI's tool-calling flow has four steps: send the tools list with the user message, receive a response that may contain tool_calls, dispatch each call to the matching Python function, then send the results back. Here's the dispatch logic in isolation:

Python
# dispatcher.py  — call the right function given a tool_call object
import json
from tools import ALL_TOOLS

# Build a name → callable registry
TOOL_REGISTRY: dict = {fn.__name__: fn for fn in ALL_TOOLS}


def dispatch_tool_call(tool_call) -> str:
    """Execute a single OpenAI tool_call and return the string result."""
    name = tool_call.function.name
    if name not in TOOL_REGISTRY:
        return f"Error: unknown tool '{name}'"

    try:
        # Parse inside the try: the model can emit malformed JSON arguments
        arguments = json.loads(tool_call.function.arguments)
        result = TOOL_REGISTRY[name](**arguments)
        return str(result)
    except Exception as e:
        # Report any failure (bad args, tool bug) to the model instead of crashing the loop
        return f"Error calling {name}: {e}"


def handle_tool_calls(response_message, messages: list) -> list:
    """Process all tool_calls in a response; append results to messages."""
    # 1. Append the assistant's message (which contains the tool_call requests)
    messages.append(response_message)

    # 2. Execute each tool call and append the result
    for tc in response_message.tool_calls:
        result = dispatch_tool_call(tc)
        messages.append({
            "role":         "tool",
            "tool_call_id": tc.id,
            "content":      result,
        })

    return messages

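Notice that each tool result message must include the original tool_call_id. This lets the model match each result to its specific request when several tools are called in parallel, and the API rejects the next request if any tool_call_id from the assistant message is left without a matching tool message.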
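handle_tool_calls runs parallel tool calls one after another. If your tools are slow I/O calls, one option is to run them concurrently in a thread pool while keeping each result paired with its tool_call_id. This is a sketch: slow_weather and the call ids are hypothetical, and SimpleNamespace objects stand in for the SDK's tool_call objects (only .id, .function.name, and .function.arguments are read), so it runs without an API key.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace


def slow_weather(city: str) -> str:
    """Hypothetical tool that simulates a slow network call."""
    time.sleep(0.2)
    return f"{city}: 12°C"


REGISTRY = {"slow_weather": slow_weather}


def run_one(tc) -> dict:
    """Execute one tool call and wrap the result as a tool message."""
    try:
        args = json.loads(tc.function.arguments)
        result = REGISTRY[tc.function.name](**args)
    except Exception as e:  # report failures to the model instead of crashing
        result = f"Error calling {tc.function.name}: {e}"
    # The tool_call_id ties this result back to the request it answers
    return {"role": "tool", "tool_call_id": tc.id, "content": str(result)}


def run_parallel(tool_calls: list, max_workers: int = 8) -> list:
    """Run tool calls concurrently; results come back in the same order as tool_calls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, tool_calls))


def fake_call(call_id: str, name: str, **kwargs) -> SimpleNamespace:
    """Stand-in exposing the attributes the dispatcher reads from an SDK tool_call."""
    return SimpleNamespace(id=call_id, function=SimpleNamespace(name=name, arguments=json.dumps(kwargs)))


calls = [fake_call("call_a", "slow_weather", city="London"),
         fake_call("call_b", "slow_weather", city="Tokyo"),
         fake_call("call_c", "no_such_tool")]
results = run_parallel(calls)
for msg in results:
    print(msg["tool_call_id"], "->", msg["content"])
```

Because pool.map returns results in input order, the tool messages line up with the assistant's tool_calls list; appending run_parallel(response_message.tool_calls) after the assistant message would replace the sequential loop.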

Step 3: The Agent Loop

The loop is the heart of the agent. It's simpler than most people expect: a bounded for loop that calls the model, runs any tools it requests, and exits when the model answers or max_iterations is reached:

Python
# agent.py  — the core agent loop (~50 lines)
import json
import os
from openai import OpenAI
from tools import ALL_TOOLS
from schema import build_tool_schemas
from dispatcher import handle_tool_calls

client     = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
TOOL_SCHEMAS = build_tool_schemas(ALL_TOOLS)

SYSTEM_PROMPT = """You are a research assistant with access to tools.
Think step-by-step. Use tools whenever you need real data.
When you have enough information, write a clear final answer."""


def run_agent(user_query: str, max_iterations: int = 10) -> str:
    """Run the agent loop until a final answer is produced."""
    messages = [
        {"role": "system",  "content": SYSTEM_PROMPT},
        {"role": "user",    "content": user_query},
    ]

    for iteration in range(max_iterations):
        print(f"\n── Iteration {iteration + 1} ──")

        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=TOOL_SCHEMAS,
            tool_choice="auto",   # let the model decide
        )

        msg        = response.choices[0].message
        stop_reason = response.choices[0].finish_reason

        # ── Case 1: the model chose to call one or more tools ──
        if stop_reason == "tool_calls":
            for tc in msg.tool_calls:
                args = json.loads(tc.function.arguments)
                print(f"  🔧 {tc.function.name}({args})")
            messages = handle_tool_calls(msg, messages)
            continue  # loop back with tool results appended

        # ── Case 2: the model is done — return the final answer ──
        if stop_reason == "stop":
            return msg.content

        # ── Case 3: unexpected finish reason ──
        return f"Unexpected stop reason: {stop_reason}"

    return "Max iterations reached — partial answer in message history."


if __name__ == "__main__":
    import json
    answer = run_agent(
        "What's the weather in London and Tokyo? "
        "Also, what is 2 to the power of 16?"
    )
    print("\n── Final Answer ──\n")
    print(answer)

⚠️

Always set max_iterations

Without a guard, a bug in your tool (e.g. one that always returns an error) can put the agent in an infinite loop and burn your API credits. max_iterations=10 is a safe default for most tasks; raise it for complex multi-step research.

When you run this, the model will typically call get_weather("London") and get_weather("Tokyo") in a single iteration (parallel tool calling, which GPT-4o supports natively). It may request calculate("2 ** 16") in that same batch or in the next iteration; either way, once all three results are in, it emits a prose answer that incorporates them.
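You can check the loop's control flow, including the max_iterations guard, without spending tokens by injecting a fake client. A sketch under stated assumptions: ScriptedClient and make_response are hypothetical stand-ins that mimic only the attributes run_agent reads (choices[0].message, finish_reason, tool_calls), and run_loop is run_agent's control flow with the client and dispatcher passed in as parameters.

```python
import json
from types import SimpleNamespace


def make_response(finish_reason: str, content=None, tool_calls=None) -> SimpleNamespace:
    """Mimic the parts of a chat completion response that the loop reads."""
    message = SimpleNamespace(role="assistant", content=content, tool_calls=tool_calls)
    return SimpleNamespace(choices=[SimpleNamespace(message=message, finish_reason=finish_reason)])


class ScriptedClient:
    """Fake client: replays canned responses, then requests a tool on every later call."""

    def __init__(self, script=None):
        self.script = list(script or [])
        self.calls = 0
        # Expose client.chat.completions.create(...) with the same shape as the SDK
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, **kwargs):
        self.calls += 1
        if self.script:
            return self.script.pop(0)
        tool_call = SimpleNamespace(
            id=f"call_{self.calls}",
            function=SimpleNamespace(name="calculate", arguments=json.dumps({"expression": "1 + 1"})),
        )
        return make_response("tool_calls", tool_calls=[tool_call])


def run_loop(client, messages: list, dispatch, max_iterations: int = 10) -> str:
    """run_agent's control flow, with the client and dispatcher passed in."""
    for _ in range(max_iterations):
        choice = client.chat.completions.create(model="gpt-4o", messages=messages).choices[0]
        if choice.finish_reason == "tool_calls":
            messages.append(choice.message)
            for tc in choice.message.tool_calls:
                messages.append({"role": "tool", "tool_call_id": tc.id, "content": dispatch(tc)})
            continue
        if choice.finish_reason == "stop":
            return choice.message.content
        return f"Unexpected stop reason: {choice.finish_reason}"
    return "Max iterations reached."


# A model that never stops requesting tools is cut off by the guard.
stuck = ScriptedClient()
stuck_result = run_loop(stuck, [{"role": "user", "content": "hi"}], dispatch=lambda tc: "2", max_iterations=3)

# A model that answers on its first turn returns immediately.
done = ScriptedClient([make_response("stop", content="All done.")])
done_result = run_loop(done, [{"role": "user", "content": "hi"}], dispatch=lambda tc: "2")

print(stuck_result, stuck.calls)  # the stuck model used all 3 iterations
print(done_result, done.calls)
```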

Step 4: Adding Memory

The messages list is the agent's short-term memory. Every tool call, result, and assistant thought is appended — the model sees the full history at every step. But context windows are finite, so you need a strategy for long-running agents.

Python
# memory.py  — two strategies for managing context length
from openai import OpenAI

client = OpenAI()


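# ── Helpers: history mixes plain dicts with SDK message objects ─────────────
def _role(m) -> str:
    # handle_tool_calls appends the SDK's assistant message object, which is not
    # subscriptable, so read fields from dicts and objects alike.
    return m["role"] if isinstance(m, dict) else m.role


def _content(m) -> str:
    content = m.get("content") if isinstance(m, dict) else m.content
    return content or ""


# ── Strategy 1: Sliding window, keep last N messages ───────────────────────
def trim_messages_sliding(messages: list, keep_last: int = 20) -> list:
    """Keep the system prompt + the most recent `keep_last` messages."""
    system_msgs = [m for m in messages if _role(m) == "system"]
    other_msgs  = [m for m in messages if _role(m) != "system"]
    recent = other_msgs[-keep_last:]
    # A "tool" message must follow the assistant message that requested it.
    # If the cut removed that request, drop its orphaned results too.
    while recent and _role(recent[0]) == "tool":
        recent = recent[1:]
    return system_msgs + recent


# ── Strategy 2: Summarisation, compress old turns into one message ─────────
def trim_messages_summarise(messages: list, keep_recent: int = 6) -> list:
    """Summarise everything except (roughly) the last `keep_recent` messages."""
    system_msgs = [m for m in messages if _role(m) == "system"]
    other_msgs  = [m for m in messages if _role(m) != "system"]

    if len(other_msgs) <= keep_recent:
        return messages  # nothing to trim

    # Move the split past any tool results so they are summarised with their request
    split = len(other_msgs) - keep_recent
    while split < len(other_msgs) and _role(other_msgs[split]) == "tool":
        split += 1
    old_msgs, recent_msgs = other_msgs[:split], other_msgs[split:]

    # Ask the model to compress the old conversation
    summary_prompt = [
        {"role": "system",  "content": "Summarise the following conversation history in 3-5 bullet points. Preserve key facts and decisions."},
        {"role": "user",    "content": "\n".join(f"{_role(m)}: {_content(m)}" for m in old_msgs)},
    ]
    summary_resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=summary_prompt,
    )
    summary_text = summary_resp.choices[0].message.content

    summary_msg = {"role": "assistant", "content": f"[Earlier conversation summary]\n{summary_text}"}
    return system_msgs + [summary_msg] + recent_msgs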


# ── Long-term memory (sketch) ────────────────────────────────────────────────
# For persistent memory across sessions, store key facts in a vector DB.
# At the start of each session: retrieve relevant facts → inject into system prompt.
# Tools like Mem0 or a simple Chroma/Pinecone store work well here.

Here's a comparison of memory approaches to help you pick the right one:

Strategy	Best For	Pros	Cons
Full history	Short tasks (< 20 turns)	Zero overhead, perfect recall	Hits context limit fast
Sliding window	Conversational agents	Simple, predictable token usage	Loses early context entirely
Summarisation	Long research sessions	Preserves key facts	Extra API call cost; summary may lose detail
Vector DB (long-term)	Persistent user profiles	Survives restarts, scales to large fact stores	Infrastructure overhead, retrieval noise

📌

Short-term vs Long-term Memory

Short-term memory is the messages list — it lives inside a single agent run and is lost when the process ends. Long-term memory requires an external store (a database, a vector index) that persists between runs. Most agents only need short-term; add long-term when users expect the agent to remember them across sessions.
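The long-term memory comment in memory.py points at a vector DB. To see the retrieve-then-inject flow without any infrastructure, here is a toy stand-in, assuming a JSON file as the store and plain word overlap in place of embedding similarity (a deliberate simplification; FactStore, build_system_prompt, and the file name are hypothetical).

```python
import json
import re
import tempfile
from pathlib import Path


def _words(text: str) -> set:
    """Lower-cased alphanumeric tokens, used as a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class FactStore:
    """Toy long-term memory: facts persisted to a JSON file, recalled by word overlap."""

    def __init__(self, path: str):
        self.path = Path(path)
        self.facts = json.loads(self.path.read_text()) if self.path.exists() else []

    def remember(self, fact: str) -> None:
        if fact not in self.facts:
            self.facts.append(fact)
            self.path.write_text(json.dumps(self.facts))  # persist so it survives restarts

    def recall(self, query: str, k: int = 3) -> list:
        q = _words(query)
        scored = [(len(q & _words(f)), f) for f in self.facts]
        scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep insertion order
        return [f for score, f in scored if score > 0][:k]


def build_system_prompt(base: str, store: FactStore, user_query: str) -> str:
    """Retrieve relevant facts and inject them into the system prompt."""
    facts = store.recall(user_query)
    if not facts:
        return base
    return base + "\n\nKnown facts about this user:\n" + "\n".join(f"- {f}" for f in facts)


with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "facts.json")
    FactStore(path).remember("User lives in London")
    FactStore(path).remember("User prefers Celsius")
    next_session = FactStore(path)  # a fresh instance reloads facts from disk
    prompt = build_system_prompt("You are a helpful assistant.", next_session, "What's the weather in London?")

print(prompt)
```

Swapping _words and recall for an embedding model plus a vector index keeps the same remember/recall/inject shape.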

Complete Working Example

Let's put everything together into a "Research Agent" that can search the web, do calculations, check weather, and then write a polished final report. This is the complete, runnable file:

Python
#!/usr/bin/env python3
"""
research_agent.py — A complete AI agent in ~80 lines.
No framework. No magic. Just the OpenAI API.

Requirements:
    pip install openai
    export OPENAI_API_KEY=sk-...
"""
import inspect, json, math, re
from typing import Callable
from openai import OpenAI

client = OpenAI()

# ── 1. Define tools ──────────────────────────────────────────────────────────

def get_weather(city: str) -> str:
    """Return current weather for a city.
    Args:
        city: City name, e.g. 'Paris'.
    """
    db = {"london": "12°C overcast", "new york": "24°C sunny", "tokyo": "18°C cloudy"}
    return db.get(city.lower(), f"No data for {city}")


def calculate(expression: str) -> str:
    """Evaluate a mathematical expression safely.
    Args:
        expression: Math expression, e.g. 'sqrt(2) * 100'.
    """
    ns = {k: getattr(math, k) for k in dir(math) if not k.startswith("_")}
    ns["__builtins__"] = {}
    try:
        return str(eval(expression, ns))  # noqa: S307
    except Exception as e:
        return f"Error: {e}"


def search_web(query: str) -> str:
    """Search the web for up-to-date information.
    Args:
        query: Search query string.
    """
    # Replace with Tavily / Brave / SerpAPI in production
    return (
        f"[Results for: {query}]\n"
        "• Study: agentic AI frameworks reduce task completion time by 60%\n"
        "• Report: Python remains #1 language for AI development in 2025\n"
        "• OpenAI o3 sets new records on SWE-bench (71.7% pass rate)"
    )


ALL_TOOLS = [get_weather, calculate, search_web]

# ── 2. Auto-generate schemas ─────────────────────────────────────────────────

PYTHON_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


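def fn_to_schema(fn: Callable) -> dict:
    """Build an OpenAI tool schema from a function's signature and docstring."""
    from typing import get_type_hints

    sig   = inspect.signature(fn)
    hints = get_type_hints(fn)
    doc   = inspect.getdoc(fn) or ""

    # Text before "Args:" is the tool description; what follows holds per-argument docs.
    # (Splitting on "Args:" works even without a blank line before it, as in these docstrings.)
    desc, _, args_part = doc.partition("Args:")
    # getdoc() dedents the docstring, so match any indentation rather than exactly 8 spaces
    arg_desc = {
        m.group(1): m.group(2).strip()
        for m in re.finditer(r"^[ \t]+(\w+):\s*(.+)$", args_part, re.MULTILINE)
    }

    props, req = {}, []
    for name, param in sig.parameters.items():
        props[name] = {
            "type": PYTHON_TO_JSON.get(hints.get(name), "string"),
            "description": arg_desc.get(name, ""),
        }
        if param.default is inspect.Parameter.empty:
            req.append(name)

    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": desc.strip(),
            "parameters": {"type": "object", "properties": props, "required": req},
        },
    }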


SCHEMAS   = [fn_to_schema(f) for f in ALL_TOOLS]
REGISTRY  = {fn.__name__: fn for fn in ALL_TOOLS}

# ── 3. Tool dispatcher ───────────────────────────────────────────────────────

def run_tool(tc) -> dict:
    name = tc.function.name
    args = json.loads(tc.function.arguments)
    try:
        result = REGISTRY[name](**args)
    except Exception as e:
        result = f"Error: {e}"
    return {"role": "tool", "tool_call_id": tc.id, "content": str(result)}

# ── 4. The agent loop ────────────────────────────────────────────────────────

SYSTEM = """You are a research assistant. Use tools to gather real data.
When you have enough information, write a well-structured final report."""


def run_agent(query: str, max_iter: int = 8) -> str:
    messages = [{"role": "system", "content": SYSTEM}, {"role": "user", "content": query}]

    for i in range(max_iter):
        resp   = client.chat.completions.create(model="gpt-4o", messages=messages, tools=SCHEMAS, tool_choice="auto")
        msg    = resp.choices[0].message
        reason = resp.choices[0].finish_reason

        if reason == "tool_calls":
            messages.append(msg)
            for tc in msg.tool_calls:
                print(f"  🔧 {tc.function.name}({json.loads(tc.function.arguments)})")
                messages.append(run_tool(tc))
        elif reason == "stop":
            return msg.content
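        else:
            # e.g. "length" or "content_filter": report it instead of a misleading message
            return f"Agent stopped: unexpected finish_reason {reason!r}."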

    return "Agent stopped — max iterations reached."

# ── 5. Run it ────────────────────────────────────────────────────────────────

if __name__ == "__main__":
    report = run_agent(
        "Research report: Compare the weather in London vs Tokyo today, "
        "calculate what percentage warmer Tokyo is, and summarise recent "
        "trends in agentic AI development. Write a final 3-paragraph report."
    )
    print("\n═══ FINAL REPORT ═══\n")
    print(report)

🚀

Swap the stub tools for real APIs

The search_web function above returns fake data. To make this agent genuinely useful, replace it with a Tavily or SerpAPI call — both offer Python SDKs and free tiers. get_weather maps cleanly to the OpenWeatherMap API. The agent loop itself doesn't change at all.
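For get_weather, a drop-in replacement might look like the sketch below. It assumes OpenWeatherMap's current-weather endpoint with units=metric and an API key in a hypothetical OPENWEATHER_API_KEY environment variable; check their documentation for the exact URL and response fields your plan provides. It keeps the original signature and docstring and returns error strings instead of raising, so the schema and dispatcher need no changes.

```python
import json
import os
import urllib.error
import urllib.parse
import urllib.request

# Assumed endpoint: OpenWeatherMap "current weather" API (verify against their docs).
OWM_URL = "https://api.openweathermap.org/data/2.5/weather"


def format_owm_payload(payload: dict) -> str:
    """Turn an OpenWeatherMap JSON payload into the same short string the stub returned."""
    temp = round(payload["main"]["temp"])
    description = payload["weather"][0]["description"]
    return f"{temp}°C, {description}"


def get_weather(city: str) -> str:
    """Return the current weather for a given city.

    Args:
        city: The name of the city, e.g. 'London' or 'New York'.
    """
    # OPENWEATHER_API_KEY is a hypothetical variable name; use whatever your setup defines.
    api_key = os.environ.get("OPENWEATHER_API_KEY")
    if not api_key:
        return "Error: OPENWEATHER_API_KEY is not set."
    query = urllib.parse.urlencode({"q": city, "appid": api_key, "units": "metric"})
    try:
        with urllib.request.urlopen(f"{OWM_URL}?{query}", timeout=10) as resp:
            payload = json.load(resp)
    except urllib.error.HTTPError as e:
        # e.g. 404 for an unknown city, 401 for a bad key
        return f"Weather lookup failed for {city}: HTTP {e.code}"
    except OSError as e:
        # network errors and timeouts
        return f"Weather lookup failed for {city}: {e}"
    return format_owm_payload(payload)
```

Keeping the payload formatting in its own function means it can be tested against a saved response without touching the network.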

A sample run against GPT-4o produces roughly this execution trace:

Output
  🔧 get_weather({'city': 'London'})
  🔧 get_weather({'city': 'Tokyo'})

  🔧 calculate({'expression': '(18 - 12) / 12 * 100'})

  🔧 search_web({'query': 'agentic AI development trends 2025'})

═══ FINAL REPORT ═══

**Weather Comparison: London vs Tokyo**
London currently sits at 12°C with overcast skies, while Tokyo is warmer at 18°C
under partly cloudy conditions — making Tokyo 50% warmer than London today.

**Agentic AI Trends**
Recent research confirms that agentic frameworks reduce task completion time by up
to 60% compared to single-shot LLM prompting. Python continues to dominate AI
development, powering the majority of agent implementations in 2025. OpenAI's o3
model has pushed the frontier further, achieving 71.7% on SWE-bench...

**Conclusion**
...

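Four tool calls across three loop-backs, then a fourth model call in which the model synthesises a coherent multi-source report. Token cost adds up faster than a single call suggests: every iteration re-sends the system prompt, the tool schemas, and the growing message history, so input tokens accumulate across all four model calls. Check response.usage on each call for the exact numbers.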
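To see what a run actually costs, you can sum the usage field of every response instead of estimating. A minimal sketch: UsageTally is a hypothetical helper, and the token counts in the demo are made-up illustrative numbers, not measurements from this run.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTally:
    """Accumulates token usage across every model call in one agent run."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    calls: int = 0

    def add(self, usage) -> None:
        # `usage` is the .usage attribute of a chat completion response
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens
        self.calls += 1

    def cost_usd(self, input_per_million: float, output_per_million: float) -> float:
        # Prices are per million tokens; pass in your model's current rates.
        return (self.prompt_tokens * input_per_million
                + self.completion_tokens * output_per_million) / 1_000_000


# Illustrative numbers only: each call's prompt grows as the history accumulates.
tally = UsageTally()
for prompt, completion in [(300, 40), (420, 25), (510, 30), (640, 250)]:
    tally.add(SimpleNamespace(prompt_tokens=prompt, completion_tokens=completion))
print(tally)
```

In run_agent, calling tally.add(resp.usage) right after each client.chat.completions.create(...) would collect the real numbers; cost_usd then converts them using whatever prices you pass in.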

When to Add a Framework

Building from scratch is the best way to truly understand agents. But frameworks exist for good reasons — once you're past the learning stage, they save real engineering time.

Scenario	Scratch	Framework (LangGraph / CrewAI)
Single agent, 2–5 tools	✅ Ideal	Overkill
Multi-agent orchestration	Hard to scale	✅ LangGraph
Complex branching state	Re-inventing the wheel	✅ LangGraph
Role-based agent teams	Lots of glue code	✅ CrewAI / AutoGen
Production tracing / monitoring	Build your own logger	✅ LangSmith / Langfuse
Custom memory + retrieval	✅ Full control	Often leaky abstractions

🗺️

The right mental model

Think of frameworks as pre-built plumbing for common agent patterns. If your use case fits the pattern, use the framework. If it doesn't, the framework fights you. The loop in this post is the core that agent frameworks build on, so once you know it cold, it's much easier to reason about what any framework is doing for you.

Rule of thumb: if you have more than one agent, non-trivial state transitions, or you need production-grade observability — reach for LangGraph. If you have a single agent with a handful of tools and a simple loop, the approach in this post is cleaner, faster to iterate on, and has fewer dependencies to break.
