Building Production-Grade AI Agents: How We Brought Deep Agent Patterns to Pydantic

December 17, 2025


When LangChain published their deep agents blog post documenting patterns from production systems like Claude Code and Manus, we saw something remarkable: the industry was finally formalizing what makes AI agents actually work in the real world. At Vstorm, we had already been building similar patterns for our clients, and we recognized an opportunity to bring these proven architectures into the Pydantic ecosystem.

The result is pydantic-deep – a comprehensive framework for building “deep agents” that can plan, operate on files, delegate tasks, and execute code in isolated environments. Built on top of pydantic-ai, it provides the same capabilities as LangChain’s deepagents, but with the type safety, simplicity, and developer experience that Pydantic users expect.

The problem: why simple agents fall short

Anyone who has deployed an AI agent to production knows the pattern. The demo works beautifully. The proof of concept impresses stakeholders. Then reality hits.

Real-world tasks are not single-step operations. When a user asks an agent to “analyze this CSV file and create a visualization,” the agent needs to:

Plan the approach and break down the task
Read the file from storage
Write analysis code
Execute the code in a safe environment
Handle errors and retry if something fails
Track progress so users know what is happening

Simple agents with a handful of tools cannot handle this complexity reliably. They lose track of multi-step tasks, cannot recover from errors gracefully, and provide no visibility into their reasoning process.

Production agents need architecture patterns that address these challenges systematically.

What are Deep Agents?

Deep agents represent a maturation of AI agent design. The term, popularized by LangChain’s research into production systems, describes agents with specific architectural capabilities:

Planning and Progress Tracking – Deep agents break complex tasks into steps and track their progress. Users can see what the agent is working on, what it has completed, and what remains.
File System Operations – Real work requires reading, writing, and editing files. Deep agents treat the file system as a first-class citizen, with proper abstraction layers that work across in-memory storage, real file systems, and sandboxed containers.
Task Delegation – Some tasks benefit from specialized sub-agents. A coding agent might delegate documentation writing to a specialized sub-agent with different instructions and capabilities.
Sandboxed Execution – Running code that an AI generates is inherently risky. Deep agents execute code in isolated environments, typically Docker containers, preventing accidents from affecting the host system.
Context Management – Long conversations exceed token limits. Deep agents automatically summarize older context while preserving essential information, enabling sessions that span hours or days.
Human-in-the-Loop – Certain operations require human approval before execution. Deep agents support approval workflows for dangerous operations like code execution or file deletion.

These patterns emerged from teams building production agents and discovering what actually works at scale.

Why we built pydantic-deep

When we evaluated existing solutions for our client projects, we found a gap. LangChain’s deepagents provides excellent deep agent capabilities, but it is built on LangGraph, which is a graph-based state machine that adds significant complexity. For teams already invested in the Pydantic ecosystem, switching frameworks was not attractive.

We wanted deep agent capabilities with:

Type safety throughout the entire codebase
Async-first design for modern Python applications
Pydantic models for structured inputs and outputs
Simpler mental models than graph-based state machines
100% test coverage for production confidence

The answer was to build on pydantic-ai, Pydantic’s official AI framework. By extending pydantic-ai with deep agent patterns, we could deliver production-grade capabilities while maintaining the developer experience Pydantic users love.

pydantic-deep: Deep Agents for the Pydantic ecosystem

pydantic-deep provides everything needed to build sophisticated AI agents:

Planning with todo lists

Agents track their work through a todo system that makes reasoning visible:

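import asyncio

from pydantic_deep import create_deep_agent, DeepAgentDeps, StateBackend

# Agent with the planning (todo) and file system toolsets enabled
agent = create_deep_agent(
    model="openai:gpt-4.1",
    instructions="You are a helpful coding assistant",
    include_todo=True,
    include_filesystem=True,
)


async def main() -> None:
    # StateBackend keeps files in memory, so nothing touches the real disk
    deps = DeepAgentDeps(backend=StateBackend())
    result = await agent.run(
        "Create a Python script that analyzes sales data",
        deps=deps,
    )
    print(result.output)


asyncio.run(main())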

When the agent receives this task, it creates a todo list breaking down the work: “Read and understand the data,” “Write analysis script,” “Execute and verify results.” Users see real-time progress as each step moves from pending to in-progress to completed.
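To make the status flow concrete, here is a minimal, self-contained sketch of a todo list whose items move from pending to in-progress to completed. The names `TodoStatus`, `Todo`, and `TodoList` are hypothetical and chosen for illustration; they are not pydantic-deep's actual classes.

```python
from dataclasses import dataclass, field
from enum import Enum


class TodoStatus(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"


# Allowed transitions: a step can only move forward.
_NEXT = {
    TodoStatus.PENDING: TodoStatus.IN_PROGRESS,
    TodoStatus.IN_PROGRESS: TodoStatus.COMPLETED,
}


@dataclass
class Todo:
    content: str
    status: TodoStatus = TodoStatus.PENDING


@dataclass
class TodoList:
    items: list[Todo] = field(default_factory=list)

    def add(self, content: str) -> Todo:
        todo = Todo(content)
        self.items.append(todo)
        return todo

    def advance(self, index: int) -> TodoStatus:
        """Move one item a single step forward in its lifecycle."""
        todo = self.items[index]
        if todo.status not in _NEXT:
            raise ValueError(f"{todo.content!r} is already completed")
        todo.status = _NEXT[todo.status]
        return todo.status

    def render(self) -> str:
        """Plain-text view that a frontend could show to the user."""
        return "\n".join(f"[{t.status.value}] {t.content}" for t in self.items)


todos = TodoList()
for step in (
    "Read and understand the data",
    "Write analysis script",
    "Execute and verify results",
):
    todos.add(step)

todos.advance(0)  # first step: pending -> in_progress
todos.advance(0)  # first step: in_progress -> completed
todos.advance(1)  # second step: pending -> in_progress
print(todos.render())
```

Restricting items to forward-only transitions keeps the progress display monotonic, which is one simple way to make the agent's state easy for users to follow.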

Flexible backend architecture

All file operations flow through a backend abstraction. This design enables:

StateBackend for testing with in-memory storage
FilesystemBackend for real file system operations
DockerSandbox for isolated execution environments
CompositeBackend for routing operations to different backends by path

The same agent code works unchanged across all backends: write tests against StateBackend, develop locally with FilesystemBackend, and deploy to production with DockerSandbox.
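The idea behind a backend abstraction can be sketched in a few lines of plain Python. The example below is an illustration only: `Backend`, `InMemoryBackend`, and `PrefixRouter` are hypothetical names, and the two-method interface is an assumption, not the interface pydantic-deep defines.

```python
from typing import Protocol


class Backend(Protocol):
    def read(self, path: str) -> str: ...
    def write(self, path: str, content: str) -> None: ...


class InMemoryBackend:
    """Keeps files in a dict, which makes tests fast and hermetic."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def read(self, path: str) -> str:
        return self.files[path]

    def write(self, path: str, content: str) -> None:
        self.files[path] = content


class PrefixRouter:
    """Routes each path to the backend with the longest matching prefix."""

    def __init__(self, default: Backend, routes: dict[str, Backend]) -> None:
        self.default = default
        # Longest prefix first, so "/data/raw/" would win over "/data/".
        self.routes = sorted(routes.items(), key=lambda kv: len(kv[0]), reverse=True)

    def _pick(self, path: str) -> Backend:
        for prefix, backend in self.routes:
            if path.startswith(prefix):
                return backend
        return self.default

    def read(self, path: str) -> str:
        return self._pick(path).read(path)

    def write(self, path: str, content: str) -> None:
        self._pick(path).write(path, content)


def save_report(backend: Backend, text: str) -> None:
    """Agent-side code depends only on the protocol, not a concrete backend."""
    backend.write("/reports/summary.md", text)


scratch, reports = InMemoryBackend(), InMemoryBackend()
router = PrefixRouter(default=scratch, routes={"/reports/": reports})
save_report(router, "# Sales summary")
router.write("/tmp/notes.txt", "draft")
print(sorted(reports.files), sorted(scratch.files))
```

Because `save_report` only knows about the protocol, swapping the in-memory backend for a disk or container implementation would not require changing it.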

Sub-Agent delegation

Complex tasks benefit from specialization. You can configure sub-agents that the main agent delegates to.

The main agent recognizes when a task matches a sub-agent’s specialty and delegates appropriately. Sub-agents receive isolated context: they cannot see the parent’s todo list or spawn their own sub-agents, which prevents recursive delegation.
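The isolation rules described above can be illustrated with a small sketch. Everything here is hypothetical (`Context`, `SubAgent`, `delegate`), and the `run` callable stands in for a model call; this shows the pattern, not pydantic-deep's configuration API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Context:
    """What an agent can see during a run."""
    messages: list[str] = field(default_factory=list)
    todos: list[str] = field(default_factory=list)
    can_delegate: bool = True


@dataclass
class SubAgent:
    name: str
    instructions: str
    run: Callable[[Context], str]  # stands in for a model call


def delegate(parent: Context, sub: SubAgent, task: str) -> str:
    if not parent.can_delegate:
        raise PermissionError("sub-agents cannot spawn further sub-agents")
    # Fresh context: only the sub-agent's instructions and the task go down.
    child = Context(messages=[sub.instructions, task], todos=[], can_delegate=False)
    result = sub.run(child)
    # Only the final result flows back into the parent's conversation.
    parent.messages.append(f"{sub.name}: {result}")
    return result


seen: list[Context] = []  # records what the sub-agent was allowed to see


def write_docs(ctx: Context) -> str:
    seen.append(ctx)
    return f"Documented: {ctx.messages[-1]}"


docs_agent = SubAgent("docs-writer", "You write concise docstrings.", write_docs)
parent = Context(
    messages=["user: build a CSV parser"],
    todos=["Write parser", "Write docs"],
)
answer = delegate(parent, docs_agent, "Document parse_csv()")
print(answer)
```

Passing a fresh `Context` with `can_delegate=False` is one straightforward way to enforce both constraints at once: the child never sees the parent's todos, and any attempt to delegate again fails fast.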

Skills system

Anthropic’s research on equipping agents for the real world with agent skills demonstrates how skills dramatically improve agent performance on complex tasks. pydantic-deep implements this pattern with skills as reusable instruction sets stored as markdown files with YAML frontmatter. When an agent encounters a task matching a skill’s domain, it loads the relevant instructions.

The data analysis skill, for example, provides templates for loading data with pandas, handling missing values, creating visualizations, and formatting reports. The agent loads these instructions on-demand, getting domain expertise exactly when needed without bloating the base prompt.
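As a rough illustration of the file format, the sketch below parses a markdown skill with a frontmatter header and separates the short description (cheap enough to keep in the base prompt) from the full instructions (loaded on demand). The field names `name` and `description`, the sample skill text, and the helper functions are assumptions made for this example. The parser handles only flat `key: value` pairs, so a real implementation would likely use a proper YAML library.

```python
from dataclasses import dataclass

# Hypothetical skill file content, invented for this example.
SKILL_MD = """\
---
name: data-analysis
description: Load CSV data with pandas, handle missing values, plot results
---
# Data analysis

1. Load the file with pandas.
2. Check for missing values before aggregating.
"""


@dataclass
class Skill:
    name: str
    description: str
    body: str


def parse_skill(text: str) -> Skill:
    """Split a skill file into frontmatter metadata and instruction body."""
    if not text.startswith("---\n"):
        raise ValueError("skill file must start with a frontmatter block")
    header, _, body = text[4:].partition("\n---\n")
    meta: dict[str, str] = {}
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not key: value pairs
            meta[key.strip()] = value.strip()
    return Skill(meta["name"], meta["description"], body.strip())


def base_prompt_entry(skill: Skill) -> str:
    """Only this one-line summary needs to live in the base prompt."""
    return f"- {skill.name}: {skill.description}"


skill = parse_skill(SKILL_MD)
print(base_prompt_entry(skill))  # always present, costs a few tokens
print(skill.body)                # injected only when the task matches
```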

Sandboxed code execution

Executing AI-generated code requires isolation. pydantic-deep’s DockerSandbox runs code in containers with:

Pre-configured runtime environments (Python data science, web development, Node.js)
Automatic container lifecycle management
Session isolation for multi-user applications
Idle timeout and cleanup

The SessionManager handles container orchestration for production deployments, creating isolated sandboxes per user and cleaning up idle sessions automatically.
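The per-user isolation and idle cleanup described here follow a common pattern, sketched below without any Docker dependency. `IdleSessionRegistry` and `Session` are hypothetical names, the `sandbox` string stands in for a container handle, and the injectable clock exists only to make the example deterministic; none of this is the real SessionManager API.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Session:
    user_id: str
    sandbox: str       # placeholder for a container handle
    last_used: float


class IdleSessionRegistry:
    def __init__(
        self,
        idle_timeout: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.sessions: dict[str, Session] = {}

    def get_or_create(self, user_id: str) -> Session:
        """Return the user's session, creating a sandbox on first use."""
        now = self.clock()
        session = self.sessions.get(user_id)
        if session is None:
            session = Session(user_id, sandbox=f"sandbox-{user_id}", last_used=now)
            self.sessions[user_id] = session
        session.last_used = now  # every access resets the idle timer
        return session

    def cleanup(self) -> list[str]:
        """Drop sessions idle longer than the timeout; return their user ids."""
        now = self.clock()
        expired = [
            uid for uid, s in self.sessions.items()
            if now - s.last_used > self.idle_timeout
        ]
        for uid in expired:
            # A real manager would also stop and remove the container here.
            del self.sessions[uid]
        return expired


fake_now = [0.0]  # mutable cell used as a controllable clock
registry = IdleSessionRegistry(idle_timeout=60, clock=lambda: fake_now[0])
registry.get_or_create("alice")
fake_now[0] = 50.0
registry.get_or_create("bob")
fake_now[0] = 90.0
removed = registry.cleanup()  # alice has been idle 90s, bob only 40s
print(removed, sorted(registry.sessions))
```

In a deployed service, `cleanup` would typically run on a periodic background task rather than being called by hand.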

Human-in-the-Loop approval

Some operations should not proceed without human confirmation. You can configure which tools require approval.

When the agent attempts to execute code, it pauses and requests approval. The calling application presents the command to the user, who can approve, modify, or reject it. Only after approval does execution proceed.
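The approve, modify, or reject flow can be sketched as a small gate in front of tool execution. The classes below (`ToolCall`, `Decision`, `ApprovalGate`) and the `reviewer` policy are hypothetical and exist only for illustration; in a real application the `ask` callback would suspend the run and wait for a person.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolCall:
    tool: str
    args: dict[str, str]


@dataclass
class Decision:
    approved: bool
    modified_args: Optional[dict[str, str]] = None


class ApprovalGate:
    def __init__(
        self,
        requires_approval: set[str],
        ask: Callable[[ToolCall], Decision],
    ) -> None:
        self.requires_approval = requires_approval
        self.ask = ask

    def run(self, call: ToolCall, tools: dict[str, Callable[..., str]]) -> str:
        if call.tool in self.requires_approval:
            decision = self.ask(call)  # execution pauses until a decision arrives
            if not decision.approved:
                return f"{call.tool} rejected by user"
            if decision.modified_args is not None:
                call = ToolCall(call.tool, decision.modified_args)
        return tools[call.tool](**call.args)


tools: dict[str, Callable[..., str]] = {
    "read_file": lambda path: f"contents of {path}",
    "execute": lambda command: f"ran: {command}",
}


def reviewer(call: ToolCall) -> Decision:
    # Scripted stand-in for a human: reject destructive commands,
    # and approve everything else with an added --dry-run flag.
    command = call.args.get("command", "")
    if "rm -rf" in command:
        return Decision(approved=False)
    return Decision(approved=True, modified_args={"command": command + " --dry-run"})


gate = ApprovalGate({"execute"}, reviewer)
print(gate.run(ToolCall("read_file", {"path": "a.csv"}), tools))
print(gate.run(ToolCall("execute", {"command": "python analyze.py"}), tools))
print(gate.run(ToolCall("execute", {"command": "rm -rf /"}), tools))
```

Tools outside the `requires_approval` set run immediately, so the extra latency of human review applies only to the operations that warrant it.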

Context management

Long conversations exceed model token limits. The SummarizationProcessor automatically compresses older context while preserving essential information. You can configure triggers based on token count, message count, or context fraction, and specify how much recent context to preserve.
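A minimal sketch of trigger-based compaction, assuming three optional thresholds (token count, message count, context fraction) and a fixed number of recent messages kept verbatim. `CompactionPolicy`, `compact`, and the four-characters-per-token estimate are illustrative assumptions, not the SummarizationProcessor's actual parameters, and the `summarize` callable stands in for a model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def rough_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); real tokenizers differ."""
    return max(1, len(text) // 4)


@dataclass
class CompactionPolicy:
    max_tokens: Optional[int] = None
    max_messages: Optional[int] = None
    max_fraction: Optional[float] = None  # share of the context window
    context_window: int = 128_000
    keep_recent: int = 4                  # messages preserved verbatim

    def should_compact(self, messages: list[str]) -> bool:
        tokens = sum(rough_tokens(m) for m in messages)
        return (
            (self.max_tokens is not None and tokens > self.max_tokens)
            or (self.max_messages is not None and len(messages) > self.max_messages)
            or (
                self.max_fraction is not None
                and tokens / self.context_window > self.max_fraction
            )
        )


def compact(
    messages: list[str],
    policy: CompactionPolicy,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """Replace older messages with one summary; keep recent ones intact."""
    split = len(messages) - policy.keep_recent
    if split <= 0 or not policy.should_compact(messages):
        return messages
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent


history = [f"message {i}" for i in range(10)]
policy = CompactionPolicy(max_messages=8, keep_recent=3)
compacted = compact(history, policy, lambda old: f"[summary of {len(old)} messages]")
print(compacted)
```

Keeping the most recent messages untouched matters because they usually carry the instructions and tool results the agent is actively working with.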

pydantic-deep vs LangChain deepagents

Both libraries implement the same deep agent patterns, but with different architectural philosophies:


Aspect	pydantic-deep	LangChain deepagents
Foundation	pydantic-ai	LangGraph
Architecture	Toolsets and dependencies	Middleware stack with hooks
Type Safety	Pyright strict mode	Standard Python typing
Skills System	Built-in	Not included
Docker Integration	Native DockerSandbox	Via SandboxBackend
Session Management	SessionManager	LangGraph checkpointing
Summarization	Configurable triggers	Auto-configured

Real-World application: the full demo

To demonstrate pydantic-deep’s capabilities in a production-like environment, we built a full example application that showcases every feature working together. You can watch the demo video here to see it in action.

What the demo includes

Multi-User Session Management – Each user receives an isolated Docker container. Sessions persist across page refreshes and clean up automatically after idle timeout.
WebSocket Streaming – Real-time streaming of agent responses, including text generation, thinking content (for reasoning models), tool calls, and tool results.
File Upload and Processing – Users upload CSV, PDF, or text files. The agent accesses these files in its sandbox and can analyze, transform, or reference them.
Custom Tools – Mock GitHub tools demonstrate how to extend pydantic-deep with domain-specific capabilities. The pattern works identically for real API integrations.
Human-in-the-Loop – Code execution requires user approval. The frontend displays the proposed command and waits for confirmation before proceeding.
Skills in Action – A data analysis skill provides the agent with pandas expertise, visualization templates, and best practices for working with CSV data.
Sub-Agent Delegation – A joke generator sub-agent demonstrates task delegation. When users ask for humor, the main agent delegates to the specialized sub-agent.
Todo Progress Tracking – The frontend displays the agent’s todo list in real-time, showing users exactly what the agent is working on.

Architecture highlights

The application demonstrates several production patterns:

Stateless Agent, Stateful Sessions – The agent itself is stateless and shared across all users. Per-user state lives in session objects that hold the Docker sandbox, message history, and todo list.
Backend Injection at Runtime – The agent is configured without a backend. Each session provides its own DockerSandbox, enabling per-user isolation without creating multiple agent instances.
Approval Flow – When the agent calls a tool requiring approval, it returns a DeferredToolRequests object. The application presents this to the user, collects their decision, and resumes the agent with DeferredToolResults.
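The first two patterns above can be sketched together: one immutable agent object is shared by every user, while all mutable state arrives through per-session dependencies passed in at run time. `SharedAgent` and `SessionDeps` are hypothetical names, and the `files` dict stands in for a per-user sandbox; this illustrates the shape of the design rather than the demo application's code.

```python
from dataclasses import dataclass, field


@dataclass
class SessionDeps:
    user_id: str
    files: dict[str, str] = field(default_factory=dict)  # stands in for a sandbox
    history: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class SharedAgent:
    """Holds configuration only, so one instance can serve every user."""
    instructions: str

    def run(self, prompt: str, deps: SessionDeps) -> str:
        # All mutable state is read from and written to the injected deps.
        deps.history.append(prompt)
        turn = len(deps.history)
        deps.files[f"/workspace/turn_{turn}.txt"] = prompt
        return f"turn {turn} for {deps.user_id}"


agent = SharedAgent("You are a helpful coding assistant")
alice, bob = SessionDeps("alice"), SessionDeps("bob")

agent.run("analyze sales.csv", alice)
agent.run("plot the totals", alice)
agent.run("summarize notes.txt", bob)

print(sorted(alice.files))  # alice's workspace only
print(bob.history)          # bob never sees alice's prompts
```

Freezing the agent makes the separation explicit: any accidental attempt to store per-user state on the shared object raises an error instead of leaking between sessions.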

Getting started

pydantic-deep is available on PyPI:

pip install pydantic-deep

For Docker sandbox support:

pip install "pydantic-deep[sandbox]"

The documentation covers installation, configuration, and advanced usage patterns. The GitHub repository includes the full example application and comprehensive test suite.

Key takeaways

Deep agents represent the current state of the art in production AI systems. The patterns (planning, file operations, task delegation, sandboxed execution, context management, and human oversight) emerged from teams solving real problems at scale.

With pydantic-deep, these patterns are now available in the Pydantic ecosystem. Whether you are building a coding assistant, data analysis tool, or any AI application that needs to interact with the world, pydantic-deep provides a solid, type-safe foundation.

The framework reflects Vstorm’s experience building production AI systems for clients across industries. We have seen what works and what fails, and we have encoded those lessons into a library that handles the hard parts so you can focus on your application’s unique value.


Built with care at Vstorm.co

Last updated: March 9, 2026