Building Production-Grade AI Agents: How We Brought Deep Agent Patterns to Pydantic
December 17, 2025

Category Post
Table of content
When LangChain published their deep agents blog post documenting patterns from production systems like Claude Code and Manus, we saw something remarkable: the industry was finally formalizing what makes AI agents actually work in the real world. At Vstorm, we had already been building similar patterns for our clients, and we recognized an opportunity to bring these proven architectures into the Pydantic ecosystem.
The result is pydantic-deep – a comprehensive framework for building “deep agents” that can plan, operate on files, delegate tasks, and execute code in isolated environments. Built on top of pydantic-ai, it provides the same capabilities as LangChain’s deepagents, but with the type safety, simplicity, and developer experience that Pydantic users expect.
The problem: why simple agents fall short
Anyone who has deployed an AI agent to production knows the pattern. The demo works beautifully. The proof of concept impresses stakeholders. Then reality hits. Real-world tasks are not single-step operations. When a user asks an agent to “analyze this CSV file and create a visualization,” the agent needs to:- Plan the approach and break down the task
- Read the file from storage
- Write analysis code
- Execute the code in a safe environment
- Handle errors and retry if something fails
- Track progress so users know what is happening
What are Deep Agents?
Deep agents represent a maturation of AI agent design. The term, popularized by LangChain’s research into production systems, describes agents with specific architectural capabilities:- Planning and Progress Tracking – Deep agents break complex tasks into steps and track their progress. Users can see what the agent is working on, what it has completed, and what remains.
- File System Operations – Real work requires reading, writing, and editing files. Deep agents treat the file system as a first-class citizen, with proper abstraction layers that work across in-memory storage, real file systems, and sandboxed containers.
- Task Delegation – Some tasks benefit from specialized sub-agents. A coding agent might delegate documentation writing to a specialized sub-agent with different instructions and capabilities.
- Sandboxed Execution – Running code that an AI generates is inherently risky. Deep agents execute code in isolated environments, typically Docker containers, preventing accidents from affecting the host system.
- Context Management – Long conversations exceed token limits. Deep agents automatically summarize older context while preserving essential information, enabling sessions that span hours or days.
- Human-in-the-Loop – Certain operations require human approval before execution. Deep agents support approval workflows for dangerous operations like code execution or file deletion.
Why we built pydantic-deep
When we evaluated existing solutions for our client projects, we found a gap. LangChain’s deepagents provides excellent deep agent capabilities, but it is built on LangGraph, which is a graph-based state machine that adds significant complexity. For teams already invested in the Pydantic ecosystem, switching frameworks was not attractive. We wanted deep agent capabilities with:- Type safety throughout the entire codebase
- Async-first design for modern Python applications
- Pydantic models for structured inputs and outputs
- Simpler mental models than graph-based state machines
- 100% test coverage for production confidence
pydantic-deep: Deep Agents for the Pydantic ecosystem
pydantic-deep provides everything needed to build sophisticated AI agents:Planning with todo lists
Agents track their work through a todo system that makes reasoning visible:from pydantic_deep import create_deep_agent, DeepAgentDeps, StateBackend
agent = create_deep_agent(
model="openai:gpt-4.1",
instructions="You are a helpful coding assistant",
include_todo=True,
include_filesystem=True,
)
deps = DeepAgentDeps(backend=StateBackend())
result = await agent.run(
"Create a Python script that analyzes sales data",
deps=deps
)
When the agent receives this task, it creates a todo list breaking down the work: “Read and understand the data,” “Write analysis script,” “Execute and verify results.” Users see real-time progress as each step moves from pending to in-progress to completed.
Flexible backend architecture
All file operations flow through a backend abstraction. This design enables:- StateBackend for testing with in-memory storage
- FilesystemBackend for real file system operations
- DockerSandbox for isolated execution environments
- CompositeBackend for routing operations to different backends by path
Sub-Agent delegation
Complex tasks benefit from specialization. So you can configure sub-agents that the main agent can delegate to: The main agent recognizes when a task matches a sub-agent’s specialty and delegates appropriately. Sub-agents receive isolated context, they cannot see the parent’s todo list or spawn their own sub-agents, preventing recursive delegation issues.Skills system
Anthropic’s research on equipping agents for the real world with agent skills demonstrates how skills dramatically improve agent performance on complex tasks. pydantic-deep implements this pattern with skills as reusable instruction sets stored as markdown files with YAML frontmatter. When an agent encounters a task matching a skill’s domain, it loads the relevant instructions. The data analysis skill, for example, provides templates for loading data with pandas, handling missing values, creating visualizations, and formatting reports. The agent loads these instructions on-demand, getting domain expertise exactly when needed without bloating the base prompt.Sandboxed code execution
Executing AI-generated code requires isolation. pydantic-deep’s DockerSandbox runs code in containers with:- Pre-configured runtime environments (Python data science, web development, Node.js)
- Automatic container lifecycle management
- Session isolation for multi-user applications
- Idle timeout and cleanup
Human-in-the-Loop approval
Some operations should not proceed without human confirmation. So you have the power to configure which tools require approval: When the agent attempts to execute code, it pauses and requests approval. The calling application presents the command to the user, who can approve, modify, or reject it. Only after approval does execution proceed.Context management
Long conversations exceed model token limits. The SummarizationProcessor automatically compresses older context while preserving essential information. You can configure triggers based on token count, message count, or context fraction, and specify how much recent context to preserve.pydantic-deep vs LangChain deepagents
Both libraries implement the same deep agent patterns, but with different architectural philosophies:
| Aspect | pydantic-deep | LangChain deepagents |
|---|---|---|
| Foundation | pydantic-ai | LangGraph |
| Architecture | Toolsets and dependencies | Middleware stack with hooks |
| Type Safety | Pyright strict mode | Standard Python typing |
| Skills System | Built-in | Not included |
| Docker Integration | Native DockerSandbox | Via SandboxBackend |
| Session Management | SessionManager | LangGraph checkpointing |
| Summarization | Configurable triggers | Auto-configured |
Real-World application: the full demo
To demonstrate pydantic-deep’s capabilities in a production-like environment, we built a full example application that showcases every feature working together. You can watch the demo video here to see it in action.What the demo includes
- Multi-User Session Management – Each user receives an isolated Docker container. Sessions persist across page refreshes and clean up automatically after idle timeout.
- WebSocket Streaming – Real-time streaming of agent responses, including text generation, thinking content (for reasoning models), tool calls, and tool results.
- File Upload and Processing – Users upload CSV, PDF, or text files. The agent accesses these files in its sandbox and can analyze, transform, or reference them.
- Custom Tools – Mock GitHub tools demonstrate how to extend pydantic-deep with domain-specific capabilities. The pattern works identically for real API integrations.
- Human-in-the-Loop – Code execution requires user approval. The frontend displays the proposed command and waits for confirmation before proceeding.
- Skills in Action – A data analysis skill provides the agent with pandas expertise, visualization templates, and best practices for working with CSV data.
- Sub-Agent Delegation – A joke generator sub-agent demonstrates task delegation. When users ask for humor, the main agent delegates to the specialized sub-agent.
- Todo Progress Tracking – The frontend displays the agent’s todo list in real-time, showing users exactly what the agent is working on.
Architecture highlights
The application demonstrates several production patterns:- Stateless Agent, Stateful Sessions – The agent itself is stateless and shared across all users. Per-user state lives in session objects that hold the Docker sandbox, message history, and todo list.
- Backend Injection at Runtime – The agent is configured without a backend. Each session provides its own DockerSandbox, enabling per-user isolation without creating multiple agent instances.
- Approval Flow – When the agent calls a tool requiring approval, it returns a DeferredToolRequests object. The application presents this to the user, collects their decision, and resumes the agent with DeferredToolResults.
Getting started
pydantic-deep is available on PyPI:pip install pydantic-deep
For Docker sandbox support:
pip install pydantic-deep[sandbox]
The documentation covers installation, configuration, and advanced usage patterns. The GitHub repository includes the full example application and comprehensive test suite.
Key Take-aways
Deep agents represent the current state of the art in production AI systems. The patterns; planning, file operations, task delegation, sandboxed execution, context management, and human oversight; emerged from teams solving real problems at scale. With pydantic-deep, these patterns are now available in the Pydantic ecosystem. Whether you are building a coding assistant, data analysis tool, or any AI application that needs to interact with the world, pydantic-deep provides a solid, type-safe foundation. The framework reflects Vstorm’s experience building production AI systems for clients across industries. We have seen what works and what fails, and we have encoded those lessons into a library that handles the hard parts so you can focus on your application’s unique value. <aside> 💡Ready to build production-grade AI agents?
Meet directly with our founders and PhD AI engineers. We will demonstrate real implementations from 30+ agentic projects and show you the practical steps to integrate them into your specific workflows – no hypotheticals, just proven approaches. Book your consultation Resources:- pydantic-deep on GitHub
- Documentation
- PyPI Package
- pydantic-ai
- LangChain deepagents
- LangChain Deep Agents Blog Post
Last updated: December 18, 2025
The LLM Book
The LLM Book explores the world of Artificial Intelligence and Large Language Models, examining their capabilities, technology, and adaptation.



