Dapr Agents v1.0: Resilient Multi-Agent Orchestration on Kubernetes

The Distributed Systems Foundation for AI Agents

When LangGraph introduced stateful agents and CrewAI popularized role-based collaboration, they solved the "what" of multi-agent AI systems. But as organizations move from demos to production, a critical question emerges: how do you run these systems reliably at scale?

Enter Dapr Agents, which reached v1.0 GA in March 2026. Built on the battle-tested Dapr runtime—a CNCF graduated project—this Python framework takes a fundamentally different approach: instead of bolting reliability onto AI frameworks, it brings AI agents to proven distributed systems primitives.

The result? AI agents that inherit decades of distributed systems wisdom: durable execution, exactly-once semantics, automatic retries, and the ability to survive node failures without losing state.

Why Traditional Agent Frameworks Struggle in Production

Most AI agent frameworks were designed for prototyping. They work brilliantly in Jupyter notebooks but encounter friction when deployed to Kubernetes:

  • State Loss on Restart: LangGraph checkpoints require manual persistence configuration. A pod restart can lose agent memory mid-conversation.
  • No Native Retry Semantics: When an LLM API returns a 429, most frameworks fail or require custom retry logic.
  • Coordination Complexity: Multi-agent communication typically requires custom message queues or REST endpoints.
  • Observability Gaps: Tracing an agent’s reasoning across multiple tool calls often means stitching together fragmented logs.

Dapr Agents addresses each of these by standing on the shoulders of infrastructure patterns that have been production-hardened since the early days of microservices.

Architecture: Agents as Distributed Actors

At its core, Dapr Agents builds on three Dapr building blocks:

1. Workflows for Durable Execution

Every agent interaction—LLM calls, tool invocations, state updates—is persisted as a workflow step. If the agent crashes mid-reasoning, it resumes exactly where it left off:

from dapr_agents import DurableAgent, tool

class ResearchAgent(DurableAgent):
    @tool
    def search_arxiv(self, query: str) -> list:
        # arxiv_client is any arXiv API wrapper, configured elsewhere
        return arxiv_client.search(query)

    async def research(self, topic: str):
        # Each awaited call below is persisted as a workflow step
        papers = await self.search_arxiv(topic)
        summary = await self.llm.summarize(papers)
        return summary

Under the hood, Dapr Workflows use the Virtual Actor model pioneered by Orleans. Each agent is a stateful actor that can be deactivated when idle and reactivated on demand, enabling thousands of agents to run on a single node.
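
To make the replay idea concrete, here is a minimal, framework-free sketch of durable execution: each step's result is persisted before the workflow moves on, so a re-run after a crash returns recorded results instead of re-executing. The dict here is a toy stand-in for Dapr's actual workflow state store.

completed_steps = {}  # toy event log standing in for Dapr's durable state

def durable_step(name, fn, *args):
    # Replay: if this step already ran, return the recorded result
    # instead of executing again -- the core of durable execution.
    if name in completed_steps:
        return completed_steps[name]
    result = fn(*args)
    completed_steps[name] = result  # persist before proceeding
    return result

def research(topic):
    papers = durable_step("search", lambda t: [f"paper on {t}"], topic)
    return durable_step("summarize", lambda ps: f"reviewed {len(ps)} papers", papers)

print(research("quantum computing"))  # executes both steps
print(research("quantum computing"))  # pure replay: no step re-executes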

2. Pub/Sub for Event-Driven Coordination

Multi-agent systems need reliable communication. Dapr’s Pub/Sub abstraction lets agents publish events and subscribe to topics without knowing about the underlying message broker:

from dapr_agents import AgentRunner

runner = AgentRunner()

# agent_a and writer_agent are DurableAgent instances defined elsewhere
await agent_a.publish("research-complete", {
    "topic": "quantum computing",
    "findings": summary
})

@runner.subscribe("research-complete")
async def handle_research(event):
    await writer_agent.draft_article(event["findings"])

Swap Redis for Kafka or RabbitMQ without changing agent code.
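
The decoupling comes from Dapr's component model: code addresses a named pubsub component, and the broker behind it is chosen in a component file deployed alongside the app. As a rough illustration with the stock dapr Python SDK (independent of the Dapr Agents wrapper above), the publishing code never mentions Redis or Kafka:

import json
from dapr.clients import DaprClient

with DaprClient() as client:
    client.publish_event(
        pubsub_name="pubsub",           # component name, not a broker address
        topic_name="research-complete",
        data=json.dumps({"topic": "quantum computing"}),
        data_content_type="application/json",
    )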

3. State Management for Agent Memory

Conversation history, tool results, reasoning traces—all flow through Dapr’s State API with pluggable backends:

import os
from dapr_agents import memory

# In-memory store for local development
agent = ResearchAgent(memory=memory.InMemory())

# Durable PostgreSQL store for production, with vector search enabled
agent = ResearchAgent(
    memory=memory.PostgreSQL(
        connection_string=os.environ["PG_CONN"],
        enable_vector_search=True
    )
)
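
Behind those memory classes sits the same State API that any Dapr application uses. A minimal sketch with the stock dapr Python SDK (assumes a running sidecar and a state component named statestore):

import json
from dapr.clients import DaprClient

with DaprClient() as client:
    # Persist one turn of conversation history to the configured state store
    client.save_state(
        store_name="statestore",
        key="research-agent:history",
        value=json.dumps([{"role": "user", "content": "find recent papers"}]),
    )
    # Read it back -- after a pod restart, this is how memory survives
    item = client.get_state(store_name="statestore", key="research-agent:history")
    print(json.loads(item.data))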

Agentic Patterns Out of the Box

Dapr Agents ships with implementations of common multi-agent patterns:

| Pattern | Description | Use Case |
| --- | --- | --- |
| Prompt Chaining | Sequential LLM calls where each output feeds the next | Document processing |
| Evaluator-Optimizer | One LLM generates, another critiques in a loop | Code review |
| Parallelization | Fan-out work to multiple agents, aggregate results | Research synthesis |
| Routing | Classify input and delegate to specialist agents | Customer support |
| Orchestrator-Workers | Central coordinator delegates subtasks dynamically | Complex workflows |
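
As a concrete example, here is a framework-free sketch of the Evaluator-Optimizer loop; generate() and critique() are stand-ins for two LLM-backed agents:

def generate(prompt, feedback=None):
    # Stand-in for a generator agent; incorporates feedback when given
    if feedback:
        return f"draft for {prompt!r} (revised: {feedback})"
    return f"draft for {prompt!r}"

def critique(draft):
    # Stand-in for an evaluator agent; None means the draft passes
    return None if "revised" in draft else "tighten the intro"

def evaluator_optimizer(prompt, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        draft = generate(prompt, feedback)
        feedback = critique(draft)
        if feedback is None:
            return draft
    return draft  # best effort after max_rounds

print(evaluator_optimizer("Dapr Agents post"))
# draft for 'Dapr Agents post' (revised: tighten the intro)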

MCP and Cross-Framework Interoperability

A standout feature is native support for the Model Context Protocol (MCP):

from dapr_agents import MCPToolProvider

# Discover the tools an MCP server offers and hand them to the agent
mcp_tools = MCPToolProvider("http://mcp-server:8080")
agent = DurableAgent(tools=[mcp_tools])

Dapr Agents can also invoke agents from other frameworks as tools:

from dapr_agents.interop import CrewAITool

research_crew = CrewAITool(crew=research_crew, name="research_team")
coordinator = DurableAgent(tools=[research_crew])

Kubernetes-Native Deployment

A standard Deployment plus two sidecar annotations on the pod template is all Dapr needs:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: research-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: research-agent
  template:
    metadata:
      labels:
        app: research-agent
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "research-agent"
    spec:
      containers:
      - name: agent
        image: myregistry/research-agent:v1

Comparison: Dapr Agents vs. LangGraph vs. CrewAI

| Capability | Dapr Agents | LangGraph | CrewAI |
| --- | --- | --- | --- |
| Durable Execution | Built-in | Requires config | Limited |
| Auto Retry | Built-in | Manual | Manual |
| State Persistence | 50+ backends | SQLite, Postgres | In-memory |
| Kubernetes Native | Sidecar | Manual | Manual |
| Observability | OpenTelemetry | LangSmith | Limited |

When to Choose Dapr Agents

Dapr Agents makes sense when:

  • You’re already running Dapr for microservices
  • Your agents must survive node failures without state loss
  • You need to scale to thousands of concurrent agents
  • Enterprise observability requirements demand OpenTelemetry

Getting Started

Install the package and initialize Dapr locally:

pip install dapr-agents
dapr init

Then define and run a minimal agent:

from dapr_agents import DurableAgent, AgentRunner

class GreeterAgent(DurableAgent):
    system_prompt = "You are a helpful assistant."

runner = AgentRunner(agent=GreeterAgent())
runner.start()

The Bigger Picture

Dapr Agents represents a broader trend: AI frameworks are maturing from "make it work" to "make it work reliably." The CNCF ecosystem is converging on this need—KubeCon 2026 showcased kagent, AgentGateway, and the AI Gateway Working Group.

For platform teams, Dapr Agents offers a familiar operational model: sidecars, state stores, message brokers, and observability pipelines. The agents are new; the infrastructure patterns are proven.


Dapr Agents v1.0 is available now at github.com/dapr/dapr-agents.

Code Knowledge Graphs: Semantic Search for AI Coding Agents

AI coding tools have revolutionized software development, but there’s a fundamental limitation hiding in plain sight: most AI agents don’t actually understand your codebase—they just search it. When you ask Claude Code, Cursor, or GitHub Copilot to refactor a function, they retrieve relevant file chunks using embedding similarity. But code isn’t a collection of independent text fragments. It’s a graph of interconnected symbols, call hierarchies, and dependencies.

A new generation of tools is changing this paradigm. By parsing repositories into knowledge graphs and exposing them via MCP (Model Context Protocol), projects like Codebase-Memory, CodeGraph, and Lattice give AI agents structural awareness—enabling call-graph traversal, impact analysis, and semantic queries with sub-millisecond latency.

The RAG Problem: Why File-Based Retrieval Falls Short

Traditional RAG (Retrieval-Augmented Generation) pipelines treat codebases as document collections. They chunk files, generate embeddings, and retrieve the most similar fragments when an agent needs context. This approach has critical limitations for code:

  • Scattered evidence: Function definitions get split across chunks, separating signatures from implementations and losing import context.
  • Semantic blindness: Vector similarity doesn’t understand call relationships. A function and its callers may embed to distant vectors despite being tightly coupled.
  • Context window pressure: Complex queries requiring multi-file context quickly exhaust token budgets, forcing truncation of relevant code.
  • No impact awareness: When modifying a function, RAG can’t tell you which downstream components will break.

The result? AI agents that confidently generate code changes without understanding the ripple effects through your architecture.

Enter Code Knowledge Graphs

Knowledge graphs offer a fundamentally different approach: instead of treating code as text to embed, they parse it into structured relationships. Every function, class, import, and call site becomes a node in a traversable graph. This enables queries that RAG simply cannot answer:

  • "What functions call processPayment()?" — Direct graph traversal, not similarity search.
  • "Show me the impact radius if I change the User interface." — Transitive dependency analysis.
  • "Find all implementations of the Repository pattern." — Semantic pattern matching across the codebase.

The key enabler is Tree-Sitter, a parsing library that generates abstract syntax trees (ASTs) for 66+ programming languages. By walking these ASTs, tools can extract symbols, relationships, and structural information without hand-writing a parser for each language.
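
As a rough illustration, here is symbol extraction with the Tree-Sitter Python bindings (pip install tree-sitter tree-sitter-python); the binding API shifts slightly between versions, so treat this as a sketch:

import tree_sitter_python as tspython
from tree_sitter import Language, Parser

parser = Parser(Language(tspython.language()))

source = b"""
def process_payment(order):
    return charge(order.total)
"""

def walk(node):
    # Function definitions and call sites become the "defines" and
    # "calls" edges of a code knowledge graph.
    if node.type in ("function_definition", "call"):
        field = "name" if node.type == "function_definition" else "function"
        target = node.child_by_field_name(field)
        if target is not None:
            print(node.type, source[target.start_byte:target.end_byte].decode())
    for child in node.children:
        walk(child)

walk(parser.parse(source).root_node)
# function_definition process_payment
# call charge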

Codebase-Memory: The MCP-Native Approach

Codebase-Memory has emerged as a leading implementation, garnering 900+ GitHub stars since its February 2026 release. It parses repositories with Tree-Sitter and stores the resulting knowledge graph in SQLite, then exposes 14 MCP query tools for AI agents, including:

| Tool | Purpose |
| --- | --- |
| get_symbol | Retrieve a symbol's definition, docstring, and location |
| get_callers | Find all functions that call a given symbol |
| get_callees | List all functions called by a symbol |
| get_impact_radius | Transitive analysis of what breaks if a symbol changes |
| semantic_search | Natural language queries over the graph |
| get_module_structure | Hierarchical view of a module's exports |
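
Conceptually, get_impact_radius is a depth-limited breadth-first walk over the reverse call graph. A toy sketch (the graph data below is invented for illustration):

from collections import deque

callers = {  # reverse call graph: symbol -> direct callers
    "processPayment": ["checkout", "subscribe"],
    "checkout": ["cart_api"],
    "subscribe": ["billing_cron"],
}

def impact_radius(symbol, depth=3):
    seen, frontier = set(), deque([(symbol, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # stop expanding past the requested depth
        for caller in callers.get(node, []):
            if caller not in seen:
                seen.add(caller)
                frontier.append((caller, d + 1))
    return seen

print(impact_radius("processPayment"))
# {'checkout', 'subscribe', 'cart_api', 'billing_cron'}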

The performance gains are substantial. Codebase-Memory reports 10x lower token costs than file-based retrieval: agents get precisely the context they need without padding prompts with irrelevant code. Query latency is sub-millisecond, even on large repositories.

CodeGraph and token-codegraph: Multi-Language Support

CodeGraph, originally a TypeScript project by Colby McHenry, pioneered the concept of exposing code structure via MCP. Its Rust port, token-codegraph, extends support to Rust, Go, Java, and Scala. Key features include:

  • libsql storage with FTS5 full-text search for hybrid queries (see the sketch after this list)
  • Incremental syncing for fast re-indexing on file changes
  • JSON-RPC over stdio for seamless MCP integration
  • Zero external dependencies—runs entirely locally
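
The FTS5 half of those hybrid queries is easy to picture with plain SQLite, which libsql extends. A toy sketch in Python (sqlite3 standing in for libsql; assumes your SQLite build ships FTS5, as CPython's usually does):

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE symbols USING fts5(name, kind, doc)")
db.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?)",
    [
        ("process_payment", "function", "charges the card, emits a receipt"),
        ("retry_with_backoff", "function", "retries transient payment failures"),
        ("render_invoice", "function", "formats an invoice as HTML"),
    ],
)

# Full-text match over names and docstrings -- one half of a hybrid
# query; the other half is a structural filter over the graph tables.
for (name,) in db.execute("SELECT name FROM symbols WHERE symbols MATCH 'payment'"):
    print(name)
# process_payment
# retry_with_backoff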

The local-first architecture matters for enterprise adoption. Unlike cloud-based code intelligence (Sourcegraph, GitHub Code Search), these tools keep your proprietary code on-premises while still enabling AI-powered navigation.

Lattice: Beyond Syntax to Intent

Lattice takes a different approach by connecting code to its reasoning. Its knowledge graph spans four dimensions:

  1. Research: Background investigation, technical spikes, competitor analysis
  2. Strategy: Architecture decisions, trade-off evaluations, design rationale
  3. Requirements: User stories, acceptance criteria, constraints
  4. Implementation: The actual code and its structural relationships

This enables queries that pure code graphs can't answer: "Why did we choose PostgreSQL over MongoDB for this service?" or "What requirements drove the decision to make this component async?"

For AI agents, this context is invaluable. When tasked with extending a feature, they can trace back to the original requirements and strategic decisions rather than guessing from code patterns alone.

Integration Patterns for DevOps Teams

Adopting code knowledge graphs requires integrating them into your existing AI coding workflows:

1. CI/CD Graph Updates

Run graph indexing as part of your pipeline. On each merge to main:

- name: Update Code Knowledge Graph
  run: |
    codebase-memory index --repo . --output graph.db
    # keep the graph server running for later pipeline steps
    codebase-memory serve --port 3001 &

This ensures AI agents always query against the latest codebase structure.

2. MCP Server Configuration

Configure your AI coding tool to connect to the graph server. For Claude Code:

{
  "mcpServers": {
    "codebase": {
      "command": "codebase-memory",
      "args": ["serve", "--db", "./graph.db"]
    }
  }
}

3. Impact Analysis in PR Reviews

Use graph queries to automatically flag high-impact changes:

changed_functions=$(git diff --name-only origin/main...HEAD | xargs codebase-memory changed-symbols)
for fn in $changed_functions; do
  impact=$(codebase-memory get-impact-radius "$fn" --depth 3)
  echo "## Impact Analysis: $fn" >> pr-comment.md
  echo "$impact" >> pr-comment.md
done

Benchmarks: Knowledge Graphs vs. RAG

Recent research validates the knowledge graph approach. On SWE-bench Verified—a benchmark where AI agents resolve real GitHub issues—systems using repository-level graphs significantly outperform pure RAG approaches:

| Approach | SWE-bench Score | Token Efficiency |
| --- | --- | --- |
| RAG-only retrieval | ~45% | Baseline |
| RepoGraph + RAG hybrid | ~62% | 3x improvement |
| Full knowledge graph | ~68% | 10x improvement |

The token efficiency gains compound over time. Agents make fewer exploratory queries when they can directly traverse the call graph, reducing both latency and API costs.

The Future: Hybrid Structural-Semantic Retrieval

The next evolution combines structural graph queries with semantic embeddings. Rather than choosing between „find callers of X“ (structural) and „find code similar to X“ (semantic), hybrid systems enable queries like:

"Find functions that call the payment API and handle similar error patterns to our retry logic."

This bridges the gap between precise structural navigation and fuzzy semantic understanding—giving AI agents both the map and the intuition to navigate complex codebases.
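
In miniature, such a hybrid query is a structural filter followed by a semantic rerank. A toy sketch with invented data and two-dimensional stand-in embeddings:

import math

call_graph = {  # toy structural index: API -> its callers
    "charge_card": ["checkout", "retry_payment", "refund"],
}

embeddings = {  # stand-ins for vectors from any embedding model
    "checkout":      [0.9, 0.1],
    "retry_payment": [0.2, 0.95],
    "refund":        [0.5, 0.5],
}
query_vec = [0.1, 1.0]  # "error handling like our retry logic"

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Structural step: only functions that actually call the payment API.
candidates = call_graph["charge_card"]

# Semantic step: rank those candidates against the query embedding.
ranked = sorted(candidates, key=lambda fn: cosine(embeddings[fn], query_vec), reverse=True)
print(ranked)  # ['retry_payment', 'refund', 'checkout']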

Conclusion

Code knowledge graphs represent a fundamental shift in how AI agents understand software. By treating repositories as queryable graphs rather than searchable text, tools like Codebase-Memory, CodeGraph, and Lattice unlock capabilities that RAG-based retrieval simply cannot match: call-graph traversal, impact analysis, and sub-millisecond structural queries.

For platform engineering teams, the adoption path is clear: index your repositories, expose the graph via MCP, and integrate impact analysis into your PR workflows. The payoff—10x token efficiency and dramatically more accurate AI assistance—makes this infrastructure investment worthwhile for any team serious about AI-augmented development.

The tools are open source and ready to deploy. The question isn’t whether to adopt code knowledge graphs, but how quickly you can integrate them into your AI coding pipeline.