Dispatch · ai agents

Agent Building Frameworks: The Tools Behind Every Smart Agent

AI agents are in production everywhere. But the framework you pick determines how fast you ship, how much control you have, and whether it survives real workloads. Here's what the numbers say.

Vimal SonagaraVimal SonagaraAI/ML Engineer
May 12, 2026
15 min read
Agent Building Frameworks: The Tools Behind Every Smart Agent
Fig. 01A dispatch on ai agents

TL;DR

  • Built the same multi-agent task on all 6 frameworks: LangGraph, Google ADK, Claude SDK, AutoGen, CrewAI, Semantic Kernel
  • Same LLM (Claude Haiku 4.5 via AWS Bedrock) and same search tool (Serper API) across every framework
  • Ran each benchmark on 3 environments: Local MacBook, AWS EC2 (t3.medium), and GCP VM (e2-medium)
  • 3 parallel researchers + 1 analyst — same task structure, same prompts, same tools
  • Found a real connector bug in Semantic Kernel on AWS Bedrock — documented, root-caused, and fixed

What Is an Agent?

A regular AI answers your question. An AI agent goes further — it breaks down your goal, searches for information, uses tools, and takes actions step by step until the job is done.

What Is an Agent Building Framework?

An agent building framework is a toolkit that provides the structure, tools, and components to build AI agents faster and easier. Instead of building everything from scratch, these frameworks give you pre-built components like LLM integrations, tool management, memory handling, and execution loops — so you can focus on what makes your agent unique.

The Benchmark

To go beyond feature checklists, I built the same multi-agent task across all 6 frameworks:

  • 3 parallel researchers — each assigned one query, searches the web, and summarizes findings:
    • Researcher 1: "Main AI agent frameworks available in 2025 overview comparison"
    • Researcher 2: "Key features capabilities architecture of AI agent frameworks 2025"
    • Researcher 3: "Real world use cases production deployments AI agent frameworks 2025"
  • 1 analyst — synthesizes all 3 research outputs into a final report
  • Same LLM: Claude Haiku 4.5 via AWS Bedrock across every framework
  • Same search tool: Serper API across every framework

Run across 3 environments: Local (MacBook), AWS EC2 (t3.medium, us-east-1), and GCP (e2-medium).


1. LangGraph — by LangChain

LangGraph is a stateful, directed graph-based execution framework for building LLM-driven workflows and agents. It gives you full control to design agent logic by representing workflows as nodes and edges — each node performs a task and edges determine what happens next based on current state.

Supports Python and JavaScript/TypeScript. Works with any LLM: OpenAI, Groq, Anthropic, Gemini, and open-source models.

Key Features

  • Stateful Graphs — each node carries information forward, enabling continuous memory and context across workflow steps
  • Cyclical Graphs — supports workflows where steps repeat, essential for complex agent runtimes
  • Human-in-the-Loop — pause execution at any node and require human approval before proceeding
  • Tool Integration — deep integration with LangChain's tool ecosystem, supporting custom functions and MCP-compatible tools
  • LangSmith Monitoring — built-in observability platform for tracking execution flows, costs, latency, and performance

Pros

  • Maximum control — you define every node, edge, and conditional transition
  • Deep observability — LangSmith provides full visibility into traces, costs, and latency
  • Production-ready — designed for reliable, complex systems at scale

Cons

  • Steeper learning curve — you need to understand graphs, nodes, and state logic
  • More setup required — agents need explicit design of nodes, edges, and transitions
  • Higher effort for simple tasks — overkill if you just need a basic agent

Benchmark Results

LangGraph benchmark across Local, EC2, and GCP

LangGraph is the leanest framework tested — under 1 MB memory across all environments. EC2 total time of 32.93s was the fastest single result in the entire benchmark. Consistent 2.45-2.93x parallel speedup.

LangGraph is a good fit for:

  • Complex workflows that need maximum reliability and control
  • Systems where you want to see exactly what's happening at every step
  • Teams already using LangChain who want seamless integration
  • Human-in-the-loop workflows where the framework handles state and waiting

2. Google ADK — by Google

Google ADK (Agent Development Kit) is a flexible framework for building, managing, evaluating, and deploying AI-powered agents. Supports Python, TypeScript, Java, and Go. Optimized for Gemini models but works with other LLMs through its BaseLLM interface.

Key Features

  • Multi-Agent System Design — build applications with multiple specialized agents that coordinate, delegate, and collaborate
  • Flexible Orchestration — sequential, parallel, or loop agents alongside LLM-driven dynamic routing
  • Rich Tool Ecosystem — custom functions, built-in tools, external APIs, and MCP as both consumer and producer
  • Native Streaming Support — real-time bidirectional streaming for text and audio via Gemini Live API
  • Integrated Developer Tooling — built-in CLI and Developer UI for running, inspecting, and debugging agents locally
  • OpenTelemetry Tracing — emits traces to any OTel-compatible backend, giving you a hierarchical span view of LLM reasoning, tool calls, and external API requests end-to-end

Pros

  • Easy to build multi-agent systems — built-in support for multiple agents working together
  • Great for Google ecosystem — easy to deploy with Google Cloud and Gemini models
  • Real-time streaming — supports live text and audio interactions out of the box

Cons

  • Best with Gemini — most features and optimizations are built around Gemini models
  • Less control than LangGraph — high-level abstractions make deep customization harder
  • Highest memory usage in benchmark — 85-88 MB peak across all environments

Benchmark Results

Google ADK benchmark across Local, EC2, and GCP

Google ADK uses 85-88 MB peak memory — significantly more than every other framework tested. The speedup is estimated at 3.0x because ADK's native ParallelAgent runs all researchers internally and doesn't expose per-agent timing.

ADK is a good fit for:

  • Teams building within the Google ecosystem who want rapid deployment
  • Multi-agent systems that need real-time streaming (text and audio)
  • Projects requiring parallel task execution across multiple specialized agents

3. Claude Agent SDK — by Anthropic

Claude SDK is a Python and TypeScript/Node.js framework that handles the entire autonomous agent execution loop automatically. Works exclusively with Claude models.

Key Features

  • Autonomous Agent Loop — handles the entire execution loop automatically via a single query() call
  • Built-in Production Tools — pre-built tools like Read, Write, Edit, Bash, Glob, and Grep require zero setup
  • Fine-Grained Permission Control — auto-approve specific tools, block others, or require approval for everything
  • Smart Context Management — automatic context compaction and prompt caching to reduce cost and latency
  • Session Flexibility — sessions can be continued, resumed, or forked even across different hosts

Pros

  • Zero setup for coding tasks — built-in tools mean you can start building coding agents immediately
  • Production ready — automatic error handling, session management, and monitoring from day one
  • Cost efficient — automatic prompt caching reduces cost and latency for repeated information

Cons

  • Claude only — works exclusively with Claude models, no support for other LLMs
  • Best for code tasks — less ideal for general-purpose or domain-specific workflows
  • Requires Claude Code CLI on every machine — the SDK is a Python wrapper around the claude CLI process; without the CLI installed, the SDK has nothing to execute. Deploying to EC2 or GCP means installing the CLI there too, not just the Python package

Benchmark Results

Claude SDK benchmark across Local, EC2, and GCP

Claude SDK shows the most consistent parallel efficiency across environments — 2.96x locally, 2.90x on EC2, 2.85x on GCP. Memory footprint is under 1 MB across all environments, matching LangGraph.

Claude SDK is a good fit for:

  • Automated code review agents
  • Bug fixing and code generation tasks
  • Teams already using Claude Code CLI who want SDK access
  • When you need production-ready tools with automatic cost and latency optimization

4. AutoGen — by Microsoft

AutoGen is an open-source framework for building agents that collaborate through conversational patterns to accomplish tasks. Supports Python and .NET. Works with any OpenAI-compatible endpoint — Groq, Azure OpenAI, and local models.

Key Features

  • Three-Layer Architecture — Core (scalable distributed network), AgentChat (conversational AI assistants), Extensions (expandable capabilities)
  • Multiple Predefined Agent Types — User Proxy Agent, Assistant Agent, and Tool/Function Agent with distinct roles
  • Flexible Conversation Patterns — one-to-one, group chat, and hierarchical conversations where agents can delegate tasks
  • AutoGen Studio — visual tool to rapidly prototype multi-agent workflows

Pros

  • Easy multi-agent collaboration — agents work together through conversational patterns
  • No UI needed to test — AutoGen Studio lets you visually prototype workflows without extra code
  • Works with multiple LLMs — compatible with any OpenAI-compatible endpoint

Cons

  • Documentation and community gaps — fewer tutorials and smaller community than established frameworks
  • Smaller community compared to other frameworks — fewer tutorials, StackOverflow answers, and real-world examples outside Microsoft's own docs

Benchmark Results

AutoGen benchmark across Local, EC2, and GCP

AutoGen delivers solid, consistent performance — 40.99s locally, 46.29s on EC2, 53.32s on GCP. Memory at 10.56-10.99 MB is moderate. Parallel speedup of 2.71-2.83x is reliable across environments.

Microsoft Agent Framework (MAF) is the enterprise-ready successor to AutoGen and Semantic Kernel.

AutoGen is a good fit for:

  • Research-style workflows and collaborative problem solving
  • Multi-stage validation pipelines where agents review and critique each other's outputs

5. CrewAI — by CrewAI Inc.

CrewAI empowers developers to build production-ready multi-agent systems by combining collaborative Crews with precise control via Flows. Supports Python only. Works with any model supported via LiteLLM.

Key Features

  • Flows — reliable, stateful workflows for long-running processes and complex logic
  • Autonomous Crews — teams of agents that plan, execute, and collaborate to achieve high-level goals
  • Role-Based Agent Design — agents defined by role, goal, and backstory to guide behavior
  • Enterprise Security — designed with security and compliance in mind

Pros

  • Ease of use — move from idea to execution quickly with role, goal, and backstory customization
  • Great for prototyping — simple enough to get a multi-agent system running fast
  • Customizable agent creation — role-based definitions increase task performance

Cons

  • Limited native tool integrations — fewer ready-made connectors for niche tools
  • Not production-ready at scale — lacks the mature monitoring and debugging tooling needed for large systems

Benchmark Results

CrewAI benchmark across Local, EC2, and GCP

CrewAI is a good fit for:

  • Multi-agent systems where clear role separation and structured task delegation are the primary design requirements
  • Role-based automation workflows like onboarding, scheduling, and multi-step approvals
  • Teams that need rapid prototyping with simple-to-mid-scale agent setups

6. Semantic Kernel — by Microsoft

Semantic Kernel is a framework for building AI agents using reusable plugins and tools. Available in Python, C#, and Java. Works with OpenAI, Azure OpenAI, and custom endpoints, with deep integration into Azure AI services.

Key Features

  • Multiple Agent Types — ChatCompletionAgent, OpenAIAssistantAgent, AzureAIAgent, and custom agents
  • Plugin-Driven Architecture — collections of functions and tools exposed to AI models
  • Native Azure Integration — deep integration with Azure AI services, Monitor, and Application Insights
  • Flexible Invocation Modes — both streaming and non-streaming agent invocation

Pros

  • Deep Azure integration — seamless monitoring via Azure Monitor and Application Insights
  • Flexible plugin architecture — supports native plugins, MCP plugins, and OpenAPI plugins
  • Enterprise-grade observability — best monitoring option for teams already in the Azure ecosystem

Cons

  • Poor parallelism support — limited ability to run multiple agents simultaneously
  • Complex memory management — different agent types use different memory systems
  • AWS Bedrock multi-tool bug — hard crash when LLM batches multiple tool calls (see below)

Benchmark Results

Semantic Kernel benchmark across Local, EC2, and GCP

Semantic Kernel is a good fit for:

  • Organizations already using Azure Monitor and Application Insights for observability
  • Applications requiring deep integration with Microsoft's cloud services

Benchmark Results — All Environments

Local (MacBook)

Full benchmark results — Local MacBook

AWS EC2 (t3.medium, us-east-1)

Full benchmark results — AWS EC2

GCP

Full benchmark results — GCP


What the Numbers Tell You

Lowest memory: LangGraph and Claude SDK both stay under 1 MB peak heap — extremely lean for production workloads with memory constraints.

Highest memory: Google ADK uses 85-88 MB consistently across all environments — more than 100x the memory of LangGraph for the same task.

Most consistent parallel efficiency: Claude SDK achieves 2.85-2.96x speedup across all 3 environments — the tightest range in the benchmark.


Choosing the Right Framework

FactorLangGraphGoogle ADKClaude SDKAutoGenCrewAISemantic Kernel
CreatorLangChainGoogleAnthropicMicrosoftCrewAI Inc.Microsoft
LanguagesPython, JS/TSPython, TS, Java, GoPython, Node.jsPython, .NETPythonPython, C#, Java
LLM SupportAnyGemini-optimized + OpenAI-LLMClaude onlyAny OpenAI-compatibleAny via LiteLLMOpenAI, Azure, custom
Best ForComplex workflows, full controlGoogle ecosystem, streamingCoding agentsResearch, collaborationRole-based multi-agentAzure/Microsoft ecosystem
MonitoringLangSmithOpenTelemetry (OTel)Cost/turn controlsAutoGen StudioLimitedAzure Monitor

There is no single best framework — only the right one for your use case:

  • LangGraph for complex workflows needing full control and maximum observability
  • Google ADK if you're building in the Google ecosystem and need streaming
  • Claude SDK for coding-focused agents with the most consistent parallel performance
  • AutoGen for collaborative, research-style multi-agent workflows
  • CrewAI for role-based systems where clarity of agent responsibilities matters most
  • Semantic Kernel if you're already deep in the Microsoft/Azure world

Agent frameworks are moving fast. The developers experimenting now will be the ones leading the next wave of AI-powered products. Pick your framework, run the same task I ran, and see how it behaves on your workload. The numbers will tell you everything feature lists won't.

Vimal Sonagara

Vimal Sonagara

AI/ML Engineer

Continue reading

All dispatches →
How to make LLMs cheaper without breaking them
llm

How to make LLMs cheaper without breaking them

Most teams overpay for LLM inference by 10-100×. We benchmarked quantization formats on Llama and Gemma models, deployed W4A16 with GKE, and cut costs to $0.50/1M tokens.

Smit ThakoreMay 13 · 10 min