Agent Building Frameworks: The Tools Behind Every Smart Agent

TL;DR

Built the same multi-agent task on all 6 frameworks: LangGraph, Google ADK, Claude SDK, AutoGen, CrewAI, Semantic Kernel
Same LLM (Claude Haiku 4.5 via AWS Bedrock) and same search tool (Serper API) across every framework
Ran each benchmark on 3 environments: Local MacBook, AWS EC2 (t3.medium), and GCP VM (e2-medium)
3 parallel researchers + 1 analyst — same task structure, same prompts, same tools
Found a real connector bug in Semantic Kernel on AWS Bedrock — documented, root-caused, and fixed

What Is an Agent?

A regular AI answers your question. An AI agent goes further — it breaks down your goal, searches for information, uses tools, and takes actions step by step until the job is done.

What Is an Agent Building Framework?

An agent building framework is a toolkit that provides the structure, tools, and components to build AI agents faster and more easily. Instead of building everything from scratch, you get pre-built components like LLM integrations, tool management, memory handling, and execution loops, so you can focus on what makes your agent unique.

The Benchmark

To go beyond feature checklists, I built the same multi-agent task across all 6 frameworks:

- 3 parallel researchers, each assigned one query to search the web and summarize the findings:
  - Researcher 1: "Main AI agent frameworks available in 2025 overview comparison"
  - Researcher 2: "Key features capabilities architecture of AI agent frameworks 2025"
  - Researcher 3: "Real world use cases production deployments AI agent frameworks 2025"
- 1 analyst, who synthesizes all 3 research outputs into a final report
- Same LLM: Claude Haiku 4.5 via AWS Bedrock across every framework
- Same search tool: Serper API across every framework

Run across 3 environments: Local (MacBook), AWS EC2 (t3.medium, us-east-1), and GCP (e2-medium).
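The task structure above is a fan-out/fan-in pattern, and a minimal framework-agnostic sketch may make it easier to compare against each framework's version. It uses plain asyncio; `search_web` and `call_llm` are hypothetical stand-ins for the Serper tool and the Bedrock-hosted model, not real client code, and the prompts are placeholders rather than the ones used in the benchmark.

```python
import asyncio

QUERIES = [
    "Main AI agent frameworks available in 2025 overview comparison",
    "Key features capabilities architecture of AI agent frameworks 2025",
    "Real world use cases production deployments AI agent frameworks 2025",
]


async def search_web(query: str) -> str:
    """Hypothetical stand-in for the Serper search tool."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"results for: {query}"


async def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for the Claude Haiku 4.5 call on Bedrock."""
    await asyncio.sleep(0.01)
    return f"summary({prompt[:40]})"


async def researcher(query: str) -> str:
    # Each researcher searches once, then summarizes what it found.
    results = await search_web(query)
    return await call_llm(f"Summarize: {results}")


async def analyst(findings: list[str]) -> str:
    # The analyst sees all three research outputs at once.
    return await call_llm("Synthesize: " + " | ".join(findings))


async def run_benchmark() -> tuple[list[str], str]:
    # Fan out: the three researchers run concurrently.
    findings = await asyncio.gather(*(researcher(q) for q in QUERIES))
    # Fan in: the analyst runs only after all researchers finish.
    report = await analyst(list(findings))
    return list(findings), report


if __name__ == "__main__":
    findings, report = asyncio.run(run_benchmark())
    print(len(findings), report)
```

Each framework expresses the same two phases with its own abstractions (graph nodes, parallel agents, crews, and so on), which is what makes the timings comparable.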

1. LangGraph — by LangChain

LangGraph is a stateful, directed graph-based execution framework for building LLM-driven workflows and agents. It gives you full control to design agent logic by representing workflows as nodes and edges — each node performs a task and edges determine what happens next based on current state.

Supports Python and JavaScript/TypeScript. Works with any LLM: OpenAI, Groq, Anthropic, Gemini, and open-source models.
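To make the node-and-edge model described above concrete, here is a small conceptual sketch in plain Python. It is not LangGraph's API: `run_graph`, `NODES`, and `EDGES` are names invented for illustration. Nodes return partial state updates, and an edge is a function that picks the next node from the current state, which is also how a cycle (draft, review, draft again) can be expressed.

```python
from typing import Callable, Dict

State = Dict[str, object]
Node = Callable[[State], State]
Router = Callable[[State], str]

END = "__end__"


def draft(state: State) -> State:
    # Node: produce (or revise) a draft and count the attempt.
    attempts = int(state.get("attempts", 0)) + 1
    return {"draft": f"draft v{attempts}", "attempts": attempts}


def review(state: State) -> State:
    # Node: approve once the draft has been revised at least once.
    return {"approved": int(state["attempts"]) >= 2}


def after_review(state: State) -> str:
    # Conditional edge: loop back to "draft" until the review passes.
    return END if state["approved"] else "draft"


NODES: Dict[str, Node] = {"draft": draft, "review": review}
EDGES: Dict[str, Router] = {
    "draft": lambda state: "review",  # fixed edge
    "review": after_review,           # conditional edge
}


def run_graph(entry: str, state: State, max_steps: int = 10) -> State:
    current = entry
    for _ in range(max_steps):
        if current == END:
            return state
        # Merge the node's partial update into the shared state.
        state = {**state, **NODES[current](state)}
        current = EDGES[current](state)
    raise RuntimeError("graph did not finish within max_steps")
```

Calling `run_graph("draft", {})` visits draft, review, draft, review and then stops with the second draft approved. A real graph framework adds persistence, streaming, and interrupts on top of this basic loop.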

Key Features

Stateful Graphs — each node carries information forward, enabling continuous memory and context across workflow steps
Cyclical Graphs — supports workflows where steps repeat, essential for complex agent runtimes
Human-in-the-Loop — pause execution at any node and require human approval before proceeding
Tool Integration — deep integration with LangChain's tool ecosystem, supporting custom functions and MCP-compatible tools
LangSmith Monitoring — built-in observability platform for tracking execution flows, costs, latency, and performance

Pros

Maximum control — you define every node, edge, and conditional transition
Deep observability — LangSmith provides full visibility into traces, costs, and latency
Production-ready — designed for reliable, complex systems at scale

Cons

Steeper learning curve — you need to understand graphs, nodes, and state logic
More setup required — agents need explicit design of nodes, edges, and transitions
Higher effort for simple tasks — overkill if you just need a basic agent

Benchmark Results

LangGraph benchmark across Local, EC2, and GCP

LangGraph is the leanest framework tested, at under 1 MB of peak memory in every environment. Its EC2 total time of 32.93s was the fastest single result in the entire benchmark, and its parallel speedup stayed between 2.45x and 2.93x.

LangGraph is a good fit for:

Complex workflows that need maximum reliability and control
Systems where you want to see exactly what's happening at every step
Teams already using LangChain who want seamless integration
Human-in-the-loop workflows where the framework handles state and waiting

2. Google ADK — by Google

Google ADK (Agent Development Kit) is a flexible framework for building, managing, evaluating, and deploying AI-powered agents. Supports Python, TypeScript, Java, and Go. Optimized for Gemini models but works with other LLMs through its BaseLlm interface.

Key Features

Multi-Agent System Design — build applications with multiple specialized agents that coordinate, delegate, and collaborate
Flexible Orchestration — sequential, parallel, or loop agents alongside LLM-driven dynamic routing
Rich Tool Ecosystem — custom functions, built-in tools, external APIs, and MCP as both consumer and producer
Native Streaming Support — real-time bidirectional streaming for text and audio via Gemini Live API
Integrated Developer Tooling — built-in CLI and Developer UI for running, inspecting, and debugging agents locally
OpenTelemetry Tracing — emits traces to any OTel-compatible backend, giving you a hierarchical span view of LLM reasoning, tool calls, and external API requests end-to-end

Pros

Easy to build multi-agent systems — built-in support for multiple agents working together
Great for Google ecosystem — easy to deploy with Google Cloud and Gemini models
Real-time streaming — supports live text and audio interactions out of the box

Cons

Best with Gemini — most features and optimizations are built around Gemini models
Less control than LangGraph — high-level abstractions make deep customization harder
Highest memory usage in benchmark — 85-88 MB peak across all environments

Benchmark Results

Google ADK benchmark across Local, EC2, and GCP

Google ADK uses 85-88 MB peak memory — significantly more than every other framework tested. The speedup is estimated at 3.0x because ADK's native ParallelAgent runs all researchers internally and doesn't expose per-agent timing.
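The article does not spell out how parallel speedup is calculated. One common definition, assumed here, is the sum of the individual researcher durations divided by the wall-clock time of the parallel phase. The sketch below shows why per-agent timing matters for that number; the durations in the usage note are made up for illustration and are not benchmark data.

```python
def parallel_speedup(agent_seconds: list[float], wall_seconds: float) -> float:
    """Sum of per-agent durations divided by the parallel phase's wall time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return sum(agent_seconds) / wall_seconds
```

With illustrative durations of 10, 12, and 11 seconds and a 12-second parallel phase, `parallel_speedup([10.0, 12.0, 11.0], 12.0)` gives 2.75. Under this definition, three researchers that overlap perfectly would give 3.0, which may be where an estimate of 3.0x comes from when the numerator cannot be measured.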

ADK is a good fit for:

Teams building within the Google ecosystem who want rapid deployment
Multi-agent systems that need real-time streaming (text and audio)
Projects requiring parallel task execution across multiple specialized agents

3. Claude Agent SDK — by Anthropic

Claude SDK is a Python and TypeScript/Node.js framework that handles the entire autonomous agent execution loop automatically. Works exclusively with Claude models.

Key Features

Autonomous Agent Loop — handles the entire execution loop automatically via a single query() call
Built-in Production Tools — pre-built tools like Read, Write, Edit, Bash, Glob, and Grep require zero setup
Fine-Grained Permission Control — auto-approve specific tools, block others, or require approval for everything
Smart Context Management — automatic context compaction and prompt caching to reduce cost and latency
Session Flexibility — sessions can be continued, resumed, or forked even across different hosts
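The permission model in the list above (auto-approve some tools, block others, ask for the rest) can be pictured as a small decision function. This is a conceptual sketch only: `Decision` and `decide` are hypothetical names, not part of the SDK, and the precedence rule (blocked beats allowed) is an assumption made for the example.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ASK = "ask"


def decide(tool: str, allowed: set[str], blocked: set[str]) -> Decision:
    # Blocked tools win over everything else.
    if tool in blocked:
        return Decision.DENY
    if tool in allowed:
        return Decision.ALLOW
    # Anything not explicitly listed needs human approval.
    return Decision.ASK
```

For example, with `allowed={"Read", "Grep"}` and `blocked={"Bash"}`, a `Read` call is approved automatically, a `Bash` call is rejected, and a `Write` call waits for a human.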

Pros

Zero setup for coding tasks — built-in tools mean you can start building coding agents immediately
Production ready — automatic error handling, session management, and monitoring from day one
Cost efficient — automatic prompt caching reduces cost and latency for repeated information

Cons

Claude only — works exclusively with Claude models, no support for other LLMs
Best for code tasks — less ideal for general-purpose or domain-specific workflows
Requires Claude Code CLI on every machine — the SDK is a Python wrapper around the claude CLI process; without the CLI installed, the SDK has nothing to execute. Deploying to EC2 or GCP means installing the CLI there too, not just the Python package

Benchmark Results

Claude SDK benchmark across Local, EC2, and GCP

Claude SDK shows the most consistent parallel efficiency across environments — 2.96x locally, 2.90x on EC2, 2.85x on GCP. Memory footprint is under 1 MB across all environments, matching LangGraph.

Claude SDK is a good fit for:

Automated code review agents
Bug fixing and code generation tasks
Teams already using Claude Code CLI who want SDK access
When you need production-ready tools with automatic cost and latency optimization

4. AutoGen — by Microsoft

AutoGen is an open-source framework for building agents that collaborate through conversational patterns to accomplish tasks. Supports Python and .NET. Works with any OpenAI-compatible endpoint — Groq, Azure OpenAI, and local models.

Key Features

Three-Layer Architecture — Core (scalable distributed network), AgentChat (conversational AI assistants), Extensions (expandable capabilities)
Multiple Predefined Agent Types — User Proxy Agent, Assistant Agent, and Tool/Function Agent with distinct roles
Flexible Conversation Patterns — one-to-one, group chat, and hierarchical conversations where agents can delegate tasks
AutoGen Studio — visual tool to rapidly prototype multi-agent workflows

Pros

Easy multi-agent collaboration — agents work together through conversational patterns
No-code prototyping: AutoGen Studio lets you visually prototype workflows without writing extra code
Works with multiple LLMs — compatible with any OpenAI-compatible endpoint

Cons

Documentation and community gaps: fewer tutorials, Stack Overflow answers, and real-world examples outside Microsoft's own docs than more established frameworks

Benchmark Results

AutoGen benchmark across Local, EC2, and GCP

AutoGen delivers solid, consistent performance — 40.99s locally, 46.29s on EC2, 53.32s on GCP. Memory at 10.56-10.99 MB is moderate. Parallel speedup of 2.71-2.83x is reliable across environments.

Microsoft Agent Framework (MAF) is the enterprise-ready successor to AutoGen and Semantic Kernel.

AutoGen is a good fit for:

Research-style workflows and collaborative problem solving
Multi-stage validation pipelines where agents review and critique each other's outputs

5. CrewAI — by CrewAI Inc.

CrewAI aims to let developers build production-ready multi-agent systems by combining collaborative Crews with precise control via Flows. Supports Python only. Works with any model supported by LiteLLM.

Key Features

Flows — reliable, stateful workflows for long-running processes and complex logic
Autonomous Crews — teams of agents that plan, execute, and collaborate to achieve high-level goals
Role-Based Agent Design — agents defined by role, goal, and backstory to guide behavior
Enterprise Security — designed with security and compliance in mind
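Role-based design, as described in the list above, amounts to turning three short fields into the instructions an agent runs under. The sketch below illustrates that idea with a plain dataclass. `AgentProfile` and `system_prompt` are hypothetical names, and the prompt wording is an assumption; CrewAI's own agent class and prompt templates will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    role: str
    goal: str
    backstory: str

    def system_prompt(self) -> str:
        # Fold the three fields into one instruction block for the LLM.
        return (
            f"You are {self.role}. {self.backstory}\n"
            f"Your goal: {self.goal}"
        )
```

Defining a researcher and an analyst then becomes a matter of writing two profiles, which is a large part of why role-based setups are quick to prototype.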

Pros

Ease of use — move from idea to execution quickly with role, goal, and backstory customization
Great for prototyping — simple enough to get a multi-agent system running fast
Customizable agent creation — role-based definitions increase task performance

Cons

Limited native tool integrations — fewer ready-made connectors for niche tools
Not production-ready at scale — lacks the mature monitoring and debugging tooling needed for large systems

Benchmark Results

CrewAI benchmark across Local, EC2, and GCP

CrewAI is a good fit for:

Multi-agent systems where clear role separation and structured task delegation are the primary design requirements
Role-based automation workflows like onboarding, scheduling, and multi-step approvals
Teams that need rapid prototyping with simple-to-mid-scale agent setups

6. Semantic Kernel — by Microsoft

Semantic Kernel is a framework for building AI agents using reusable plugins and tools. Available in Python, C#, and Java. Works with OpenAI, Azure OpenAI, and custom endpoints, with deep integration into Azure AI services.

Key Features

Multiple Agent Types — ChatCompletionAgent, OpenAIAssistantAgent, AzureAIAgent, and custom agents
Plugin-Driven Architecture — collections of functions and tools exposed to AI models
Native Azure Integration — deep integration with Azure AI services, Monitor, and Application Insights
Flexible Invocation Modes — both streaming and non-streaming agent invocation

Pros

Deep Azure integration — seamless monitoring via Azure Monitor and Application Insights
Flexible plugin architecture — supports native plugins, MCP plugins, and OpenAPI plugins
Enterprise-grade observability — best monitoring option for teams already in the Azure ecosystem

Cons

Poor parallelism support — limited ability to run multiple agents simultaneously
Complex memory management — different agent types use different memory systems
AWS Bedrock multi-tool bug: hard crash when the LLM batches multiple tool calls (covered in the separate write-up "Semantic Kernel + AWS Bedrock: The Multi-Tool Batching Bug")

Benchmark Results

Semantic Kernel benchmark across Local, EC2, and GCP

Semantic Kernel is a good fit for:

Organizations already using Azure Monitor and Application Insights for observability
Applications requiring deep integration with Microsoft's cloud services

Benchmark Results — All Environments

Local (MacBook)

Full benchmark results — Local MacBook

AWS EC2 (t3.medium, us-east-1)

Full benchmark results — AWS EC2

GCP

Full benchmark results — GCP

What the Numbers Tell You

Lowest memory: LangGraph and Claude SDK both stay under 1 MB peak heap — extremely lean for production workloads with memory constraints.

Highest memory: Google ADK uses 85-88 MB consistently across all environments — more than 100x the memory of LangGraph for the same task.

Most consistent parallel efficiency: Claude SDK achieves 2.85-2.96x speedup across all 3 environments — the tightest range in the benchmark.
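If you want to reproduce figures like these on your own workload, a small harness is enough. The article reports peak heap but does not say how it was measured, so the use of `tracemalloc` below is an assumption, and `measure` is a hypothetical helper name.

```python
import time
import tracemalloc


def measure(task):
    """Run task() and return (result, seconds, peak_mb) for Python allocations."""
    tracemalloc.start()
    started = time.perf_counter()
    try:
        result = task()
    finally:
        seconds = time.perf_counter() - started
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, seconds, peak_bytes / (1024 * 1024)
```

Note that `tracemalloc` only sees allocations made through the Python interpreter in the current process, so memory used by native libraries or child processes would not show up in a figure measured this way.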

Choosing the Right Framework

Factor	LangGraph	Google ADK	Claude SDK	AutoGen	CrewAI	Semantic Kernel
Creator	LangChain	Google	Anthropic	Microsoft	CrewAI Inc.	Microsoft
Languages	Python, JS/TS	Python, TS, Java, Go	Python, Node.js	Python, .NET	Python	Python, C#, Java
LLM Support	Any	Gemini-optimized + other LLMs	Claude only	Any OpenAI-compatible	Any via LiteLLM	OpenAI, Azure, custom
Best For	Complex workflows, full control	Google ecosystem, streaming	Coding agents	Research, collaboration	Role-based multi-agent	Azure/Microsoft ecosystem
Monitoring	LangSmith	OpenTelemetry (OTel)	Cost/turn controls	AutoGen Studio	Limited	Azure Monitor

There is no single best framework — only the right one for your use case:

LangGraph for complex workflows needing full control and maximum observability
Google ADK if you're building in the Google ecosystem and need streaming
Claude SDK for coding-focused agents with the most consistent parallel performance
AutoGen for collaborative, research-style multi-agent workflows
CrewAI for role-based systems where clarity of agent responsibilities matters most
Semantic Kernel if you're already deep in the Microsoft/Azure world

Agent frameworks are moving fast. The developers experimenting now will be the ones leading the next wave of AI-powered products. Pick your framework, run the same task I ran, and see how it behaves on your workload. The numbers will tell you everything feature lists won't.

Vimal Sonagara
