AutoGen

Overview

AutoGen is an open-source framework developed by Microsoft Research for building multi-agent systems where agents collaborate through natural language conversations. The framework enables agents to work autonomously or with human oversight, integrating large language models, tools, and human inputs to perform complex tasks. AutoGen emphasizes conversational interactions where agents communicate through message passing to jointly accomplish objectives.

The framework addresses the problem of coordinating multiple specialized agents through a conversation-driven model using an event-driven, actor-based architecture. AutoGen supports Python and .NET with cross-language interoperability.

Key technical components covered:

  • Layered architecture (Core, AgentChat, Extensions)
  • Agent types and conversation patterns
  • Message passing and communication mechanisms
  • GroupChat and speaker selection strategies
  • Code execution and Docker security
  • Memory and context management
  • Version history and breaking changes
  • AutoGen Studio low-code interface

Layered Architecture

AutoGen 0.4 introduces a redesigned three-layer architecture for enhanced scalability, extensibility, and robustness:

  • Core Layer provides foundational building blocks for event-driven agentic systems. Implements the runtime, message passing, and event system necessary for creating agentic workflows. Uses an actor model where each agent operates as an independent entity processing messages concurrently. Enables asynchronous message exchange between agents, decoupling message delivery from processing to enhance modularity and scalability.

  • AgentChat Layer builds on the Core Layer providing a task-driven, high-level API with pre-built agents like AssistantAgent and UserProxyAgent. Offers features including group chat, code execution, and simplified agent interactions. Most similar to AutoGen v0.2, facilitating easier migration for existing applications.

  • Extensions Layer includes implementations of core interfaces and third-party integrations. Provides the Azure code executor and OpenAI model client. Allows addition of new functionalities and integrations through pluggable components.

This separation of concerns enables developers to choose appropriate abstraction levels—from low-level Core primitives for custom implementations to high-level AgentChat features for rapid development.

Agent Types and Configuration

AutoGen implements three primary agent classes:

  • ConversableAgent serves as the foundational class for agents capable of engaging in dialogues. Configuration includes:

    • human_input_mode determines when the agent solicits human input with values:

      • "ALWAYS": Prompts for human input after every message
      • "TERMINATE": Requests input only upon termination message or after specified auto-replies
      • "NEVER": Does not prompt for human input; relies on auto-replies or termination conditions
    • code_execution_config manages execution of code blocks with settings for working directory, Docker usage, timeout, and number of messages to consider for code execution.

    • llm_config configures language model inference settings including model parameters and behavior.

    function_map maps function names to callable functions, enabling specific actions or computations.

  • AssistantAgent is a subclass of ConversableAgent designed as an AI assistant. Configuration includes:

    system_message provides instructions to the language model guiding responses.

    human_input_mode set to "NEVER" by default for autonomous operation.

    code_execution_config disabled by default (False) as the assistant generates code snippets but does not execute them.

  • UserProxyAgent acts as a proxy for human users. Configuration includes:

    human_input_mode configurable to determine when prompts appear.

    code_execution_config enables or disables code execution. When enabled, the agent can execute code blocks, run code, or call functions.

    llm_config allows language model-based responses when code execution is not performed.

Conversation Patterns

AutoGen supports four conversation patterns for orchestrating multi-agent workflows:

  • Two-Agent Chat enables direct communication between two agents. One agent initiates conversation, and they exchange messages until task completion. Conversation history is processed to generate summary or response.

  • Sequential Chat links multiple two-agent conversations through carryover mechanisms. Each chat’s summary serves as context for the next, enabling complex tasks broken into interdependent sub-tasks. Allows structured multi-stage workflows where outputs feed subsequent stages.

  • Group Chat involves multiple agents sharing a single conversation thread managed by a GroupChatManager. The manager orchestrates interactions and selects the next speaker based on strategies:

    • Round-robin: Agents take turns in fixed circular order
    • Random: Next speaker chosen randomly
    • Manual: Human operator selects next speaker
    • Auto: LLM decides next speaker based on conversation context
  • Nested Chat embeds conversation sequences within single agents, enabling complex workflows packaged as conversational interfaces. Allows agents to initiate sub-conversations internally before responding to main conversation.

Tip

Custom speaker selection in group chats can be implemented through user-defined functions examining conversation state and determining appropriate next speakers.

Message Passing and Communication

AutoGen implements communication through structured mechanisms:

  • Message Passing uses structured messages preserving complete conversation history including tool calls and responses. Each agent receives full context when invoked. Messages follow standardized format containing role, content, and metadata. History accumulates throughout conversation enabling context-aware responses.

  • Context Objects provide shared mutable context via RunContextWrapper allowing dependency injection and state sharing across agent components. Enables agents to access shared resources and maintain consistent state.

  • Streaming Events enable real-time communication through event streams for reactive UI updates and monitoring. Supports observability of agent interactions as they occur.

  • Execution Patterns include:

    • Sequential Execution runs agent loop until final output is produced. Each turn involves LLM call, tool execution, and decision to continue or finish. Follows linear progression through conversation.

    • Concurrent Tool Execution allows multiple tools to execute in parallel, enhancing efficiency in complex workflows where independent operations can run simultaneously.

The actor model foundation enables dynamic conversation topologies adapting based on conversation flow and varying input scenarios. This flexibility benefits complex settings where interaction patterns cannot be predetermined.

GroupChat and Speaker Selection

GroupChat orchestrates multi-agent conversations managing multiple agents conversing together within shared context. GroupChatManager coordinates agent interactions by selecting next speaker based on configured strategy. Manager handles message routing, maintains conversation history, and enforces termination conditions.

Speaker selection strategies determine conversation flow:

  • Round-robin cycles through agents in fixed order ensuring equal participation. After last agent speaks, sequence restarts from first agent.

  • Random selection introduces variability by choosing next speaker randomly from available agents.

  • Manual selection gives human operator control over conversation direction, pausing execution for human choice.

  • Auto selection uses internal mechanism (typically LLM) to determine next speaker based on conversation context. Adapts to dialogue flow dynamically.

  • Custom selection functions enable state-driven workflows

def custom_speaker_selection(last_speaker, groupchat):
    messages = groupchat.messages
    if last_speaker is initializer:
        return coder
    elif last_speaker is coder:
        return executor
    elif last_speaker is executor:
        if messages[-1]["content"] == "exitcode: 1":
            return coder
        else:
            return scientist
    elif last_speaker == scientist:
        return None

Custom functions receive last speaker and GroupChat instance, returning next speaker or selection method. Enables complex branching logic based on message content, agent states, or external conditions.

Code Execution and Security

AutoGen implements secure code execution through Docker containerization:

DockerCommandLineCodeExecutor executes code within Docker containers. Saves each code block to file in working directory then runs inside isolated container. Prevents unauthorized access to host system and mitigates risks from untrusted code.

Configuration options control execution behavior:

Disable code execution by setting code_execution_config to False for each code-execution agent.

Run code locally by setting use_docker to False in code_execution_config, though this reduces security isolation.

Default behavior executes code within Docker containers, enhancing security through isolation. Container provides sandboxed environment where code runs without affecting host system. Users can customize Docker image, mount volumes, and configure resource limits.

Execution workflow:

  1. Agent generates code in response to task
  2. Code saved to file in working directory
  3. Docker container started with working directory mounted
  4. Code executed within container
  5. Output captured and returned to agent
  6. Container cleaned up after execution

This approach balances security with functionality, allowing powerful code execution capabilities while protecting host system from potential threats.

Memory and Context Management

AutoGen provides multiple approaches for managing conversation history and context:

  • Memory Protocol defines interface for custom memory stores with methods:

  • add: Store new memories or conversational data

  • query: Retrieve relevant memories based on context

  • update_context: Modify agent’s context with retrieved information

  • clear: Remove stored memories

  • close: Clean up resources

  • ListMemory implementation maintains memories in chronological order, appending recent memories to model’s context. Ensures agents recall past interactions to inform current responses. Simple but effective for maintaining sequential conversation history.

  • TransformMessages Capability preprocesses chat history to handle long conversations and adhere to token limits. The MessageHistoryLimiter transformation restricts context history to specified number of recent messages, ensuring efficient processing. Enables token budget management without losing critical context.

  • External Memory Integration supports platforms like Mem0 providing scalable memory layer for LLMs. Mem0 enables agents to store and retrieve conversational data across sessions, facilitating personalized and context-aware interactions. Combination of AutoGen with Mem0 creates systems evolving with each user interaction.

  • Retrieval-Augmented Generation (RAG) retrieves relevant information from databases and adds to agent’s context before generating responses. Enhances ability to provide accurate, contextually relevant answers by leveraging external knowledge sources. RAG pattern particularly useful for domain-specific applications requiring access to large knowledge bases.

Memory management directly impacts agent performance, token usage, and response quality. Appropriate strategy depends on use case—short conversations may use full history, while long-running applications benefit from summarization or selective retrieval.

Version History and Breaking Changes

Version 0.2 utilized synchronous architecture with basic multi-agent conversations and foundational tools for building agent workflows. Configured model client using list of dictionaries. Enabled caching by setting cache_seed in LLM config. Used assistant.send to handle incoming messages.

Version 0.4 introduces event-driven, asynchronous architecture enhancing scalability and responsiveness. Provides advanced communication patterns allowing flexible, dynamic agent interactions. Enhances extensibility with modular components.

Breaking changes in migration from 0.2 to 0.4:

Model client configuration now uses component configuration system:

from autogen_core.models import ChatCompletionClient
 
config = {
    "provider": "OpenAIChatCompletionClient",
    "config": {
        "model": "gpt-4o",
        "api_key": "sk-xxx"
    }
}
 
model_client = ChatCompletionClient.load_component(config)

Cache management not enabled by default in 0.4. To use caching, wrap model client with ChatCompletionCache and choose cache store like DiskCacheStore or RedisStore.

Agent interaction replaces assistant.send with asynchronous methods assistant.on_messages or assistant.on_messages_stream supporting streaming responses.

The architectural overhaul from synchronous to asynchronous represents fundamental redesign requiring code changes beyond simple API updates. Applications must adapt to event-driven patterns and asynchronous execution models.

AutoGen Studio

AutoGen Studio provides low-code interface for rapid prototyping, testing, and sharing of multi-agent workflows.

Drag-and-Drop Team Builder enables visual workflow design where users create agent teams by dragging components (agents, models, tools, termination conditions) onto canvas. Simplifies defining complex multi-agent workflows without extensive coding.

Real-Time Updates display agent action streams as tasks execute. Provides immediate feedback on agent interactions, enhancing debugging and testing processes.

Interactive Testing allows users to test agent teams on specific tasks within Team Builder. Users review generated artifacts and monitor agent actions, refining behaviors to ensure desired outcomes.

Component Galleries enable sharing and reusing components (agents, models, tools) across projects. Fosters collaborative environment and accelerates development by leveraging community-contributed components.

Message Flow Visualization shows conversation progression between agents with timeline views and message inspection. Helps developers understand agent interactions and identify communication bottlenecks or logic errors.

Mid-Execution Control allows pausing, inspecting, and modifying agent execution during runtime. Enables intervention in workflows for debugging or steering conversations in specific directions.