Skip to content

Latest commit

 

History

History
312 lines (224 loc) · 12.1 KB

0032-agents.md

File metadata and controls

312 lines (224 loc) · 12.1 KB
status contact date deciders consulted informed
experimental
crickman
2024-01-24
markwallace-microsoft, matthewbolanos
rogerbarreto, dmytrostruk, alliscode, SergeyMenshykh

SK Agents Overview and High Level Design

Context and Problem Statement

Support for the OpenAI Assistant API was published in an experimental *.Assistants package that was later renamed to *.Agents with the aspiration of pivoting to a more general agent framework.

The initial Assistants work was never intended to evolve into a general Agent Framework.

This ADR defines that general Agent Framework.

Agents Overview

Fundamentally an agent possesses the following characteristics:

  • Identity: Allows each agent to be uniquely identified.
  • Behavior: The manner in which an agent participates in a conversation
  • Interaction: That an agent behavior is in response to other agents or input.

Various agents specializations might include:

  • System Instructions: A set of directives that guide the agent's behavior.
  • Tools/Functions: Enables the agent to perform specific tasks or actions.
  • Settings: Agent specific settings. For a chat-completion agents this might include LLM settings - such as Temperature, TopP, StopSequence, etc

Agent Modalities

An Agent be of various modalities. Modalities are asymmetrical with regards abilities and constraints.

  • SemanticKernel - IChatCompletionService: An Agent based solely on the SemanticKernel IChatCompletionService.
  • OpenAI Assistants: A hosted Agent solution supported the OpenAI Assistant API (both OpenAI & Azure OpenAI).
  • Custom: An custom agent developed by extending the Agent Framework.
  • Future: Yet to be announced, such as a HuggingFace Assistant API (they already have assistants, but yet to publish an API.)

Decision Drivers

  • Agent Framework shall provide sufficient abstraction to enable the construction of agents that could utilize potentially any LLM API.
  • Agent Framework shall provide sufficient abstraction and building blocks for the most frequent types of agent collaboration. It should be easy to add new blocks as new collaboration methods emerge.
  • Agent Framework shall provide building blocks to modify agent input and output to cover various customization scenarios.
  • Agent Framework shall align with SemanticKernel patterns: tools, DI, plugins, function-calling, etc.
  • Agent Framework shall be extensible so that other libraries can build their own agents and chat experiences.
  • Agent Framework shall be as simple as possible to facilitate extensibility.
  • Agent Framework shall encapsulate complexity within implementation details, not calling patterns.
  • Agent abstraction shall support different modalities (see Agent Modalities section).
  • An Agent of any modality shall be able to interact with an Agent of any other modality.
  • An Agent shall be able to support its own modality requirements. (Specialization)
  • Agent input and output shall align to SK content type ChatMessageContent.

Design - Analysis

Agents participate in a conversation, often in response to user or environmental input.

Agent Analysis Diagram

In addition to Agent, two fundamental concepts are identified from this pattern:

  • Conversation - Context for sequence of agent interactions.
  • Channel: ("Communication Path" from diagram) - The protocol with which the agent interacts with the nexus.

Agents of different modalities must be free to satisfy the requirements presented by their modality. Formalizing the Channel concept provides natural vehicle for this to occur.

These concepts come together to suggest the following generalization:

Agent Pattern Diagram

After iterating with the team over these concepts, this generalization translates into the following high-level definitions:

Agent Design Diagram

Class Name Parent Class Role Modality Note
Agent - Agent Abstraction Root agent abstraction
KernelAgent Agent Agent Abstraction Includes Kernel services and plug-ins
AgentChannel - Channel Abstraction Conduit for an agent's participation in a chat.
AgentChat - Chat Abstraction Provides core capabilities for agent interactions.
AgentGroupChat AgentChat Chat Utility Strategy based chat

Design - Abstractions

Here the detailed class definitions from the high-level pattern from the previous section are enumerated.

Also shown are entities defined as part of the ChatHistory optimization: IChatHistoryHandler, ChatHistoryKernelAgent, and ChatHistoryChannel. These ChatHistory entities eliminates the requirement for Agents that act on a locally managed ChatHistory instance (as opposed to agents managed via remotely hosted frameworks) to implement their own AgentChannel.

Agent Abstractions Diagram

Class Name Parent Class Role Modality Note
Agent - Agent Abstraction Root agent abstraction
AgentChannel - Channel Abstraction Conduit for an agent's participation in an AgentChat.
KernelAgent Agent Agent Abstraction Defines Kernel services and plug-ins
ChatHistoryChannel AgentChannel Channel Abstraction Conduit for agent participation in a chat based on local chat-history.
IChatHistoryHandler - Agent Abstraction Defines a common part for agents that utilize ChatHistoryChannel.
ChatHistoryKernelAgent KernelAgent Agent Abstraction Common definition for any KernelAgent that utilizes a ChatHistoryChannel.
AgentChat - Chat Abstraction Provides core capabilities for an multi-turn agent conversation.

Design - Chat-Completion Agent

The first concrete agent is ChatCompletionAgent. The ChatCompletionAgent implementation is able to integrate with any IChatCompletionService implementation. Since IChatCompletionService acts upon ChatHistory, this demonstrates how ChatHistoryKernelAgent may be simply implemented.

Agent behavior is (naturally) constrained according to the specific behavior of any IChatCompletionService. For example, a connector that does not support function-calling will likewise not execute any KernelFunction as an Agent.

ChatCompletion Agent Diagram

Class Name Parent Class Role Modality Note
ChatCompletionAgent ChatHistoryKernelAgent Agent SemanticKernel Concrete Agent based on a local chat-history.

Design - Group Chat

AgentGroupChat is a concrete AgentChat whose behavior is defined by various Strategies.

Agent Group Chat Diagram

Class Name Parent Class Role Modality Note
AgentGroupChat AgentChat Chat Utility Strategy based chat
AgentGroupChatSettings - Config Utility Defines strategies that affect behavior of AgentGroupChat.
SelectionStrategy - Config Utility Determines the order for Agent instances to participate in AgentGroupChat.
TerminationStrategy - Config Utility Determines when the AgentGroupChat conversation is allowed to terminate (no need to select another Agent).

Design - OpenAI Assistant Agent

The next concrete agent is OpenAIAssistantAgent. This agent is based on the OpenAI Assistant API and implements its own channel as chat history is managed remotely.

 OpenAI Assistant Agent Diagram

Class Name Parent Class Role Modality Note
OpenAIAssistantAgent KernelAgent Agent OpenAI Assistant A functional agent based on OpenAI Assistant API
OpenAIAssistantChannel AgentChannel Channel OpenAI Assistant Channel associated with OpenAIAssistantAgent
OpenAIAssistantDefinition - Config OpenAI Assistant Definition of an Open AI Assistant provided when enumerating over hosted agent definitions.

OpenAI Assistant API Reference

OpenAI Assistant API Objects.png

Design - Aggregator Agent

In order to support complex calling patterns, AggregatorAgent enables one or more agents participating in an AgentChat to present as a single logical Agent.

Aggregator Agent Diagram

Class Name Parent Class Role Modality Note
AggregatorAgent Agent Agent Utility Adapts an AgentChat as an Agent
AggregatorChannel AgentChannel Channel Utility AgentChannel used by AggregatorAgent.
AggregatorMode - Config Utility Defines the aggregation mode for AggregatorAgent.

Usage Patterns

1. Agent Instantiation: ChatCompletion

TBD

// Start with the Kernel
IKernelBuilder builder = Kernel.CreateBuilder();

// Add any IChatCompletionService
builder.AddOpenAIChatCompletion(...);

// Include desired plugins / functions    
builder.Plugins.Add(...);

// Include desired filters
builder.Filters.Add(...);

// Create the agent
ChatCompletionAgent agent =
    new()
    {
        Instructions = "instructions",
        Name = "name",
        Kernel = builder.Build()
    };

2. Agent Instantiation: OpenAI Assistant

Since every Assistant action is a call to a REST endpoint, OpenAIAssistantAgent is access via static methods:

Create:

// Start with the Kernel
IKernelBuilder builder = Kernel.CreateBuilder();

// Include desired plugins / functions    
builder.Plugins.Add(...);

// Create config and definition
OpenAIAssistantConfiguration config = new("apikey", "endpoint");
OpenAIAssistantDefinition definition = new()
{
    Instructions = "instructions",
    Name = "naem",
    Model = "gpt-4",
};

// Create the agent
OpenAIAssistantAgent agent =  
    OpenAIAssistantAgent.CreateAsync(
        builder.Build(),
        config,
        definition);

Retrieval:

// Create config
OpenAIAssistantConfiguration config = new("apikey", "endpoint");

// Create the agent based on an existing definition
OpenAIAssistantAgent agent =  OpenAIAssistantAgent.RetrieveAsync(config, "agent-id");

Inspection:

// Create config
OpenAIAssistantConfiguration config = new("apikey", "endpoint");

// Create the agent based on an existing definition
IAsyncEnumerable<OpenAIAssistantDefinition> definitions = OpenAIAssistantAgent.ListDefinitionsAsync(config;

3. Agent Chat: Explicit

An Agent may be explicitly selected to respond in an AgentGroupChat.

// Define agents
ChatCompletionAgent agent1 = ...;
OpenAIAssistantAgent agent2 = ...;

// Create chat
AgentGroupChat chat = new();

// Provide input for chat
ChatMessageContent input = new (AuthorRole.User, "input");
await WriteMessageAsync(input);
chat.AddChatMessage(input);

// First invoke one agent, then the other, display each response.
await WriteMessagesAsync(chat.InvokeAsync(agent1));
await WriteMessagesAsync(chat.InvokeAsync(agent2));

// The entire history may be accessed.  Agent specific history
// may be transformed from primary history.
await WriteMessagesAsync(chat.GetHistoryAsync());
await WriteMessagesAsync(chat.GetHistoryAsync(agent1));
await WriteMessagesAsync(chat.GetHistoryAsync(agent2));

4. Agent Chat: Multi-Turn

Agents may also take turns working towards an objective:

// Define agents
Agent agent1 = ...;
Agent agent2 = ...;
Agent agent3 = ...;

// Create chat with two agents.
AgentGroupChat chat =
    new(agent1, agent2)
    { 
        ExecutionSettings =
        {
            // Chat will continue until it meets the termination criteria.
            TerminationionStrategy = new MyTerminationStrategy(),
        } 
    };

// Provide input for chat
ChatMessageContent input = new (AuthorRole.User, "input");
await WriteMessageAsync(input);
chat.AddChatMessage(input);

// Agent may be added to an existing chat
chat.AddAgent(agent3);

// Execute the chat until termination
await WriteMessagesAsync(chat.InvokeAsync());