Scaling and Hosting Intelligent AI Agents in Production

Overview

Moving AI agents from a prototype “promise” to a production reality requires shifting focus from model selection to engineering rigor. To ship and scale intelligent systems successfully, an organization must treat agents as first-class citizens that integrate with corporate identity, security frameworks, and deployment pipelines.


The Five Layers of the Agent Stack

To choose the right hosting model, it is critical to understand the architecture of an agentic application:

  1. Client / UI: The interface (Teams, Web, Mobile) where users interact.
  2. Caller: The process that invokes the agent and manages high-level orchestration like retries.
  3. Agent Orchestration Layer: Defines the agent’s “brain,” meaning its system prompts, tool registries, and multi-agent coordination (e.g., with Microsoft Agent Framework, or MAF).
  4. Agent Runtime: The actual process executing the loop, dispatching tools, and streaming responses.
  5. Infrastructure & Model: The underlying compute, LLM endpoints, and security management.
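To make the split between the Caller (Layer 2) and the Agent Runtime (Layer 4) concrete, here is a minimal sketch of a caller that retries transient failures with exponential backoff. Everything in it is an assumption for illustration: `TransientAgentError`, `call_with_retries`, and `make_flaky_runtime` are hypothetical names, and the "runtime" is a stand-in function rather than a real Foundry or MAF endpoint.

```python
import time
from typing import Callable


class TransientAgentError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. timeouts)."""


def call_with_retries(
    invoke: Callable[[str], str],
    message: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Layer 2 (Caller): invoke the agent runtime, retrying transient failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke(message)
        except TransientAgentError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the client
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


def make_flaky_runtime(failures: int) -> Callable[[str], str]:
    """Stand-in for Layer 4: fails `failures` times, then answers."""
    state = {"calls": 0}

    def invoke(message: str) -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientAgentError("runtime temporarily unavailable")
        return f"echo: {message}"

    return invoke
```

Keeping retry policy in the caller means the runtime can be swapped (for example, between hosting paths) without touching the retry logic. The `sleep` parameter is injectable so the backoff can be exercised in tests without waiting.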

Three Production Hosting Paths

Microsoft Foundry provides three distinct paths for hosting the Agent Runtime (Layer 4), allowing organizations to balance control with managed simplicity.

Path A: Prompt Agent (Fully Managed)

This is a declarative approach where Foundry handles all infrastructure. You provide the model and instructions, and the platform manages versioning, durable conversations, and built-in tools like Web Search.

  • Best for: Rapid deployment without managing containers or servers.
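As a rough illustration of what "declarative" means here, the sketch below models a prompt agent as plain data: a model name, instructions, and a list of built-in tools, with a version number that increases on every change. This is not the Foundry API. `PromptAgentDefinition`, `bump`, `to_json`, and the values `"example-model"` and `"web_search"` are all hypothetical placeholders.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class PromptAgentDefinition:
    """Hypothetical shape of a declarative agent: data only, no runtime code."""

    name: str
    model: str
    instructions: str
    tools: Tuple[str, ...] = ()
    version: int = 1

    def bump(self, **changes) -> "PromptAgentDefinition":
        """Return a new immutable version with the given fields changed."""
        return replace(self, version=self.version + 1, **changes)


def to_json(definition: PromptAgentDefinition) -> str:
    """Serialize the definition, e.g. to store it or submit it to a platform."""
    return json.dumps(asdict(definition), indent=2)


v1 = PromptAgentDefinition(
    name="faq-agent",
    model="example-model",  # placeholder, not a real deployment name
    instructions="Answer briefly.",
    tools=("web_search",),  # placeholder label for a built-in tool
)
v2 = v1.bump(instructions="Answer briefly and cite sources.")
```

Because each version is an immutable value, earlier versions stay available for comparison or rollback, which is the property a managed versioning feature would provide on the platform side.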

Path B: Hosted Agent (Managed Container)

You package your custom orchestration code (MAF, LangGraph, etc.) as a container image, which Foundry runs in a hypervisor-isolated sandbox.

  • Key Benefits: Scale-to-zero (you pay only while the agent is active), near-instant cold starts, and a persistent file system that survives restarts.
  • Best for: Custom logic requiring dedicated, secure isolation and automated scaling.

Path C: Self-Hosted (Full Control)

The developer owns the entire process and deploys the agent runtime wherever their code already lives, such as Azure App Service, Azure Kubernetes Service (AKS), or on-premises.

  • Best for: Organizations that need absolute control over the compute environment and existing CI/CD pipelines.

Multi-Agent Workflow Patterns

The Microsoft Agent Framework (MAF) enables sophisticated interaction patterns between agents:

  • Handoff: One agent handles an intent (e.g., Triage) and passes ownership to a specialist (e.g., Flight Agent).
  • Sequential: A fixed pipeline where agents work in order (e.g., Draft → Review → Format).
  • Concurrent: Multiple agents run in parallel to provide different perspectives or research results simultaneously.
  • Group Chat: A central orchestrator decides which agent speaks next in a star topology.
  • Magentic: A manager agent dynamically creates and adapts a plan based on the task’s progress.
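Three of the patterns above (Sequential, Concurrent, and Handoff) can be sketched in plain `asyncio` to show how control flows between agents. This is not the MAF API: the `Agent` type, `make_agent`, and the three orchestration functions are hypothetical, and each "agent" is a toy coroutine that wraps its input in its own name.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# An "agent" here is just an async function from text to text (an assumption).
Agent = Callable[[str], Awaitable[str]]


def make_agent(name: str) -> Agent:
    """Build a toy agent that tags its input, standing in for a model call."""

    async def run(text: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"{name}({text})"

    return run


async def sequential(agents: List[Agent], task: str) -> str:
    """Fixed pipeline: each agent receives the previous agent's output."""
    result = task
    for agent in agents:
        result = await agent(result)
    return result


async def concurrent(agents: List[Agent], task: str) -> List[str]:
    """Fan out: every agent sees the same task; results keep input order."""
    return list(await asyncio.gather(*(agent(task) for agent in agents)))


async def handoff(
    triage: Callable[[str], str], specialists: Dict[str, Agent], task: str
) -> str:
    """Triage picks one specialist, which then owns the task."""
    owner = triage(task)
    return await specialists[owner](task)
```

For example, `asyncio.run(sequential([make_agent("draft"), make_agent("review"), make_agent("format")], "memo"))` returns `"format(review(draft(memo)))"`, mirroring the Draft → Review → Format pipeline. Group Chat and Magentic add an orchestrator that chooses the next step dynamically, so they do not reduce to a fixed function like these.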

Observability and Governance

Regardless of the hosting path, production agents need telemetry that operators can inspect. MAF uses OpenTelemetry (OTel) to emit spans automatically for every agent turn, tool call, and handoff.

  • Unified Tracing: By connecting OTel to Azure Application Insights, developers can follow a single conversation thread across multiple agents and tool executions.
  • Foundry Integration: The same traces support evaluation and monitoring of agent performance directly within the Foundry portal.
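To show why spans make a conversation traceable, the sketch below uses a toy, standard-library-only tracer: every span opened inside another span inherits its trace ID and records its parent. It is a conceptual stand-in and deliberately not the OpenTelemetry API. `ToyTracer`, `Span`, `handle_turn`, and the span and attribute names are hypothetical. In a real deployment the spans would come from MAF's instrumentation and an OpenTelemetry SDK exporter.

```python
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one conversation turn
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


class ToyTracer:
    """Records nested spans in memory (illustration only, not OTel)."""

    def __init__(self) -> None:
        self.finished: List[Span] = []
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str, **attributes: str) -> Iterator[Span]:
        parent = self._stack[-1] if self._stack else None
        current = Span(
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            name=name,
            attributes=dict(attributes),
        )
        self._stack.append(current)
        try:
            yield current
        finally:
            self._stack.pop()
            self.finished.append(current)  # spans finish innermost first


def handle_turn(tracer: ToyTracer, user_message: str) -> None:
    """One agent turn containing a tool call and a handoff."""
    with tracer.span("agent.turn", agent="triage", input_chars=str(len(user_message))):
        with tracer.span("tool.call", tool="lookup_booking"):
            pass  # the tool would run here
        with tracer.span("agent.handoff", target="flight_agent"):
            pass  # the specialist agent would take over here
```

After `handle_turn(tracer, "change my flight")`, all three recorded spans share one `trace_id`, and the tool call and handoff both point at the turn as their parent. That shared ID is what lets a backend reassemble a conversation across agents and tools.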

Summary Table: Choosing Your Path

Feature              | Prompt Agent      | Hosted Agent       | Self-Hosted
---------------------|-------------------|--------------------|----------------
Infrastructure Owner | Microsoft Foundry | Microsoft Foundry  | You
Scaling              | Fully Automated   | Scale-to-Zero      | Manual/Custom
Custom Code          | Declarative Only  | Containerized      | Anywhere
Security             | Managed Identity  | Isolated Sandboxes | Custom/Existing
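The table can be read as a small decision rule. The sketch below encodes one plausible reading of it, assuming only two questions matter: whether you need custom orchestration code, and whether you must own the compute environment. `HostingPath` and `choose_hosting_path` are hypothetical names, and real decisions will weigh more factors (cost, compliance, team skills) than this simplification captures.

```python
from enum import Enum


class HostingPath(Enum):
    PROMPT_AGENT = "Path A: Prompt Agent"
    HOSTED_AGENT = "Path B: Hosted Agent"
    SELF_HOSTED = "Path C: Self-Hosted"


def choose_hosting_path(needs_custom_code: bool, must_own_compute: bool) -> HostingPath:
    """Pick a hosting path from two yes/no questions (a simplification)."""
    if must_own_compute:
        # Full control over compute and existing CI/CD implies self-hosting.
        return HostingPath.SELF_HOSTED
    if needs_custom_code:
        # Custom orchestration code, but the platform runs the container.
        return HostingPath.HOSTED_AGENT
    # Model plus instructions only: the fully managed, declarative path.
    return HostingPath.PROMPT_AGENT
```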

Conclusion: These paths represent a spectrum of control. While Path A offers the fastest time-to-market, Path C provides maximum flexibility. Because MAF is interoperable, you can switch hosting models as your agent matures without rewriting your core logic.

