Scaling and Hosting Intelligent AI Agents in Production – Cloud, IS & Business Alignment

Overview

Moving AI agents from a prototype “promise” to a production reality requires a shift in focus from model selection to engineering rigor. To ship and scale intelligent systems successfully, an organization must treat agents as first-class citizens that integrate with corporate identity, security frameworks, and deployment pipelines.

The Five Layers of the Agent Stack

To choose the right hosting model, it is critical to understand the architecture of an agentic application:

1. Client / UI: The interface (Teams, Web, Mobile) where users interact.
2. Caller: The process that invokes the agent and manages high-level orchestration such as retries.
3. Agent Orchestration Layer: Defines the agent’s “brain”: system prompts, tool registries, and multi-agent coordination (e.g., via Microsoft Agent Framework, or MAF).
4. Agent Runtime: The actual process that executes the agent loop, dispatches tools, and streams responses.
5. Infrastructure & Model: The underlying compute, LLM endpoints, and security management.
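
As an illustration of the Caller layer's retry responsibility, the following is a minimal sketch in plain Python. It assumes the agent runtime is reachable through some function that takes a message and returns a reply; the names TransientAgentError and call_with_retries are hypothetical and do not come from MAF or Foundry.

```python
import random
import time
from typing import Callable


class TransientAgentError(Exception):
    """Hypothetical error type for failures worth retrying (timeouts, throttling)."""


def call_with_retries(
    invoke: Callable[[str], str],
    message: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Invoke the agent runtime, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return invoke(message)
        except TransientAgentError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the client
            # Exponential backoff (base_delay, 2x, 4x, ...) plus a little jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))
    raise ValueError("max_attempts must be at least 1")
```

Keeping the retry policy in the Caller, rather than inside the agent loop, means the same policy applies whichever hosting path runs the Agent Runtime.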

Three Production Hosting Paths

Microsoft Foundry provides three paths for hosting the Agent Runtime (the fourth layer above), allowing organizations to balance control with managed simplicity.

Path A: Prompt Agent (Fully Managed)

This is a declarative approach where Foundry handles all infrastructure. You provide the model and instructions, and the platform manages versioning, durable conversations, and built-in tools like Web Search.

Best for: Rapid deployment without managing containers or servers.

Path B: Hosted Agent (Managed Container)

You package your custom orchestration code (MAF, LangGraph, etc.) as a container image, which Foundry runs in a hypervisor-isolated sandbox.

Key Benefits: Scale-to-zero (pay only when active), fast cold starts, and persistent file systems that survive restarts.
Best for: Custom logic requiring dedicated, secure isolation and automated scaling.

Path C: Self-Hosted (Full Control)

You own the entire process and deploy the agent runtime wherever your code already lives, such as Azure App Service, Azure Kubernetes Service (AKS), or on-premises.

Best for: Organizations that need absolute control over the compute environment and existing CI/CD pipelines.

Multi-Agent Workflow Patterns

The Microsoft Agent Framework (MAF) enables sophisticated interaction patterns between agents:

Handoff: One agent handles an intent (e.g., Triage) and passes ownership to a specialist (e.g., Flight Agent).
Sequential: A fixed pipeline where agents work in order (e.g., Draft → Review → Format).
Concurrent: Multiple agents run in parallel to provide different perspectives or research results simultaneously.
Group Chat: A central orchestrator decides which agent speaks next in a star topology.
Magentic: A manager agent dynamically creates and adapts a plan based on the task’s progress.
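
The control flow behind the first three patterns can be sketched in a few lines of plain Python. This is not the MAF API: the Agent type, make_agent, and the three helper functions are hypothetical stand-ins, and each "agent" simply tags its input rather than calling a model. Group Chat and Magentic are omitted because they need an orchestrating agent that decides the next step at run time.

```python
import asyncio
from typing import Awaitable, Callable

Agent = Callable[[str], Awaitable[str]]  # hypothetical: text in, text out


def make_agent(name: str) -> Agent:
    """Build a stand-in agent that tags its input instead of calling a model."""
    async def run(text: str) -> str:
        return f"{name}({text})"
    return run


async def sequential(agents: list[Agent], task: str) -> str:
    """Fixed pipeline: each agent consumes the previous agent's output."""
    result = task
    for agent in agents:
        result = await agent(result)
    return result


async def concurrent(agents: list[Agent], task: str) -> list[str]:
    """Fan-out: every agent sees the same task; results keep input order."""
    return list(await asyncio.gather(*(agent(task) for agent in agents)))


async def handoff(
    route: Callable[[str], str], specialists: dict[str, Agent], task: str
) -> str:
    """Triage picks a specialist by name; that specialist then owns the reply."""
    return await specialists[route(task)](task)
```

For example, asyncio.run(sequential([make_agent("draft"), make_agent("review")], "memo")) returns "review(draft(memo))", which makes the ordering of the pipeline visible in the output.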

Observability and Governance

Regardless of the hosting path, production agents need end-to-end telemetry. MAF uses OpenTelemetry (OTel) to automatically emit spans for every agent turn, tool call, and handoff.

Unified Tracing: By connecting OTel to Azure Application Insights, developers can follow a single conversation thread across multiple agents and tool executions.
Foundry Integration: The same telemetry supports evaluation and monitoring of agent performance directly within the Foundry portal.

Summary Table: Choosing Your Path

Feature	Prompt Agent	Hosted Agent	Self-Hosted
Infrastructure Owner	Microsoft Foundry	Microsoft Foundry	You
Scaling	Fully Automated	Scale-to-Zero	Manual/Custom
Custom Code	Declarative Only	Containerized	Anywhere
Security	Managed Identity	Isolated Sandboxes	Custom/Existing
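
The table can be read as a simple decision rule. The sketch below encodes one possible reading of it; the Requirements fields and the choose_hosting_path function are hypothetical, and a real decision would weigh more factors (cost, compliance, team skills) than these two flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    needs_custom_code: bool  # orchestration beyond model + instructions
    must_own_compute: bool   # e.g. an existing AKS cluster or on-premises


def choose_hosting_path(req: Requirements) -> str:
    """Map two coarse requirements onto the three hosting paths in the table."""
    if req.must_own_compute:
        return "Self-Hosted"      # you own infrastructure and scaling
    if req.needs_custom_code:
        return "Hosted Agent"     # containerized code, platform-run
    return "Prompt Agent"         # declarative only, fully managed
```

The checks run from most to least control, so an agent that needs both custom code and its own compute lands on Self-Hosted.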

Conclusion: These paths represent a spectrum of control. While Path A offers the fastest time-to-market, Path C provides maximum flexibility. Because MAF is interoperable, you can switch hosting models as your agent matures without rewriting your core logic.