The AI Agent Control Stack — March 2026

March 2026 brought an explosion of tools for AI agent safety, sandboxing, and governance. But mapping every player reveals a layer that remains empty — the one where business decisions about agent actions are actually made.

The Stack

The AI agent control stack has four layers. Each answers a different question. Every tool in the ecosystem lives in one of them.

Figure: the AI agent control stack, March 2026

Layer 1: Model (smart). "Is the output safe?" Crowded.
Guardrails AI · CTGT / Mentat · Plano · NeMo Guardrails

Layer 2: Runtime (safe). "Is this allowed?" Very crowded.
IronCurtain · IronClaw / NemoClaw · Cisco DefenseClaw · Microsoft Agent 365 · CrowdStrike · Palo Alto · Okta · Blaxel · AccuKnox · Composio · MintMCP · Gravitee

Layer 3: Decision (correct). "Should this action happen right now for the business?"
Surfit. Nobody else.

Layer 4: Systems. Where real things happen.
Slack · GitHub · AWS · Gmail · Notion · X · Outlook

Layer 1 is crowded. Layer 2 is very crowded — and getting more crowded every week with major players (NVIDIA, Cisco, Microsoft) all building here. Layer 3 is empty. Except for Surfit.

Layer 1: Model — "Is the output safe?"

Smart. These tools make models produce better, safer outputs.

Guardrails AI
Founded 2023 · $7.5M Seed (Feb 2024) · ~$1.1M revenue
Open-source Python framework for validating LLM text inputs and outputs. Checks for toxicity, PII leakage, hallucinations, schema compliance. Includes Guardrails Hub (community marketplace of validators) and Snowglobe (simulation engine for testing guardrails before production).
Backed by Zetta Venture Partners, Bloomberg Beta, Pear VC, GitHub Fund. Angels: Ian Goodfellow, Logan Kilpatrick.
The gap: Validates what the model says — not what the agent does with it. Runs inside the agent pipeline. The agent still controls execution.
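
To make the distinction concrete, here is a minimal sketch of model-layer output validation in plain Python. It is not the Guardrails AI API; the function names, the email pattern, and the required-keys check are assumptions chosen for illustration. The validator reports problems with the text, and nothing in it governs what the agent does next.

```python
import json
import re

# Hypothetical validator for illustration; not the Guardrails AI API.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_pii(text: str) -> list:
    """Return email-like strings found in the model output."""
    return EMAIL_RE.findall(text)


def validate_output(raw: str, required_keys: set) -> list:
    """Collect validation failures for a JSON model output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    failures = []
    missing = required_keys - set(data)
    if missing:
        failures.append(f"missing keys: {sorted(missing)}")
    if find_pii(raw):
        failures.append("output contains an email address")
    return failures
```

An empty list means the output passed. Whether the agent then posts, sends, or deletes something is decided elsewhere, which is the gap described above.
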
CTGT / Mentat
Founded 2023 · $7.7M Seed (Feb 2025) · YC W24
Modifies model behavior at the representation level using mechanistic interpretability. Identifies and removes hallucinations, bias, and unwanted behavior without retraining. Mentat provides a runtime policy engine at the inference layer. Won VentureBeat Innovation Showcase 2025.
Backed by Google Gradient, General Catalyst, Y Combinator. Angels: François Chollet, Paul Graham, Michael Seibel. Working with Fortune 10 companies.
The gap: Controls what the model produces — not what the agent does with that output. Operates at inference. The agent still controls execution.
Plano by Katanemo
AI-native proxy between agent and model provider with filter chains for input/output. Built on Envoy, Rust/WASM. Filters model traffic — not system actions.
The gap: Sits between agent and model. Not between agent and systems. The agent still controls execution.
NVIDIA NeMo Guardrails
Programmable guardrails for LLM conversations using Colang, a domain-specific language. Controls what the model says and how it responds.
The gap: Conversational boundaries. Does not govern agent actions on external systems. The agent still controls execution.

Sleeper agent research has also shown that model-layer safety has hard limits: models deliberately trained with hidden backdoor behaviors can keep those behaviors through standard safety training and still look safe in evaluation. This is a structural limitation of the entire layer.

Layer 2: Runtime — "Is this allowed?"

Safe. These tools sandbox, isolate, and enforce policy on agent runtime.

This layer exploded in March 2026, with major announcements from NVIDIA (NemoClaw at GTC), Cisco (DefenseClaw at RSA), and Microsoft (Agent 365). The pattern: sandbox the agent, control access, enforce static policy.

IronCurtain
Open source · Creator: Niels Provos (ex-Google security lead)
Personal AI assistant security framework with V8 sandbox, policy engine, credential separation, and allow/deny/escalate decisions. Uses plain-English "constitution" compiled into enforceable policy. MITM proxy swaps fake credentials for real ones — agent never sees actual keys. Supports 128+ Google tools, GitHub (41 tools), Git (27 tools), filesystem, Signal.
Closest architectural pattern to Surfit. Does credential separation. Does allow/deny/escalate. But: personal/single-user, static policy, no business context evaluation.
The gap: Policy determines what's permitted. It doesn't evaluate whether THIS SPECIFIC action is correct for the business RIGHT NOW. Static rules can't encode timing, risk context, downstream impact, or organizational state. The agent passes policy. But was it the right action?
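
The pattern described in this entry (a static allow/deny/escalate lookup plus a proxy that swaps placeholder credentials for real ones) can be sketched roughly as follows. This is not IronCurtain's code or API; every name, rule, and token here is a hypothetical stand-in.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    ALLOW = "allow"
    DENY = "deny"
    ESCALATE = "escalate"


@dataclass
class ToolCall:
    tool: str
    operation: str
    headers: dict


# Hypothetical static rules: (tool, operation) -> verdict.
STATIC_POLICY = {
    ("github", "read_issue"): Verdict.ALLOW,
    ("github", "delete_repo"): Verdict.DENY,
    ("gmail", "send_email"): Verdict.ESCALATE,
}

# The agent only ever holds the placeholder; the proxy holds the real secret.
REAL_CREDENTIALS = {"PLACEHOLDER_GITHUB_TOKEN": "real-token-held-by-proxy"}


def check(call: ToolCall) -> Verdict:
    """Look up the static rule; calls with no rule escalate to a human."""
    return STATIC_POLICY.get((call.tool, call.operation), Verdict.ESCALATE)


def swap_credentials(headers: dict) -> dict:
    """Replace placeholder tokens with real ones on the way out."""
    return {k: REAL_CREDENTIALS.get(v, v) for k, v in headers.items()}


def proxy(call: ToolCall) -> Optional[dict]:
    """Return outbound headers for allowed calls, None otherwise."""
    if check(call) is Verdict.ALLOW:
        return swap_credentials(call.headers)
    return None
```

Note that `check` looks only at the tool and the operation. It returns the same verdict during a launch freeze as on a quiet Tuesday, which is the limitation this entry points to.
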
NVIDIA NemoClaw + OpenShell
Alpha: March 16, 2026 · GTC 2026 launch · Open source
Reference stack for running OpenClaw agents inside a secure sandbox. Rust-based container isolation, network egress control, filesystem restrictions, declarative YAML security policies. Includes managed inference with NVIDIA Nemotron models. Partnering with Cisco, CrowdStrike, Google, Microsoft Security.
The gap: Controls where the agent can go — which endpoints, which files. Once access is granted, no opinion on whether this specific action should happen. The agent still controls execution.
Cisco DefenseClaw
Announced RSA 2026 · March 23, 2026 · Open source (expected)
Open-source secure agent framework with Skills Scanner, MCP Scanner, AI BoM, and CodeGuard. Part of Cisco's broader agentic security stack: Duo Agentic Identity (agent IAM), AI Defense Explorer Edition (red-teaming), Secure Access SSE (MCP policy enforcement), Splunk AI SOC agents. Integrates with NVIDIA OpenShell for runtime sandboxing.
The gap: Scanning, verification, and sandboxing. Security infrastructure — not business decision-making. The agent still controls execution.
Microsoft Agent 365
Available May 1, 2026 · $15/user/mo standalone · $99/user/mo in M365 E7
Control plane for AI agents within Microsoft 365. Agent Registry, Agent ID in Entra, monitoring, governance. Extends Defender, Entra, and Purview to non-human identities. Provides agent identity, observability, and access control within the M365 ecosystem.
The gap: M365-only. Identity and observability — not cross-system business decisions. Answers "which agents exist and what can they access?" Not "should this action happen right now?" The agent still controls execution.
CrowdStrike · Palo Alto Networks · Okta
Infrastructure, endpoint, and identity security. Expanding into agentic AI security at RSA 2026. Secure the infrastructure below the application layer. No concept of business context for agent actions.
Blaxel
MicroVM sandboxing as a service for AI agents. Kata Containers-based, <25ms resume times. Isolates agent execution environments.
AccuKnox
AI-SPM + runtime enforcement via KubeArmor/eBPF. ModelArmor sandboxing, Zero Trust CNAPP. Kernel-level agent runtime security.
Composio
MCP gateway + hosted tool integrations + agent management. Managed auth, RBAC, tool execution, human-in-loop approval flows. Has approval workflows but no cross-system business context evaluation.
MintMCP by Lutra
Hosted MCP servers for email/calendar + MCP Gateway for telemetry and config management. Capability layer — connectors, not control.
Gravitee MCP Proxy
Centralized governance proxy for MCP traffic. Tool discovery, execution, access control, observability for MCP connections.
Salesforce Agentforce
Enterprise MCP governance within Salesforce ecosystem. MCP allowlisting, trusted gateway, agent builder. Salesforce-only.
Entro Security
Identity governance for AI agents and non-human identities. AGA module, MCP activity monitoring, policy enforcement. Announced RSA 2026.
Opal Security
AI-native access governance. Paladin (AI agent for access), OpalScript, OpalQuery. Announced RSA 2026.

Every tool in Layer 2 answers the same question: "Is this allowed?" They check permissions, enforce policy, sandbox execution, manage identity. These are important. But they all end the same way: the agent passes the check, and executes on its own.

Layer 3: Decision — "Should this happen right now?"

Correct. This is where business decisions about agent actions are made.

Surfit
Layer 3, the decision layer · Nobody else is here

Every tool in Layer 1 makes models smarter. Every tool in Layer 2 makes agents safer to run. Surfit is where those agents go when they need to actually do something.

The agent proposes an action. Surfit evaluates business context — not just whether it's permitted by policy, but whether it's the right action for the organization right now. Timing. Risk. Downstream impact. Organizational state. These are not things a policy file can encode.

Low-risk actions execute instantly. High-risk actions are held and routed for context. Everything produces a receipt. The agent never executes directly — Surfit controls the execution path.
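
The propose, evaluate, execute-or-hold, receipt flow described above could be sketched like this. It is an illustration under stated assumptions, not Surfit's implementation: the class names, the two context signals, and the two-level risk rule are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ProposedAction:
    system: str
    operation: str
    payload: dict


@dataclass
class Context:
    # Hypothetical business signals; a real deployment would define its own.
    change_freeze: bool = False
    affects_customers: bool = False


@dataclass
class DecisionGate:
    # Only the gate holds the executor; the agent never calls it directly.
    executor: Callable[[ProposedAction], None]
    receipts: list = field(default_factory=list)
    held: list = field(default_factory=list)

    def risk(self, action: ProposedAction, ctx: Context) -> str:
        """Toy risk rule based on context at the moment of the proposal."""
        if ctx.change_freeze or ctx.affects_customers:
            return "high"
        return "low"

    def submit(self, action: ProposedAction, ctx: Context) -> str:
        """Execute low-risk actions, hold high-risk ones, receipt both."""
        level = self.risk(action, ctx)
        if level == "low":
            self.executor(action)
            outcome = "executed"
        else:
            self.held.append(action)
            outcome = "held"
        self.receipts.append(self._receipt(action, level, outcome))
        return outcome

    def _receipt(self, action: ProposedAction, level: str, outcome: str) -> dict:
        body = {
            "system": action.system,
            "operation": action.operation,
            "risk": level,
            "outcome": outcome,
        }
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        return {**body, "digest": digest}
```

In this sketch the same `ProposedAction` can be executed in one context and held in another, because the verdict depends on `Context` and not on the action alone. Both paths append a receipt.
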

This is not another policy engine. This is not another sandbox. This is not another access control tool. This is the layer where business decisions about agent actions are actually made — and enforced.

Business context evaluation — not static rules
Cross-system consistency — one decision model
Every action receipted and auditable

The Pattern

Every tool in this landscape is valuable. Every one solves a real problem. And every one ends the same way:

Guardrails validates the output. The agent still executes on its own.

CTGT constrains the model. The agent still executes on its own.

IronCurtain enforces policy. The agent still executes on its own.

NemoClaw sandboxes the runtime. The agent still executes on its own.

Cisco DefenseClaw scans and verifies. The agent still executes on its own.

Microsoft Agent 365 tracks identity. The agent still executes on its own.

CrowdStrike secures infrastructure. The agent still executes on its own.

Surfit is where that changes. The agent proposes. Surfit evaluates the business context. Surfit decides. Surfit executes. The agent never touches the system without Surfit in the path.

Why This Matters Now

Every new tool in Layer 1 and Layer 2 makes agents more capable and safer to run. More capable agents touching more systems means more need for Layer 3.

Every competitor in this landscape is building the case for Surfit without knowing it.

The more agents can do, the more important it becomes to have a layer that evaluates whether they should. That's not safety. That's not compliance. That's business judgment at the point of execution.

Smart. Safe. Correct.
Everyone is building the first two. Surfit is the third.

See how Surfit operates across multiple systems — evaluating, routing, and receipting agent actions in real time.
