AI Security · 2026 · 06 · 12 min

A taxonomy of AI agent attacks: 19 categories, 268 rules

The threat surface we mapped building Aguara, and the failure modes that recur across real deployments.

Gus Aragón

Founder, Oktsec

Key takeaways

– AI agents are attacked through inference, not just input. The classic threat model doesn't fit.
– MCP servers are the softest surface: every connected server is an implicit trust decision.
– Across 268 rules in 19 categories, three failure modes dominate: over-broad permissions, unvalidated tool output, and prompt injection.

Most teams shipping AI agents today inherited a threat model built for software that does what it's told. Agents don't do what they're told. They do what they infer. That gap is the whole problem.

Why MCP changes the surface

The Model Context Protocol connects agents to tools, files, and other systems. It's the most useful thing to happen to agents in years, and also the place where the most damage can be done. Every server an agent trusts is a new entry point, and most are trusted implicitly.

An agent is only as safe as the least-reviewed server it's allowed to call.
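One way to make that trust decision explicit rather than implicit is a deny-by-default allowlist. The sketch below is illustrative only: the server names, tool names, and the `ReviewedServer` type are hypothetical, and it is not a description of how Aguara or any MCP client works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedServer:
    """A server someone has actually reviewed, with the tools that review covered."""
    name: str
    allowed_tools: frozenset[str]


# Hypothetical allowlist: a server appears here only after a review.
REVIEWED_SERVERS = {
    "filesystem": ReviewedServer("filesystem", frozenset({"read_file", "list_dir"})),
    "tickets": ReviewedServer("tickets", frozenset({"search_tickets"})),
}


def is_call_allowed(server: str, tool: str) -> bool:
    """Deny by default: unknown servers and unlisted tools are both rejected."""
    reviewed = REVIEWED_SERVERS.get(server)
    return reviewed is not None and tool in reviewed.allowed_tools
```

With this shape, connecting a new server does nothing until someone adds it to the allowlist, so the review step can't be skipped by accident.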

What we measured

Across 268 rules in 19 categories, the same failure modes recur: over-broad permissions, unvalidated tool output, and prompt-injection paths that turn a helpful agent into a confused deputy. We tracked 58,000+ skills across the major registries to see how widespread these patterns are.

$ aguara scan ./agent
  ▸ 19 categories · 268 rules
  ✗ 3 high   · over-broad MCP scope
  ✗ 7 medium · unvalidated tool output
  ✓ 258 passed
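To make "unvalidated tool output" concrete, here is a minimal sketch of checking a tool's result before it goes back into the model's context. The size limit and the patterns are assumptions chosen for illustration. Pattern matching like this is a weak heuristic, not a complete defense against prompt injection, and it is not a reproduction of any Aguara rule.

```python
import re

MAX_OUTPUT_CHARS = 4_000  # assumed limit, for illustration only

# Illustrative patterns only; real injection attempts vary far more than this.
SUSPICIOUS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now", re.IGNORECASE),
    re.compile(r"system prompt", re.IGNORECASE),
]


def check_tool_output(output: object) -> list[str]:
    """Return a list of findings; an empty list means no check fired."""
    if not isinstance(output, str):
        return ["output is not a string"]
    findings = []
    if len(output) > MAX_OUTPUT_CHARS:
        findings.append("output exceeds size limit")
    for pattern in SUSPICIOUS:
        if pattern.search(output):
            findings.append(f"matched suspicious pattern: {pattern.pattern}")
    return findings
```

An empty result only means none of these checks fired. The output is still data, and should never be treated as instructions.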

How to think about it

Treat every external capability as untrusted until proven otherwise. Enforce policy at runtime, not just in review. And measure continuously, because the ecosystem changes faster than any audit cycle.
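A runtime check might look like the sketch below: every tool call passes through a guard that consults a policy and records the decision, so enforcement and measurement happen in the same place. The policy table, the `/workspace/` prefix, and the `guarded_call` helper are all hypothetical. A real deployment would need a richer policy language and durable logging.

```python
import posixpath
from typing import Any, Callable


class PolicyViolation(Exception):
    """Raised when a tool call is rejected by the runtime policy."""


def _inside_workspace(args: dict[str, Any]) -> bool:
    # Normalize first so "/workspace/../etc/passwd" is not accepted.
    # Symlinks are not resolved here; that is out of scope for this sketch.
    path = posixpath.normpath(str(args.get("path", "")))
    return path.startswith("/workspace/")


# Hypothetical policy: tool name -> predicate over the call's arguments.
POLICY: dict[str, Callable[[dict[str, Any]], bool]] = {
    "read_file": _inside_workspace,
}

# In-memory audit trail of (tool, decision) pairs, for illustration.
AUDIT_LOG: list[tuple[str, str]] = []


def guarded_call(tool_name: str, tool_fn: Callable[..., Any], args: dict[str, Any]) -> Any:
    """Run tool_fn only if the policy allows it; log every decision."""
    check = POLICY.get(tool_name)
    allowed = check is not None and check(args)
    AUDIT_LOG.append((tool_name, "allow" if allowed else "deny"))
    if not allowed:
        raise PolicyViolation(f"{tool_name} denied by runtime policy")
    return tool_fn(**args)
```

Because denials are logged as well as allowed calls, the same guard gives a running count of how often an agent tries something outside its policy.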

Frequently asked

What is an AI agent security review?

A structured audit of an agent and its MCP surface against 268 rules across 19 categories, with prioritized remediation.

Why are MCP servers a supply-chain risk?

Agents trust the servers they call, often implicitly. Each one is a new entry point an attacker can reach through the agent.

Is static scanning enough?

No. Scanning catches issues at deploy time; runtime policy enforcement and continuous monitoring catch behavior in production.
