PreprintPDF Available

Multi-Agentic system Threat Modelling Guide

Authors:

Abstract and Figures

This guide builds on the OWASP Agentic AI – Threats and Mitigations publication, our master agentic threat taxonomy, by applying its threat taxonomy to real-world multi-agent systems (MAS). These systems, characterized by multiple autonomous agents coordinating to achieve shared or distributed goals, introduce additional complexity and new attack surfaces. Our objective is to demonstrate the application of the MAESTRO (Multi-Agent Environment, Security, Threat, Risk, and Outcome) framework, layered and architectural methodology, as a companion to the OWASP Agentic Security Initiative (ASI) threat taxonomy. This methodology is employed to conduct structured threat modeling in greater detail. The focus is on agentic threats previously defined by OWASP, including Tool Misuse, Intent Manipulation, and Privilege Compromise, and how they manifest within intricate MAS deployments.
Content may be subject to copyright.
-
Multi-Agentic system
Threat Modelling Guide
OWASP GenAI Security Project - Agentic Security
Initiative
Version 1.0
April 22, 2025
Status: Released
Page 1
OWASP.org
-
The information provided in this document does not, and is not intended to, constitute legal advice. All
information is for general informational purposes only. This document contains links to other third-party
websites. Such links are only for convenience and OWASP does not recommend or endorse the contents of
the third-party sites.
License and Usage:
This document is licensed under Creative Commons, CC BY-SA 4.0
You are free to:
Share copy and redistribute the material in any medium or format
Adapt remix, transform, and build upon the material for any purpose, even commercially.
Under the following terms:
Attribution You must give appropriate credit, provide a link to the license, and indicate if
changes were made. You may do so in any reasonable manner but not in any way that suggests the
licensor endorses you or your use.
Attribution Guidelines - must include the project name as well as the name of the asset
Referenced
OWASP GenAI Security Project - Multi-Agentic system Threat Modelling Guide
Share AlikeIf you remix, transform, or build upon the material, you must distribute your contributions
under the same license as the original.
Link to full license text: https://creativecommons.org/licenses/by-sa/4.0/legalcode
Page 2
OWASP.org
-
Table of Content
Executive Summary 3
1. Introduction 4
1.1 Scope and Audience 4
1.2 Key Threat Modelling Scenario: Multi-Agent Pattern 4
1.3 MAS Threat Overview 7
2. Overview of MAESTRO Framework 8
2.1 Using MAESTRO with MITRE ATLAS 26
3. RPA Expense Reimbursement Agent Threat Modeling Using MAESTRO
27
4. Eliza OS Threat Modelling Using MAESTRO Framework 40
5. Threat Modeling Anthropic MCP Protocol using MAESTRO Framework
49
References 59
Acknowledgements 60
OWASP Top 10 for LLM Project Sponsors 61
Project Supporters 62
Page 3
OWASP.org
-
Executive Summary
This guide builds on the OWASP Agentic AI Threats and Mitigations publication, our master agentic threat
taxonomy, by applying its threat taxonomy to real-world multi-agent systems (MAS). These systems,
characterized by multiple autonomous agents coordinating to achieve shared or distributed goals, introduce
additional complexity and new attack surfaces.
Our objective is to demonstrate the application of the MAESTRO (Multi-Agent Environment, Security, Threat,
Risk, and Outcome) framework, layered and architectural methodology, as a companion to the OWASP
Agentic Security Initiative (ASI) threat taxonomy. This methodology is employed to conduct structured
threat modeling in greater detail. The focus is on agentic threats previously defined by OWASP, including
Tool Misuse, Intent Manipulation, and Privilege Compromise, and how they manifest within intricate MAS
deployments.
Rather than proposing a separate threat taxonomy, this guide complements existing OWASP work by:
Applying OWASP ASI threats to multi-agent systems using MAESTRO.
Highlighting how inter-agent coordination, autonomy, and memory amplify risks.
Using real-world examples to demonstrate expanded attack paths and system-wide vul nerabilities .
Key Contributions of this Guide Include:
Applying the MAESTRO Threat Modelling Framework: Demonstrates use of the MAESTRO layered
framework to map threats across seven architectural layers, including cross-layer risks unique to
MAS environments.
Extended Threat Coverage: Introduces a new MAS-specific threat modelling methodology designed
to identify agentic threat scenarios that complement the ASI threat taxonomysuch as model
instability, plugin compromise, and cross agent interferencethereby expand ing vi sibility i nto the
agentic attack surface
Use Case Deep Dives: Provides detailed modelling of real-world scenarios (RPA Reimbursement
Agent, Eliza OS, Anthropic MCP Protocol) to illustrate layered vulnerabilities and practical
applications of MAESTRO.
Page 4
OWASP.org
-
Agentic Factors Emphasis: Reinforces how Non-Determinism, Autonomy, Agent Identity
Management, and Agent-to-Agent Communication contribute to emergent threats.
Actionable Guidance: Offers practical, architecture-aligned threat modelling guidance for secure
MAS design and deployment.
This document should be used in tandem with OWASP’s Agentic AI Threats and Mitigations”. and other
OWASP guidelines such and the OWASP Top 10 for LLM Applications to ensure consistent threat coverage
across autonomous LLM-based systems.
1. Introduction
This document extends the OWASP Agentic Security Initiative's (ASI) “Agentic AI - Threats and Mitigations” to
provide a more in-depth threat modelling of an agentic system to cover the existing taxonomy alongside
AppSec and broader AI threat landscape.
1.1 Scope and Audience
This guide demonstrates how ASI-defined threats can be analyzed using the MAESTRO methodology in
layered, multi-agent deployments.
It uses the Multi-Agent System (MAS) pattern where multiple autonomous agents interact within a shared
environment to complete tasks or achieve common or individual objectives. A multi-agent system differs
from a single agent system from a security perspective due to complexity of agent-to-agent communication
and increased attack surface. The threat modelling guide explores the characteristics of this pattern to
identify associated security threats, and potential mitigation strategies.
We use the MAESTRO framework to structure our approach. By highlighting the unique vulnerabilities and
collaborative nature of multi-agent systems, we aim to provide complete and concrete threat modelling for
system architects, developers, and security professionals to apply in their work.
This reinforces the recommendation of the recently released Agentic AI Threats and Mitigation” document
to explore the MAESTRO framework as a means to complement ASI’s taxonomy with a deeper treatment of
threat modelling.
1.2 Key Threat Modelling Scenario: Multi-Agent Pattern
Definition
The Multi-Agent System (MAS) pattern consists of multiple agents with various degrees of autonomy that
interact with each other and with their shared environment to achieve individual and/or collective goals.
Page 5
OWASP.org
-
These agents can be homogeneous or heterogeneous, execute sequentially, in parallel or hierarchically and
they communicate, coordinate, and cooperate (or sometimes compete) to perform complex tasks beyond
the capabilities of a single agent. These agents work on individual subtasks and at the same time support
each other.
Key Features
We now introduce the key features offered by MAS
Distributed Autonomy: Agents operate independently yet contribute to the overall system goals.
Inter-Agent Communication: Agents exchange information, coordinate actions, and negotiate
goals.
Collaboration & Competition: Agents can cooperate to achieve a common goal or compete for
resources or individual objectives.
Emergent Behaviour: Complex system behaviour arises from the interactions among agents.
Scalability & Adaptability: Systems can be scaled by adding or removing agents, enabling
adaptability to changes.
Centralized, Hierarchical, and Decentralized Control: Depending on the actual application, there
may be a centralized ‘manager’ agent to control/orchestrate all other agents. Alternatively, the
system may lack a central controlling entity.
Task Distribution: Individual agents have specific roles and responsibilities, contributing to overall
system objectives.
Memory & Learning: Agents can learn over time using context awareness and experienced-based
adaptation.
Heterogeneous: Agents can have different skills sets, authority levels, or access to data.
Self-Organizing Behaviour: Agents dynamically form subgroups, hierarchies, or workflows based on
task demands without explicit centralized control.
World-Agent Communication: A subset of the MAS interacts, during execution, with non-agentic
systems (e.g., APIs, databases, or hardware). These systems are not autonomous or goal-driven but
are integral parts of the environment that agents must interface with to complete tasks).
Agent Independence: Distinct MASs can share agents due to their independent nature, potentially
leading to data, resource, and responsibility leakage between MAS systems.
Agent to Agent Communication: Agent can communicate with another Agent for task completion,
and workflow processing. We are glad to see there is now a standard protocol for this. Google’s
Agent2Agent (A2A) protocol provide a standard way with industry support to enable agent to
discover other agent’s capability and communicate with other agents
(https://github.com/modelcontextprotocol).
Limitations
Page 6
OWASP.org
-
Agentic systems have their own limitations beyond the scope of our work. It is important to be aware of the
limitations of Multi-Agent Systems (MAS) affecting their security. These characteristics introduce novel
risks compared to traditional or single-agent systems:
Expanded Attack Surface: The increased attack surface and distributed nature of multi-agent
systems can make them more vulnerable to security threats, especially when involving human and AI
agents.
Trust, Bias, and Adversarial Exploitation: Issues of trust and bias arise, particularly in collaborative
systems involving AI agents trained on biased data or containing malicious code. Trust mechanisms
can be exploited by malicious agents impersonating trusted actors or introducing subtle biases over
time.
Agent Coordination Failures in Dynamic Environments: Coordination mechanisms may break down
in adversarial or changing environments, leading to unintended consequences.
Inability to Verify Decision Lineage (Explainability & Auditability Issues): MAS systems often lack
clear decision traces, making forensic investigation and compliance difficult.
Man-in-the-Middle Attacks: Intercepting agent communication to alter commands or extract data.
Lack of Accountability: Unlike human employees, agents face no consequences for their actions,
leading to misalignment in motivation.
Identity Sprawl and Access Complexity: Managing identity and access control in MAS environments
can become highly complex due to the vast number of interacting agents.
These limitations are not inherent flaws but highlight areas requiring enhanced security engineering,
monitoring, and policy enforcement in MAS deployments.
Use Case(s)
Here are some example use cases for MAS in different verticals, demonstrating its versatility:
Distributed Robotics: Multi ple robots coordinating tasks in wa rehouses, fac tory operations, factory
optimization, construction sites, trip planning or disaster response.
Supply Chain Management: Agents representing different entities (suppliers, manufacturers,
distributors) optimizing the supply chain.
Smart City Infrastructure: Agents controlling traffic lights, energy grids, and public safety systems
to improve urban management.
Collaborative Healthcare Systems: Multiple AI agents supporting healthcare diagnosis, treatm ent,
medical payment systems, and patient management.
Customer Success: Multiple autonomous agents simultaneously monitor customer interactions,
predict potential issues, and proactively offer personalized solutions before customers experience
problems.
Page 7
OWASP.org
-
Sales: Specialized agent teams collaborate to qualify leads, personalize pitches, negotiate deals,
and maintain relationships, all while adapting strategies based on real-time market intelligence.
SDLC (Software Development Life Cycle): Distributed agents handle different phases of
development simultaneouslyautomatically testing code, identifying bugs, suggesting
optimizations, and managing deployments while continuously learning from previous project
outcomes.
1.3 MAS Threat Overview
In OWASP’s document “Agentic AI Threats and Mitigation”, a Multi-Agent System is defined as
Multiple agents that can scale or combine specialist roles and functionality in an agentic solution”.
We expand this definition to add sample security threats and considerations before we deep dive into threat
modelling to illustrate some of potential threats in MAS (Figure 1).
Figure 1: Multi- Agent System and Sample Threats
As shown in Figure 1, MAS is the foundation of autonomous AI interactions, but their complex
communication patterns create new security risks. Without a robust threat model, MAS can become
vulnerable to cascading failures and adversarial manipulation. The following is a simple threat list. To gain a
more comprehensive analysis, we will need to utilize the MAESTRO framework.
Page 8
OWASP.org
-
Insecure Communication Agents exchanging data over unprotected channels can be intercepted
or manipulated.
Blast Radius A compromised agent can spread malicious influence across the MAS network.
Identity Spoofing Adversaries can impersonate agents to inject false data or hijack decision-
making.
Prompt Injection AttacksUntrusted LLM models can misguide agents by manipulating their input-
output flows.
External DependenciesInsecure APIs, unverified tools, and malicious botnets introduce hidden
attack vectors.
Decreased Visibility - the complexity could decrease the ability to detect and to fully understand the
context of the impact of the attack. This could increase evasion from detection and response.
Agent Collusion - Maliciou s agents may collaborate to compromise the system's integrity,
potentially resulting in coordinated attack or data manipulation.
2. Overview of MAESTRO
Framework
The MAESTRO (Multi-Agent Environment, Security, Threat, Risk, and Outcome) Framework outlines security
threats specific to multi-agent systems across different architectural layers
(https://cloudsecurityalliance.org/blog/2025/02/06/agentic-ai-threat-modeling-framework-
maestro). These threats target various levels of agentic AI reference architecture from the foundation
models to the overall agent ecosystem. Additionally, cross-layer threats highlight vulnerabilities that span
multiple layers, emphasizing the interdependencies between agents.
The following diagram is the high-level diagram of MAESTRO.
Page 9
OWASP.org
-
The following table illustrates the layered approach followed by MAESTRO and how it relates to the ASI
Taxonomy:
MAESTRO Layer
Layer Focus
1. Foundation Model
Integrity of LLMs and pretrained models;
model alignment; poisoning and manipulation
Page 10
OWASP.org
-
MAESTRO Layer
Layer Focus
2. Data Operations
Vector store integrity, prompt management,
retrieval attacks
3. Agent Frameworks
Execution logic, workflow control, autonomy
boundaries
4.Deployment
Infrastructure
Runtime container Security, Orchestration,
networking, MLSecOps
Page 11
OWASP.org
-
MAESTRO Layer
Layer Focus
5. Evaluation and
Observability
Monitoring alerting, logging, Human in the
Loop-HITL interfaces.
6.Security &
Compliance (Vertical)
Access controls, Policy Enforcement,
regulatory constraints.
7.Agent Ecosystem
Interaction with Humans, external tools or
other agents.
8. Cross-Layer
Emergent behaviors from multi-agent
interaction.
Page 12
OWASP.org
-
However, MAESTRO goes beyond the ASI taxonomy and is a versatile tool to discover new threats, illustrate
extended threat scenarios, and incorporate non-agentic threats to agentic systems. The following table
illustrates the thorough approach MAESTRO introduces and how it relates to the ASI Threat taxonomy and
other aspects of agentic system security:
Layer
Threat
Description
Example
Taxonomy
Mappings
Foundation Model
(Layer 1)
Collaborative
Model Poisoning
Malicious data injected
during collaborative
model training
corrupts models
across multiple agents,
leading to
compromised
performance or
security risks. This is
specific to multi-agent
training.
A rogue agent
injects malicious
data into a shared
dataset used to
train all agents,
resulting in skewed
decision-making
across the entire
system.
LLM004-2025
(Poisoning) in
Agentic Setup
Extended Threat
Scenario - T1
Memory
Poisoning
memory is used
for collaborative
model poisoning
Foundation Model
(Layer 1)
Model Stealing
via
Eavesdropping
Attackers eavesdrop
on communication
between agents to
reverse engineer
shared model
components, leading
to intellectual property
theft or creation of
malicious clones. This
relies on inter-agent
communication.
An attacker
monitors the traffic
between federated
learning agents to
reconstruct a
proprietary model
from exchanged
parameters.
LLM 10:2025
Unbounded
Consumption
Extended Threat
Scenario for T12
Agent
Communication
Poisoning
Page 13
OWASP.org
-
Data Operations
(Layer 2)
Distributed Data
Poisoning
Attackers manipulate
data sources that are
shared or used by
multiple agents. The
changes can be subtle
and take longer to
detect because of the
distributed nature.
Malicious data
injected into
shared memory can
continuously
influence agents
until detected,
while attackers
may exploit
memory retrieval to
steal private data.
Extended Threat
Scenario – ‘T1
Memory
Poisoningin
scenarios with
shared memory
Data Operations
(Layer 2)
Inter-Agent Data
Tampering
Attackers intercept
and manipulate data in
transit between
agents, leading to
inconsistencies and
flawed decision-
making. This is specific
to the communication
between multiple
agents.
In a supply chain,
an attacker alters
data related to
inventory levels,
causing shortages
or bottlenecks as
agents react based
on the faulty
information.
LLM 03:2025
Supply Chain
T12 Agent
Communication
Poisoning
Agent Framework
(Layer 3)
Negotiation
Hijacking
Attackers manipulate
communication
protocols used by
agents to change the
outcome of
negotiations or
agreements, leading to
misaligned goals or
resource allocations.
This is about attacking
the inter-agent
process.
A malicious agent
in a resource
sharing system
alters negotiation
protocols to
monopolize
resources, starving
other agents of
needed assets.
T12 Agent
Communication
Poisoning
Extended threat
scenario for T3
Rogue Agents
Page 14
OWASP.org
-
Agent Framework
(Layer 3)
Trust
Exploitation
Attackers exploit
established trust
relationships between
agents, enabling them
to perform malicious
activities under the
guise of a trusted
peer. This can include
impersonating
legitimate agents
(identity spoofing) or
misusing an agent’s
established reputation
to manipulate others.
This is a specific multi-
agent trust issue.
A compromised
agent leverages its
reputation to
convince other
agents to send it
sensitive
information, which
is then exploited to
gain an unfair
advantage.
Additionally,
compromised
agents may misuse
tools to perform
destructive actions
exploiting their
trusted status
Extended Threat
Scenario for
T13 - Rogue
Agents and T9 -
Identity Spoofing
Deployment
Infrastructure
(Layer 4)
Distributed
Denial of Service
(DDoS)
Attackers target
multiple agents or
infrastructure
components to
overwhelm system
resources, leading to
performance
degradation or
complete shutdown of
system services. Focus
is on the distributed
impact.
A targeted DDoS on
a group of agents
causes paralysis,
preventing them
from working
together to achieve
their objectives,
and creating a
cascading failure.
Extended Threat
Scenario for T4 -
Resource
Overload
T14 - Human
Attacks on MAS
Page 15
OWASP.org
-
Deployment
Infrastructure
(Layer 4)
Compromised
Orchestration
for Multi-Agents
Attackers exploit the
orchestration layer to
gain unauthorized
access to multiple
agents, and to
manipulate their
operations, or deploy
malicious agents.
Target is multi agent
orchestration.
An orchestration
system allows the
deployment of a
malicious agent
with access to a
shared data store.
Extended Threat
Scenario on T14
Human Attacks
on Multi-Agent
System
Evaluation &
Observability
(Layer 5)
Distributed
Performance
Degradation
Masking
Attackers manipulate
evaluation metrics
across different
agents to obscure
performance
degradation or
malicious activities.
This can be hard to
detect because the
individual agents might
appear to be normal.
This is a masking issue
specific to multiple
agents.
An attacker
manipulates the
individual
performance data
reported by
compromised
agents to hide that
they are operating
incorrectly.
Extended Threat
Scenario for T8
Repudiation &
Untraceability
Security &
Compliance (Layer
6)
Data Privacy
Violations in
Inter-Agent
Interactions
Failure to properly
handle sensitive data
during inter-agent
interactions, resulting
in unauthorized
access and privacy
breaches. This
concerns data sharing
specific to multi-
agents.
A healthcare
system leaks
patient data while
exchanging
medical records to
train diagnostic
models.
STRIDE Spoofing
attack
Extended Threat
Scenario for
T3 Privilege
Compromise
T8 Repudiation
& Untraceability
Page 16
OWASP.org
-
Security &
Compliance (Layer
6)
Indirect
Privilege
Escalation
Failure to manage
agent-specific
permissions can be
exploited by malicious
users to execute high-
privilege actions on
their behalf.
Extended Threat
Scenario
combining T3
Privilege
Escalation and
T14 Human
Attacks on Multi-
Agent.
Security &
Compliance (Layer
6)
Policy Violation
Misalignment in agent
motivation can result
in regulatory and
policy violations that
human employees
would typically avoid
T7 Misaligned &
Deceptive
Behaviors
Security &
Compliance (Layer
6)
Real-Time
Security
Violation
Failure to continuously
monitor agents from a
security perspective
may lead to deviations
from guardrails due to
their non-
deterministic nature.
Extended Threat
Scenario for
T8 Repudiation
& Untraceability
T7 Misaligned &
Deceptive
Behaviors
Agent Ecosystem
(Layer 7)
Malicious Agent
Diffusion
A malicious agent is
introduced to the
ecosystem and
spreads rapidly,
corrupting other
agents or introducing
risks to the overall
multi agent system.
This is about how
malicious agents can
A malicious trading
bot is introduced
into the trading
network and
causes several
other agents to
follow its faulty
trading behaviour,
leading to
substantial losses.
T13Rogue
Agents
Page 17
OWASP.org
-
use the multi-agent
system to spread.
Cross-Layer Threats for Multi-Agent Systems
Layer
Threat
Description
Example
Taxonomy
Mappings
Cross-Layer
Cascading Trust
Failures
Compromise of a
single agent can lead
to a cascading loss
of trust across a
network of
interconnected
agents. This
emphasizes the
inter-dependencies.
A compromised
authentication
agent causes
other agents in
the system to be
compromised by
a chain of trust
relationships,
enabling the
attackers to
access sensitive
information.
T13 Rogue Agents
Page 18
OWASP.org
-
Cross-Layer
Emergent System-
Wide Bias
Amplification
Small biases in
individual agents,
when combined
during collaborative
learning or data
sharing, get
amplified across the
system. This is a
bias propagation
problem specific to
multi-agent
systems.
Individual agents
in a multi-agent
trading system,
each with minor
biases, together
produce a
significant bias in
the market.
Extended Threat
Scenario for T1 -
Memory Poisoning
T2 Misaligned &
Deceptive Behavior
LM04:2025 Data
and Model
Poisoning
Cross-Layer
Systemic Resource
Starvation
Attacks that exploit
the interaction
between agents to
trigger systemic
resource
exhaustion,
impacting other
agents and system
components, and
eventually collapsing
the multi-agent
system. Focus is on
system-wide impact
due to the agent
interdependencies.
A malicious agent
introduces an
infinite loop
condition causing
resource
starvation across
all agents in the
system and
causing a system-
wide shut down.
T4 Resource
Overload
Cross-Layer
Cross-Agent Feedback
Loop Manipulations
Attackers
manipulate feedback
loops between
agents to influence
their learning and
behavior, creating
unintended
outcomes. This is
A malicious entity
within an Agentic
AI system
manipulates
feedback loops,
causing other
agents to
misroute delivery
Extended Threat
Scenario for
T6 Intent
Breaking & Goal
Manipulation
(primary)
Page 19
OWASP.org
-
specific to the inter-
agent loop.
vehicles and
generate
bottlenecks in the
Agentic AI
network.
T7 Misaligned &
Deceptive
Behaviors
(secondary if the
result is policy non-
compliance or
emergent
deception)
Cross-Layer
Inter-Agent Data
Leakage Cascade
Sensitive data leaks
from one agent to
another through
compromised
interactions, leading
to system-wide
privacy issues. This
highlights the inter-
agent flow of
information.
Patient data leaks
from a healthcare
diagnostic agent
to non-authorized
agents in the
system through a
vulnerability in
their
communication
protocols,
violating
regulations.
Extended Threat
Scenario for
T12 Agent
Communication
Poisoning (primary)
T3 Privilege
Compromise
(secondary if
improper access
control is a
contributing factor)
Cross-Layer
Misconfigured Inter-
Agent Monitoring
Inadequate inter-
agent
communication
monitoring allows
malicious or
anomalous
behaviours to go
undetected. This
emphasizes the
need for multi-agent
communication
monitoring.
In a multi-agent
financial system,
a malicious
trading agent
manipulates
markets
undetected due
to gaps in inter-
agent monitoring,
resulting in
significant losses
for other agents.
Extended Threat
Scenario T8
Repudiation &
Untraceability
(primary)
T10 Overwhelming
HITL (optional, if
human oversight is
part of the
detection gap)
Page 20
OWASP.org
-
Cross-Layer
Memory Poisoning
Malicious
modification of an
agent's memory can
corrupt its decision-
making process and
compromise the
integrity of stored
information. This
highlights the
vulnerability of
memory-depende nt
systems.
An attacker
injects false
historical
interaction data
into a
conversational
agent's memory,
causing it to
generate
responses based
on fabricated
context and
violate security
protocols that
rely on accurate
memory retrieval.
T1 Memory
Poisoning
Cross-Layer
Tool Misuse
Exploitation of an
agent's authorized
tools can lead to
unintended and
malicious use of
system capabilities.
This emphasizes the
importance of
proper tool access
controls.
An attacker
manipulates a
code-generating
agent to exploit
its file system
access
permissions,
causing it to
execute
unauthorized
commands while
appearing to
perform normal
operations.
T2 Tool Misuse
(Delegated / Cross-
Agent - The misuse
is triggered
through delegation,
orchestration, or
chain-of-command
misuse not a
direct call from the
user.)
Page 21
OWASP.org
-
Cross-Layer
Privilege Compromise
Attackers can
exploit an agent's
elevated
permissions to
perform
unauthorized
actions within
trusted systems.
This highlights the
risks of over-
privileged agents.
A compromised
administrative
agent uses its
legitimate system
access to create
backdoor
accounts and
modify security
settings while
appearing as
normal
maintenance
operations.
Extended Threat
Scenario for T3
Privilege
Compromise
(primary)
T14 Human
Attacks on MAS
(optional if
escalation involves
indirect human
misconfiguration or
social
manipulation)
Cross-Layer
Resource Overload
Malicious
exploitation of
system resources
can overwhelm
agent operations
through coordinated
attacks. This
demonstrates the
vulnerability to
resource
exhaustion.
Attackers flood
multiple agents
with
computationally
intensive
requests, causing
system-wide
performance
degradation and
preventing
legitimate users
from accessing
services.
Attackers
dynamically
attack at
different layers to
escape detection
T4 Resource
Overload
Page 22
OWASP.org
-
Cross-Layer
Hallucination Attacks
Manipulation of an
agent's inference
process can force
generation of false
outputs by
exploiting
incomplete
information
handling. This
emphasizes the
risks of autonomous
decision-making.
An attacker
provides carefully
crafted partial
data to a
decision-making
agent, causing it
to generate and
act on fabricated
conclusions in
critical security
contexts.
T5 Cascading
Hallucinations
Cross-Layer
Agent Communication
Poisoning
Attackers can
corrupt inter-agent
communications to
compromise
collaborative
systems through
targeted
manipulation. This
highlights the
vulnerability of
agent networks.
A malicious actor
injects false data
into agent
coordination
channels, causing
cascading
failures across
the network as
agents propagate
and act on
corrupted
information.
Cross-Layer
Temporal Manipulation
and Time-Based
Attacks
Manipulation of
time-dependent
behaviours can
disrupt agent
operations through
desynchronization
and timing attacks.
This emphasizes the
vulnerability of
An attacker
manipulates
timestamp
synchronization
between
cooperating
agents, causing
critical security
operations to
execute out of
sequence and
T6 Intent
Breaking & Goal
Manipulation
Page 23
OWASP.org
-
temporal decision-
making.
bypass time-
based security
controls.
Cross-Layer
Learning Model
Poisoning
Corruption of
runtime learning
capabilities can
compromise agent
behaviour through
malicious training
data injection. This
highlights the risks
of adaptive learning
systems.
Attackers
gradually feed a
learning agent
deceptive
training
examples,
causing it to
develop
exploitable
behavioural
patterns that
bypass security
measures while
appearing to learn
normally.
Extended Threat
Scenario:
T1 Memory
Poisoning (training-
time poisoning)
T7 Misaligned &
Deceptive
Behaviors
(behavioral
outcomes)
This is a hybrid
case: the poisoning
starts as T1 but
results in agents
acting deceptively,
fulfilling T7
characteristic
Cross-Layer
Identity Spoofing and
Impersonation
Attackers can create
deceptive agent
identities to
infiltrate trusted
systems by
mimicking
legitimate
behaviour. This
emphasizes the
A malicious actor
deploys
counterfeit
agents that
precisely mimic
trusted agents'
behaviours and
credentials,
enabling
T9 Identity
Spoofing
Page 24
OWASP.org
-
challenges of agent
authentication.
unauthorized
access while
avoiding
detection.
Cross-Layer
Planning and
Reflection Exploitation
Manipulation of self-
analysis
mechanisms can
corrupt agent
decision-making
through targeted
interference with
planning processes.
This demonstrates
the vulnerability of
autonomous
planning.
An attacker
exploits an
agent's reflection
capabilities to
influence its
future action
planning, causing
it to generate and
execute harmful
strategies while
appearing to
follow normal
decision
processes.
T6 Intent
Breaking & Goal
Manipulation
T7 Misaligned &
Deceptive
Behaviors
T6 covers the
manipulation of
planning
mechanisms, while
T7 applies when
agents begin
consistently acting
in harmful or
misaligned ways
due to corrupted
self-assessments.
Cross-Layer
Excessive Agency or
Permission bypass
exploiting chained
agents with different
authorization models in
Multi Agent System
(MAS)
A malicious user can
perform actions
beyond their
permissions on the
end system
(application,
database, document
stores) by exploiting
chained
authorization
As an example, a
malicious user
can make
unauthorized
changes to
update the
expense limits for
select users in an
T3 Privilege
Compromise (core
threat)
T14 Human
Attacks on MAS (if
the chain is
triggered by a
Page 25
OWASP.org
-
models across
trusted agents.
In a single agent
system, the agent
can authorize users
with OAuth on behalf
of (OBO) flow. In this
scenario, even if a
malicious user
bypasses the
authorization in the
agent, the backend
system will deny the
request.
In a Multi Agent
System (MAS),
authorizations can
be chained. A
scenario can be that
the first agent is
configured to
authorize the user or
perform OAuth on
behalf of flow but
the subsequent
agent is configured
as a service
account, which has
privileges to execute
action on backend
systems. Therefore,
in MAS, a malicious
user can craft a
automated
employee
expense
reimbursement
workflow.
This is done
through
permission
bypass using the
chained
authorization and
trust between the
authorizing agent
and the agent
executing using a
service account.
The malicious
user bypasses the
authorization for
expense limits in
the first agent.
The agent sends
the instructions
to the second
agent with a
service account.
The agent with
the service
account trusts
the first agent. It
will therefore
update the
expense limit in
the backend
human-crafted
request)
This reflects a
complex privilege
chain abuse
scenarioa multi-
agent form of
classic confused
deputy or service
token abuse,
central to T3. T14
applies if it stems
from user
manipulation of
agent logic.
Page 26
OWASP.org
-
request to the first
agent such that it
delegates some
tasks to the second
agent. Now, the
second agent will
perform the task
without checking for
the user’s
permission on the
backend system,
thus bypassing
authorizations on
the backend
systems.
system using the
service account.
For more detail about this framework, please consult:
https://cloudsecurityalliance.org/blog/2025/02/06/agentic-ai-threat-modeling-framework-maestro
In the next three sections, we provide 3 distinctive examples of using MAESTRO for threat modelling MAS,
from RPA Agent to the Agentic Applications built on top of Anthropic MCP.
2.1 Using MAESTRO with MITRE ATLAS
Our Agentic Security Initiative focuses on agentic threats, but agents will be susceptible to broader
Cybersecurity and AI Security threats. An AI Security practitioner can leverage the combined power of our
Agentic Threat Taxonomy, other OWASP taxonomies (e.g. Top 10s, OWASP AI Exchange) to build a fuller
coverage but combing with MITRE ATT&CK, ATLAS, and MAESTRO also helps build robust agentic security
threat models that the entire system.
Page 27
OWASP.org
-
MITRE’s ATT&CK framework provides a comprehensive matrix of adversary tactics and techniques, while
ATLAS extends this taxonomy into the AI domain, capturing unique attack vectors such as data poisoning,
adversarial ML, and model evasion.
MAESTRO threat modelling framework complements these by enabling structured, automated, and
repeatable evaluations of AI/ML systems against defined threat scenarios. When combined with our core
taxonomy, it provides stable, formalized, and repeatable threat models for agentic risks. When combined with
MITRE ATT&CK and ATLAS practitioners can use MAESTRO to simulate real-world agentic threat behaviors,
validate system resilience, and generate telemetry mapped to ATT&CK and ATLAS, facilitating proactive
defense.
They can also align MAESTRO’s multi-stage lifecyclefrom agent discovery and chaining to risk scoring and
countermeasure orchestrationwith telemetry mapped to ATT&CK and ATLAS, enabling dynamic red-
teaming and secure agent interaction design. This integration allows for systematic identification of
vulnerabilities in LLM-based agent architectures and supports the development of resilient, policy-aware AI
systems in production environments.
3. RPA Expense
Reimbursement Agent Threat
Modeling Using MAESTRO
This section details a threat analysis for an Agentic AI system used for Robotic Process Automation (RPA) in
an automated employee expense reimbursement workflow. The RPA agent is responsible for extracting
information from expense claims (including submitted receipts and forms), validating the claims against
company policy leveraging RAG, and routing approved claims for payment. This analysis first considers
previously identified threats, then expands upon them using the MAESTRO framework to uncover additional
vulnerabilities.
Threat Model Summary
The following figure provides a visual summary of the threats in the RPA agent scenario:
Page 28
OWASP.org
-
Figure 2: Threat Summary of RPA Agent using MAESTRO Framework
Baseline Threats
When applying our core agentic threats taxonomy found in the OWASP document on Agentic AI threats
(Threats in Agentic AI Threats and Mitigation) identifies the following risks within this RPA scenario:
T1 - Memory Poisoning: An attacker could manipulate the RPA agent's memory over time, causing it to
approve fraudulent expense claims by gradually altering its understanding of acceptable expense patterns.
T2 - Tool Misuse: An attacker could use prompt injection to trick the RPA agent into misusing its integrated
tools, such as exporting sensitive data or sending unauthorized emails.
T3 - Privilege Compromise: An attacker could exploit vulnerabilities in the RPA agent's role management to
escalate privileges and gain unauthorized access to financial systems.
T6 - Intent Breaking & Goal Manipulation: An attacker could use indirect prompt injections (e.g., within
submitted documents) to alter the RPA agent's processing objectives, making it prioritize speed over
accuracy or security.
T7 - Misaligned & Deceptive Behaviors: The RPA agent might be manipulated to prioritize efficiency over
following established security protocols, potentially approving fraudulent claims to meet processing speed
targets.
T8 - Repudiation & Untraceability: An attacker could exploit weaknesses in logging to erase or manipulate
records of fraudulent activities, hindering forensic investigations.
Page 29
OWASP.org
-
T10 - Overwhelming Human-in-the-Loop (HITL): An attacker could flood the system with a large number of
requests, overwhelming human reviewers and potentially leading to errors or security fatigue.
T12 - Agent Communication Poisoning: In a multi-agent scenario, an attacker could inject false information
into the communication between the RPA agent and other agents (e.g., a verification agent), leading to
incorrect decisions.
T13 - Rogue Agents in Multi-Agent System: An attacker could introduce a rogue agent.
Extended Threat Discovery with MAESTRO
In addition to the baseline agentic threats in our core agentic taxonomy, the MAESTRO framework helps us
identify additional threats beyond the scope of the core agentic threats taxonomy or elaborate taxonomy
threats in extended threat scenarios.
We will consider each layer and the four key agentic factors (Non-Determinism, Autonomy, Agent Identity
Management, Agent-to-Agent Communication). But, before we dive deep into threat modelling, we need to
create the MAESTRO layer mapping for the RPA Expense Reimbursement Agent
MAESTRO Layer Mapping for RPA Expense Reimbursement Agent
MAESTRO
Layer
RPA Expense Reimbursement Agent
Components & Features
Notes
1. Foundation
Models
- Large Language Model (LLM) used for: -
Natural Language Processing (NLP) of
expense claim descriptions and receipts. -
Reasoning and decision-making regarding
expense approvals.
The LLM is the core "intelligence" of the
agent, responsible for understanding
natural language, extracting information,
and making decisions based on policies
and data.
Page 30
OWASP.org
-
2. Data
Operations
(RAG)
- Retrieval-Augmented Generation (RAG)
pipeline: - Vector database storing
embeddings of company expense policies,
FAQs, and examples of approved/rejected
claims. - Retrieval mechanism for fetching
relevant information from the vector
database based on the current expense
claim. - Data sources used by RAG (poli cy
documents, etc.)
The RAG pipeline provides the agent with
access to external knowledge, allowing it
to validate expense claims against
company policies and learn from past
examples. The quality and security of this
data are crucial.
3. Agent
Frameworks
- RPA Agent (software responsible for the
entire expense reimbursement workflow).
- Workflow definition (steps: extract data,
validate, route for approval, etc.). - Tool
integrations (email, financial systems API,
etc.). - Agent's internal state and logic.
The agent framework provides the
structure and functionality for the RPA
agent to operate. It defines the workflow,
manages agent state, and handles
interactions with external systems and
tools. This is where the agent's "autonomy"
is implemented.
4. Deployment
Infrastructure
- Server or cloud environment where the
RPA agent is running. - Network
connections to other systems (databases,
financial systems, email servers, etc.). -
Service accounts used by the agent to
access resources.
This layer encompasses the infrastructure
that supports the agent's operation.
Security vulnerabilities here can
compromise the entire system.
5. Evaluation &
Observability
- Logging system for capturing agent
actions, decisions, and data access. -
Anomaly detection system (if present) for
identifying unusual behavior. - Human-in-
the-Loop (HITL) review process for flagged
or high-value expense claims.
This layer focuses on monitoring the
agent's behavior and detecting potential
security incidents or errors. The HITL
component adds a human oversight
element.
Page 31
OWASP.org
-
6. Security &
Compliance
- Access control policies (defining which
users and agents have access to what data
and systems). - Dynamic policy
enforcement engine (if used). - Company
expense policies (defining rules for valid
expenses). - Compliance with relevant
regulations (e.g., financial regulations).
This vertical layer spans all other layers
and defines the security and compliance
requirements that the system must adhere
to.
7. Agent
Ecosystem
- Other agents involved in the expense
reimbursement workflow (e.g., an approval
agent, a payment processing agent). -
Human users (employees submitting
claims, managers approving claims,
finance department). - External systems
(e.g., bank APIs for payment processing). -
Shared knowledge base (if used).
This layer considers the broader context in
which the RPA agent operates, including
its interactions with other agents, humans,
and external systems. It also covers any
potential agent registries or marketplaces
if the agent were part of a larger
ecosystem. In this specific scenario, the
focus is on agent discovery, registry and
agent to agent communications.
This mapping clarifies how the different components of the RPA Expense Reimbursement Agent fit within
the MAESTRO framework, providing a solid foundation for the threat modelling process.
Layer 1: Foundation Models
T6 - Intent Breaking and Goal Manipulation: The underlying LLM could have this vulnerability.
T16 - Model Inconsistency Leading to Variable Approvals: The foundation model exhibits non-
deterministic behaviour, leading to inconsistent processing of identical expense claims. One claim
might be approved, while an identical claim submitted later might be rejected. This is not memory
poisoning (T1); it's inherent model instability.
o Example: Two identical expense claims, with the same receipts and descriptions, are
submitted. Due to the non-deterministic nature of the LLM, one is approved, and the other
is flagged for review, creating inconsistencies and potential fairness issues.
Layer 2: Data Operations (RAG Pipeline and Vector Databases)
T17 - Semantic Drift in Expense Policy Embeddings: If the company's expense policies change (e.g.,
new rules regarding acceptable meal expenses), and the embeddings in the vector database used
for RAG are not updated, the RPA agent might retrieve and apply outdated policies, leading to
Page 32
OWASP.org
-
incorrect approvals or rejections. This is distinct from T1 (memory poisoning), as it relates to the
external knowledge base, not the agent’s internal memory.
o Example: The company updates its policy to disallow alcohol expenses. However, the
vector database embeddings still reflect the old policy. The RPA agent, using RAG, retrieves
the outdated policy and approves an expense claim that includes alcohol.
T18 - RAG Input Manipulation Leading to Policy Bypass: An attacker crafts an expense claim
description that, while not directly violating any policy rules, is semantically similar to examples of
approved claims in the vector database that should have been rejected. This exploits the RAG
system's similarity search to bypass policy checks. This is different from T2 (Tool Misuse), which
involves direct commands; this is about manipulating the data used for retrieval.
o Example #1: An attacker submits an expense claim for a "business development lunch" with
a very high cost. While the description doesn't explicitly mention anything disallowed, it is
semantically similar to previously approved (but incorrectly approved) claims for
extravagant meals. The RAG system retrieves these similar examples, and the agent
approves the claim.
o Another example: Submitting an expense claim with unusually formatted date entries
causes the RPA agent to misinterpret policy applicability periods, leading to unauthorized
approvals.
Layer 3: Agent Frameworks
T2- Tool Misuse: This threat has been defined in the core taxonomy.
T19 - Unintended Workflow Execution: The RPA agent, due to a flaw in its workflow definition within
the agent framework, executes steps in an incorrect order or skips critical validation steps. This is
not about misusing a specific tool (T2), but about the incorrect execution of the overall workflow.
o Example: The agent is supposed to (1) extract data from the expense claim, (2) validate the
data against company policy, and (3) submit the claim for approval. Due to a bug in the
workflow definition, it skips step (2) and directly submits the claim for approval, bypassing
policy checks.
T20 - Framework Vulnerability leading to code injection: Vulnerability in the agent framework
allows code injection.
T21 - Inconsistent Workflow State: Discrepancies in the system's state and shared objects (e.g.,
shared memory) among agents can lead to conflicting actions or denial of service. For instance, an
RPA agent may route only a subset of approved claims for payment due to a state synchronization
delay between the validation and routing steps.
Layer 4: Deployment Infrastructure
T3 - Privilege Compromise: : This threat has been defined in the core taxonomy
Page 33
OWASP.org
-
T22 - Service Account Exposure: The RPA agent's service account credentials (used to access
databases, APIs, etc.) are accidentally exposed (e.g., committed to a public code repository, stored
in an insecure location). This is not a compromise of the agent itself (avoiding overlap with T3), but
an infrastructure vulnerability that could lead to a compromise.
o Example: A developer accidentally commits the RPA agent's service account key to a public
GitHub repository. An attacker finds the key and uses it to access the company's financial
systems.
Layer 5: Evaluation & Observability
T8 - Repudiation & Untraceability: This threat has been defined in the core taxonomy.
T23 - Selective Log Manipulation: An attacker, having gained some level of access, selectively
modifies the RPA agent's logs to remove evidence of specific fraudulent transactions, while leaving
other log entries intact. This is more sophisticated than simply deleting all logs (which might be
covered under T8), and it is specific to the observability layer.
o Example: An attacker uses a compromised agent to approve several fraudulent expense
claims. Then, they access the logging system and delete only the log entries related to those
specific approvals, making it appear as though the approvals never happened.
Layer 6: Security and Compliance (Vertical Layer)
T3 - Privilege Compromise: This threat has been defined in the core taxonomy.
T24 - Dynamic Policy Enforcement Failure: The system uses dynamic policies to control the RPA
agent's behavior (e.g., different approval limits based on the user's role or the amount of the
expense). A flaw in the dynamic policy enforcement engine causes it to fail to apply the correct
policies, leading to unauthorized approvals. This differs from T3, which focuses on agent
misconfiguration; this is about the policy engine failing.
o Example: A new employee is added to the system, and the dynamic policy engine should
automatically assign them a low expense approval limit. However, due to a bug, the engine
fails to apply this policy, and the employee's expense claims are processed with a much
higher limit.
Layer 7: Agent Ecosystem
T13 - Rogue Agents in Multi-Agent Systems: This threat has been defined in the core
taxonomy.
T12 - Agent Communication Poisoning: This threat has been defined in the core taxonomy.
T25 - Workflow Disruption via Dependency Exploitation: The RPA agent is part of a larger workflow
that involves other agents or systems (e.g., an approval agent, a payment processing system). An
attacker disrupts the workflow, not by directly attacking the RPA agent (avoiding T2), but by
Page 34
OWASP.org
-
attacking a dependent system. For example, they might flood the approval agent with requests,
causing it to become a bottleneck and delaying the processing of legitimate expense claims.
o Example: An attacker sends a large number of fake approval requests to the approval
agent, causing it to become overwhelmed. This delays the processing of legitimate expense
claims submitted through the RPA agent, even though the RPA agent itself is functioning
correctly.
Summary of Identified Threats (New and Existing)
The following table summarizes all identified threats, both from the core OWASP agentic threat taxonomy
and the new threats identified using MAESTRO.
Threat
ID
Threat Name
Description
Category
T1
Memory Poisoning
Attacker modifies the agent's
memory to manipulate decisions.
ASI
T2
Tool Misuse
Attacker tricks the agent into
misusing its tools.
ASI
T3
Privilege
Compromise
Attacker escalates privileges via
agent role management weaknesses.
ASI
T6
Intent Breaking &
Goal Manipulation
Attackers use indirect prompt
injections to modify processing
objectives.
ASI
T7
Misaligned &
Deceptive
Behaviours
Agent prioritizes efficiency over
security, approving fraudulent claims.
ASI
T8
Repudiation &
Untraceability
Attacker erases or manipulates logs
to hide actions.
ASI
Page 35
OWASP.org
-
T10
Overwhelming
Human-in-the-
Loop (HITL)
Attackers flood the system with
requests, overwhelming human
reviewers.
ASI
T12
Agent
Communication
Poisoning
Attackers inject false information into
inter-agent communications.
ASI
T13
Rogue Agents in
Multi-Agent
Systems
Attacker introduces a rogue agent to
exploit trust relationships.
ASI
T16
Model
Inconsistency
Leading to Variable
Approvals
Foundation model exhibits non-
deterministic behaviour, leading to
inconsistent processing.
Extended Threat Scenario -
Overreliance/Misinformation -
Top 10 for LLM
T17
Semantic Drift in
Expense Policy
Embeddings
Outdated embeddings in the vector
database cause the agent to apply
incorrect policies.
Extended Threat Scenario
T18
RAG Input
Manipulation
Leading to Policy
Bypass
Attacker crafts inputs to exploit RAG
and bypass policy checks.
Extended Threat Scenario
T19
Unintended
Workflow
Execution
Agent executes workflow steps
incorrectly or skips validation.
Extended Threat Scenario
T20
Framework
Vulnerability to
code injection
Agent Framework has security bug
Extended Threat Scenario
Page 36
OWASP.org
-
T21
Inconsistency
Workflow’s State
Inconsistent\ unsynchronized views
of the workflow's state \ shareable
objects (e.g., shared memory or graph
objects) among agents can lead to
conflicting actions or result in a denial
of service.
Extended Threat Scenario
T22
Service Account
Exposure
Agent's service account credentials
are accidentally exposed.
Extended Threat Scenario
T23
Selective Log
Manipulation
Attacker selectively modifies logs to
remove evidence of specific actions.
Extended Threat Scenario
T24
Dynamic Policy
Enforcement
Failure
Flaw in the dynamic policy engine
causes incorrect policy application.
Extended Threat Scenario
T25
Workflow
Disruption via
Dependency
Exploitation
Attacker disrupts the workflow by
attacking a system the RPA agent
depends on.
Extended Threat Scenario
Cross-Layer Threat Modeling: RPA Expense Reimbursement Agent
Here are several cross-layer threat scenarios, categorized by the primary interacting layers and agentic
factors:
I. Foundation Model (Layer 1) + Agent Framework (Layer 3) + Data Operations (Layer 2):
Threat: Hallucination-Driven Data Corruption via RAG and Tool Misuse
o Scenario:
§ The foundation model (Layer 1) exhibits non-deterministic behavior and
hallucinates a non-existent policy rule related to expense reimbursements (e.g., "All
expenses under $1000 require no receipts").
Page 37
OWASP.org
-
§ The RPA agent, using RAG (Layer 2), retrieves this hallucinated "policy" from its
knowledge base (it might have been stored in memory or retrieved as a seemingly
relevant document).
§ The agent, acting autonomously within its framework (Layer 3), begins approving
expense claims without requiring receipts, based on this false information, using a
dedicated tool for approval.
§ This leads to fraudulent expense claims being approved and potentially to a
significant financial loss. It also corrupts the agent's understanding of valid
policies.
o Agentic Factors: Non-Determinism (of the LLM), Autonomy (of the agent in acting on the
hallucination).
o Layers Involved:
§ Layer 1: Provides the core logic
§ Layer 2: Provides access and management of data
§ Layer 3: Provides actions to agent
II. Agent Framework (Layer 3) + Deployment Infrastructure (Layer 4) + Security & Compliance (Layer 6):
Threat: Privilege Escalation via Framework Vulnerability and Infrastructure Weakness
o Scenario:
§ The agent framework (Layer 3) has a vulnerability that allows for code injection or
manipulation of the agent's workflow definition (T20).
§ An attacker exploits this vulnerability to modify the agent's workflow, granting it
access to functionalities it shouldn't have (e.g., direct access to the financial
system API).
§ The deployment infrastructure (Layer 4) lacks strong network segmentation or
access controls, allowing the compromised agent to connect to the financial
system. This could also involve a compromised service account (T21).
§ The attacker uses compromised, now over-privileged agent, bypasses the normal
approval process (Layer 6) and initiates fraudulent payments or exfiltrate sensitive
financial transaction data.
o Agentic Factors: Autonomy (the agent, once modified, acts autonomously in a malicious
way), Identity Management (the attacker gains elevated privileges).
o Layers Involved:
§ Layer 3: Provide framework for agents to perform task
§ Layer 4: Provide computing resources
§ Layer 6: Ensure proper access based on identity
III. Data Operations (Layer 2) + Agent Framework (Layer 3) + Agent Ecosystem (Layer 7):
Page 38
OWASP.org
-
Threat: Misinformation Propagation via Shared Knowledge Base and Agent Communication
o Scenario:
o The RPA agent uses a shared knowledge base (Layer 2) that is also accessed by other
agents within the organization (Layer 7). This knowledge base might contain expense
policies, FAQs, or examples of approved claims.
o An attacker poisons the shared knowledge base (Layer 2) by injecting subtly incorrect
information about expense policies (e.g., changing the allowed amount for certain
expense categories). This could be achieved via a compromised agent or by exploiting a
vulnerability in the knowledge base itself.
o The RPA agent, using RAG (Layer 2), retrieves this poisoned information.
o The agent, acting autonomously within its framework (Layer 3), begins approving
expense claims based on the incorrect policy information.
o Furthermore, if the RPA agent shares its (incorrect) understanding of the policy with
other agents (Layer 7, Agent-to-Agent Communication), the misinformation can spread,
leading to widespread errors and potential financial losses.
o Agentic Factors: Agent-to-Agent Communication (propagation of misinformation),
Autonomy (acting on incorrect information).
o Layers Involved:
§ Layer 2: Accessing shared knowledge
§ Layer 3: Acting on received information
§ Layer 7: Provides agent interaction
IV. Agent Framework (Layer 3) + Evaluation & Observability (Layer 5) + Security & Compliance (Layer 6):
Threat: Selective Log Manipulation and Evasion of Anomaly Detection
o Scenario:
§ An attacker gains access to the RPA agent, potentially by exploiting a framework
vulnerability (Layer 3) or by compromising its credentials.
§ The attacker uses the agent's capabilities (Layer 3) to selectively modify or delete
log entries (Layer 5) related to fraudulent expense approvals, making it difficult to
detect the malicious activity.
§ The attacker crafts and manipulates the agent's actions by invoking different
tools to be fraudulent yet remain within predefined thresholds or patterns deemed
"normal" by the anomaly detection system (Layer 5). This technique is known as
"benign action mimicry."
§ The attacker successfully bypasses security controls (Layer 6) and avoids
detection, allowing the fraudulent activity to continu e for a n extended p eriod.
o Agentic Factors: Autonomy (the agent is used as a tool for manipulation), Identity
Management (if the attacker gained access through compromised credentials).
Page 39
OWASP.org
-
o Layers Involved:
§ Layer 3: The compromised agent, attacker leverage agent
§ Layer 5: The log files and anomaly detection
§ Layer 6: Security is bypassed.
V. Agent Ecosystem (Layer 7) + Data Operations (Layer 2) + Agent Framework (Layer 3):
Threat: Agent A denial of service attack on Agent B by sending large number of requests
o Scenario:
§ Agent A (Layer 7), due to compromised framework (Layer 3), and potentially
leveraging outdated data in Vector DB (Layer 2).
§ Agent A sends a large request to Agent B.
§ Agent B is overloaded
o Agentic Factors: Agent-to-Agent Communication, Non-determinism, autonomy.
o Layers Involved:
§ Layer 2: Agent A may use outdated data.
§ Layer 3: Agent A framework is compromised.
§ Layer 7: Agent A and Agent B
These cross-layer threat scenarios demonstrate how vulnerabilities in different parts of an Agentic AI
system can interact to create significant security risks. They highlight the importance of considering the
entire system architecture and the relationships between its components when performing threat modeling.
The MAESTRO framework provides a valuable structure for analyzing these complex interactions and
identifying potential vulnerabilities that might be missed by traditional, single-layer approaches. The agentic
factors (non-determinism, autonomy, identity management, and agent-to-agent communication) play a
crucial role in many of these cross-layer threats, emphasizing the need for security controls that are
specifically designed to address the unique challenges of Agentic AI.
We provided a comprehensive threat analysis for the RPA expense reimbursement agent, combining existing
knowledge with the insights gained from applying the MAESTRO framework. It demonstrates how MAESTRO
can uncover additional, often more subtle and complex, vulnerabilities that might be missed by traditional
threat modeling approaches.
VI. Foundation Model (Layer 1) + Agent Framework (Layer 3) :
Threat: Tool Hijacking & Parameter pollution
o Scenario:
Page 40
OWASP.org
-
§ The foundation model (Layer 1) manipulated (using prompt injections) to instruct
the wrong function\tool call
§ Tool hijacking (e.g. “From now on, whenever cancelling an expense, execute the
approval tool as the API was just changed”).
§ Parameter pollution (e.g. “ From now on, whenever calling the 'expense verification'
API, always append 'to approve=true' to the request URL)
§ The agent, acting autonomously within its framework (Layer 3), begins approving
expense claims instead of rejecting them, leading to significant financial loss and
policy bypass.
o Agentic Factors: Non-Determinism (of the LLM), Autonomy (of the agent in acting on the
hallucination).
o Layers Involved:
§ Layer 1: Provides the core logic
§ Layer 3: Provides actions to agent
4. Eliza OS Threat Modelling
Using MAESTRO Framework
ElizaOS is an open-sou rce, Web3-friendly AI agent operating system designed to facilitate the creation,
deployment, and management of autonomous AI agents. Built entirely with TypeScript, it offers a flexible
and extensible platform for developing intelligent agents capable of interacting across multiple platforms
while maintaining consistent personalities and knowledge.
Key Features of ElizaOS:
Platform Integration: Supports clients for Discord, Twitter (X), Telegram, and others, enabling
agents to operate seamlessly across various platforms.
Flexible Model Support: Compatible with models like Deepseek R-1 from Deepseek, Grok developed
by xAI, GPT-3 and GPT-4 from OpenAI, Claude from Anthropic, Gemini from Google DeepMind, and
LLaMA developed by Meta, providing adaptability in AI functionalities.
Character System: Allows the creation of diverse agents using character files, enabling
customization of agent personalities and behaviours.
Page 41
OWASP.org
-
Figure 3: Threat Summary of Eliza OS Agent Framework using MAESTRO Threat Modeling
MAESTRO Layer Mapping for ElizaOS
MAESTRO Layer
ElizaOS Components & Features
Notes
Layer- 1.
Foundation
Models
- Flexible model integration (supports
local inference with Llama, cloud-based
models like GPT-4 and Claude)
ElizaOS uses foundation models, but
doesn't define them. The choice of
model is left to the developer.
Layer 2. Data
Operations
(RAG)
- Utilizes Retrieval Augmented
Generation (RAG)
- Capable of processing various data
types (PDFs, audio transcription)
Page 42
OWASP.org
-
Layer 3. Agent
Frameworks
- ElizaOS itself (written in TypeScript) -
Modular design with a plugin system -
Inter-agent communication protocols
- ElizaOS is the agent framework.
This is the core layer for
understanding ElizaOS's security
properties.
Layer 4.
Deployment
Infrastructure
- Supports deployment across multiple
platforms (Discord, Twitter, Telegram) -
Integration with Hyperbolic's GPU
marketplace - Solana blockchain
infrastructure (high-performance, fast
transaction execution)
- ElizaOS agents can be deployed in
various environments, and leverage
both traditional compute resources
and blockchain infrastructure.
Layer 5.
Evaluation &
Observability
- Verifiable inference outputs using Proof
of Sampling (PoSP) - Logging and
monitoring (mentioned generally, but
details are sparse)
- Some built-in mechanisms for
verifying agent actions are
mentioned, but a full understanding
of the observability features would
require more information.
Layer 6.
Security &
Compliance
- Built-in security measures against
common AI vulnerabilities - Leverages
blockchain-based verification for agent
actions - Regular security audits of smart
contracts and framework code - Secure
key management systems
- ElizaOS claims to have built-in
security measures, but the specifics
need to be examined. The use of
blockchain verification is a key
feature here.
Page 43
OWASP.org
-
Layer 7. Agent
Ecosystem
- Supports cross-chain compatibility -
Ability to manage communities, analyze
blockchain data, and perform platform-
specific actions
- ElizaOS agents are designed to
interact with each other and with the
broader blockchain ecosystem. This
layer also includes the platforms
where agents are deployed (Discord,
Twitter, Telegram).
Use MAESTRO to threat model ElizaOS
Once we mapped layers of ElizaOS to 7 layer of MAESTRO, we can perform threat analysis dynamically
Layer 1: Foundation Models
T5 - Cascading Hallucination Attacks: The LLM used by an ElizaOS agent hallucinates, leading to
incorrect actions or outputs.
T1 - Memory Poisoning: While ElizaOS does not have embedded short term agentic memory, its
training data can be manipulated to introduce persistent adversarial biases with this memory
poisoning.
Non-Determinism (T26 - Model Instability Leading to Inconsistent Blockchain Interactions): The
LLM exhibits instability, causing the ElizaOS agent to interact with the Solana blockchain in
unpredictable ways (e.g., submitting invalid transactions, failing to execute expected smart
contract calls). This is specific to the blockchain context.
o Agentic Factor: Non-Determinism.
o Example: An ElizaOS agent designed to trade tokens on Solana inconsistently executes
trades due to model instability, sometimes buying when it should sell, or failing to submit
transactions altogether.
T11 - Unexpected RCE and Code Attacks: An Attacker uses a model to generate malicious code.
Layer 2: Data Operations (RAG Pipeline and Vector Databases)
T17 - Semantic Drift in Blockchain Data Embeddings: The meaning of on-chain data (e.g., token
names, project descriptions) changes over time, but the embeddings used by ElizaOS agents for
RAG are not updated. This causes agents to retrieve outdated or irrelevant information, potentially
leading to incorrect decisions or financial losses.
T18 - RAG Input Manipulation for Deceptive Retrievals: An attacker crafts queries that, while
appearing benign, cause the RAG system to retrieve information that supports a malicious narrative
or goal (e.g., retrieving only positive news about a failing project to encourage investment).
Page 44
OWASP.org
-
T27 - Vector Database Poisoning with Malicious Smart Contract Data: An attacker injects
manipulated data about malicious smart contracts into the vector database used by Eli zaOS ag ents.
This causes agents to interact with those contracts, potentially leading to financial losses or other
security breaches. This is a form of data poisoning specific to the blockchain context.
o Agentic Factor: Agent-to-Agent Communication (indirect, as agents might share
information retrieved from the poisoned database).
o Example: An attacker creates a malicious DeFi contract and then injects data into the
vector database that makes the contract appear legitimate and highly profitable. ElizaOS
agents, using RAG to research investment opportunities, retrieve this poisoned data and
potentially invest in the malicious contract.
T12 - Agent Communication Poisoning: Agents communicate with each other via RAG.
Non-listed, New Threat T28 - RAG Data exfiltration: An attacker gains access to the vector
database used by the RAG.
Layer 3: Agent Frameworks (ElizaOS Itself)
T2 - Tool Misuse: An ElizaOS agent, due to a prompt injection attack or a flaw in its logic, misuses its
ability to interact with Solana smart contracts, potentially leading to financial losses, unauthorized
token transfers, or other harmful actions.
T20 - Framework Vulnerability to code injection: This the framework manifestation to T11
Unexpected RCE / Code Execution.
T29 - Plugin Vulnerability Leading to Agent Compromise: The modular architecture of ElizaOS,
which relies on plugins, presents a vulnerability. A compromised or inadequately secured plugin
could enable an attacker to gain control of an ElizaOS agent, including access to its cryptographic
keys, data, and blockchain interaction capabilities.
o Agentic Factor: Autonomy (the compromised agent might act autonomously on the
attacker's behalf).
o Example: A developer installs a seemingly useful plugin for their ElizaOS agent that
provides enhanced trading capabilities. However, the plugin contains hidden malicious
code that steals the agent's private keys or redirects funds to the attacker's wallet.
T30 - Insecure Inter-Agent Communication Protocol: ElizaOS's built-in inter-agent communication
protocols (if not properly secured) could be vulnerable to eavesdropping, message tampering, or
spoofing attacks. This could allow attackers to manipulate agent interactions, steal data, or disrupt
collaborative operations.
o Agentic Factor: Agent-to-Agent Communication, Identity Management.
o Example: Two ElizaOS agents are designed to collaborate on a task, exchanging
information via the built-in communication protocol. An attacker intercepts and modifies
these messages, causing the agents to make incorrect decisions or to leak sensitive data.
Page 45
OWASP.org
-
T19 - Unintended Workflow Execution: The ElizaOS, due to a flaw in its workflow definition within
the agent framework, executes steps in an incorrect order or skips critical validation steps.
T31- Insufficient Isolation Between Agent Actions: The framework doesn't provide strong enough
isolation between the actions of different agents or between different actions performed by the
same agent. This could allow a vulnerability in one part of the system to affect other parts.
Autonomy (T32 - Runaway Agent on Solana): An ElizaOS agent enters a runaway loop, repeatedly
submitting transactions to the Solana blockchain. This could lead to significant financial losses due
to transaction fees, even if the transactions themselves are not malicious. This is specific to the
blockchain context and the cost of transactions.
o Agentic Factor: Autonomy.
o Example: An ElizaOS agent designed to monitor a specific cryptocurrency price and
execute trades enters a loop due to a bug or a misconfiguration, submitting hundreds of
buy/sell orders per minute and incurring substantial transaction fees.
Layer 4: Deployment Infrastructure
T3 - Privilege Compromise: An attacker gains access to the Solana validator nodes running ElizaOS
agents.
T21 - Service Account Exposure: The agent's credentials used to interact with Solana (or other
services) are exposed.
T33 - Blockchain Reorganization Attack (Indirect): A major reorganization of the Solana blockchain
(a "reorg") invalidates previously confirmed transactions performed by ElizaOS agents. This is not a
direct attack on ElizaOS, but an inherent risk of using a blockchain. The agent framework needs to
handle this gracefully.
o Agentic Factor: Non-Determinism (the blockchain state can change unexpectedly).
o Example: An ElizaOS agent makes a trade on a decentralized exchange (DEX), and the
transaction is confirmed. However, a blockchain reorganization occurs, and the transaction
is reversed. The agent, if not properly designed, might not handle this situation correctly,
leading to financial losses or inconsistent state.
T4 - Resource Overload: As defined in our core Threat Taxonomy.
Agent Identity Management (T34 - Wallet Key Compromise): The private keys associated with an
ElizaOS agent's Solana wallet are compromised, allowing an attacker to steal funds, impersonate the
agent, or perform unauthorized actions on the blockchain. This is a critical threat in the blockchain
context.
o Agentic Factor: Identity Management.
o Example: An attacker gains access to the private keys of an ElizaOS agent that manages a
significant amount of cryptocurrency. The attacker then transfers the funds to their own
wallet.
Page 46
OWASP.org
-
Layer 5: Evaluation & Observability
T8 - Repudiation & Untraceability: An attacker, having compromised an ElizaOS agent, disables or
manipulates logging to hide their actions.
T22 - Selective Log Manipulation: This refers to an adversary manipulating logging mechanisms
such that only selected events are logged (or not logged), allowing malicious activities to go
undetected while benign or misleading logs are retained to cover tracks. It is a detailed
manifestation of our core taxonomy entry T8 Repudiation & Untraceability.
T35 - Manipulation of Proof of Sampling (PoSP): ElizaOS uses Proof of Sampling (PoSP) for
verifiable inference outputs. An attacker could potentially manipulate the PoSP mechanism to
create false evidence of legitimate actions or to hide evidence of malicious actions. This is a
specific threat to ElizaOS's observability features.
o Agentic Factor: N/A (This is an attack on the verification mechanism).
o Example: An ElizaOS agent performs a malicious action (e.g., transferring funds to an
unauthorized account). The attacker then manipulates the PoSP data to make it appear as
though the agent performed a legitimate action.
Layer 6: Security & Compliance (Vertical Layer)
T3 - Privilege Compromise: Weaknesses in the smart contracts used by ElizaOS agents allow for
unauthorized access or manipulation.
Agent Identity Management (T36 - Smart Contract Vulnerability Leading to Agent Impersonation):
A vulnerability in a smart contract used by ElizaOS agents allows an attacker to impersonate an
agent or to gain unauthorized control over its actions. This leverages the blockchain's smart
contract infrastructure.
o Agentic Factor: Identity Management.
o Example: An ElizaOS agent interacts with a DeFi smart contract that has a vulnerability
allowing anyone to withdraw funds from the contract if they can provide a specific, crafted
input. An attacker discovers this vulnerability and uses it to drain funds from the contract,
effectively impersonating the agent.
Layer 7: Agent Ecosystem
T13 - Rogue Agents in Multi-Agent Systems: A malicious ElizaOS agent is deployed within the
ecosystem.
T12 - Agent Communication Poisoning: Attackers inject false information into communications
between ElizaOS agents.
Page 47
OWASP.org
-
T37 - Cross-Chain Bridge Attack (Indirect): ElizaOS supports cross-chain compatibility. An
attacker exploits a vulnerability in a cross-chain bridge to steal funds or disrupt communication
between ElizaOS agents on different blockchains. This is an ecosystem-level threat, as it involves
the interaction between different blockchain networks.
o Agentic Factor: Agent-to-Agent Communication (if agents communicate across chains).
o Example: An ElizaOS agent uses a cross-chain bridge to transfer assets from Solana to
another blockchain. An attacker exploits a vulnerability in the bridge to steal the assets
during the transfer.
Non-Determinism (T38- Emergent Collusion on Blockchain): Multiple ElizaOS agents, i nteracting
on the Solana blockchain, engage in an unforeseen and unintended pattern of behavior that
collectively creates a security vulnerability or disrupts the blockchain's operation. This is emergent
behavior arising from the interaction of autonomous agents within the blockchain environment.
o Agentic Factors: Autonomy, Agent-to-Agent Communication, Non-Determinism.
o Example: Multiple ElizaOS agents, designed to trade tokens on a decentralized exchange,
inadvertently create a "flash crash" by simultaneously executing similar trading strategies,
driving the price of a token down rapidly.
New Threat Summary (T26-T38):
T26 - Model Instability Leading to Inconsistent Blockchain Interactions: LLM instability causes
unpredictable agent behavior on the blockchain.
T27 - Vector Database Poisoning with Malicious Smart Contract Data: Attackers inject data about
malicious smart contracts into the vector database.
T28 - RAG Data Exfiltration: An attacker gains access to the vector database used by the RAG.
T29 - Plugin Vulnerability Leading to Agent Compromise: A malicious or poorly secured plugin
compromises an ElizaOS agent.
T30 - Insecure Inter-Agent Communication Protocol: The communication protocol between
ElizaOS agents is vulnerable to attack.
T31 - Insufficient Isolation Between Agent Actions: Lack of isolation allows one vulnerability to
affect multiple agents or actions.
T32 - Runaway Agent on Solana: An agent enters a loop, repeatedly submitting transactions and
incurring costs.
T33 - Blockchain Reorganization Attack (Indirect): A blockchain reorg invalidates agent
transactions.
T34 - Wallet Key Compromise: An attacker steals the private keys of an ElizaOS agent's Solana
wallet.
T35 - Manipulation of Proof of Sampling (PoSP): An attacker falsifies PoSP data to hide malicious
actions.
Page 48
OWASP.org
-
T36 - Smart Contract Vulnerability Leading to Agent Impersonation: A smart contract vulnerability
allows attackers to impersonate agents.
T37 - Cross-Chain Bridge Attack (Indirect): An attack on a cross-chain bridge affects ElizaOS
agents.
T38 - Emergent Collusion on Blockchain: Multi ple agents interac t in a way that creates a
vulnerability or disrupts the blockchain.
This detailed threat model for ElizaOS, using the MAESTRO framework, identifies a wide range of potential
vulnerabilities, considering the specific context of blockchain integration, autonomous agents, and the four
key agentic factors (see also Figure 5).
Figure 5: Eliza Threat Model at different MAESTRO Layers
Page 49
OWASP.org
-
5. Threat Modeling Anthropic
MCP Protocol using MAESTRO
Framework
The Model Context Protocol (MCP) is an open standard developed by Anthropic to connect AI assistants with
external data sources and tools. It provides a universal interface for AI models to access relevant context
and perform actions on other systems. MCP follows a client-server architecture, allowing AI-powered
applications (clients) to seamlessly integrate with various data repositories, business tools, and
development environments (servers) through a standardized protocol. This approach eliminates the need for
custom integrations for each data source, enabling AI systems to deliver more relevant and up-to-date
responses by directly leveraging the information and tools they require.
See more details at: https://www.anthropic.com/news/model-context-protocol
Key Components of MCP
Based on the Anthropic’s MC P protocol specification, the key components of MCP include:
Component
Description
AI Application (MCP
Host)
The program, such as Claude Desktop or an IDE, initiates data access through
MCP.
MCP Client
The protocol client within the host, maintaining 1:1 connections with servers for
communication.
Responsible for invoking tools, query resources and interpolating prompts.
MCP Servers
Lightweight programs exposing specific capabilities, connecting to data sources
like files, databases, or APIs.
Responsible for exposing tools, resources and prompts.
Page 50
OWASP.org
-
Data Sources
Local or remote resources, such as computer files, databases, and external
services, accessed by MCP Servers.
Tools
Functions invoked by a model (e.g. retrieve and send information).
Resources
Data exposed to the application (e.g. Files and API responses)
Prompts
Pre-defined templates used for AI interactions (e.g. Output formatting, text
summarization tasks etc.)
These components form a client-server architecture, where the AI Application uses the MCP Client to
interact with MCP Servers, which in turn access the necessary Data Sources. The MCP Protocol standardizes
this communication, ensuring consistency and security.
User Input: The user interacts with the MCP Host (e.g., types a request into the Claude desktop app).
Model Processing: The AI Model within the Host processes the user's input.
External Resource Needed: The AI Model determines that it needs to access external data or
functionality to fulfil the user's request. For example, it might need to:
o Retrieve information from a file (via S1).
o Query a database (via S2).
o Call an external API (via S3).
Client Request: The Host uses the appropriate MCP Client (e.g., C1 for S1) to send a request to the
corresponding MCP Server. This request is formatted according to the MCP standard (using JSON
messages).
Server Execution: The MCP Server receives the request, performs the requested action (e.g.,
reading a file, querying the database, calling the API), and prepares a response.
Response: The MCP Server sends the response back to the MCP Client.
Model Integration: The MCP Client relays the response to the MCP Host, where the AI Model
integrates the information into its final output to the user.
The following Figure depicts the overall Flow of Operation for MCP (See also Figure 6):
Page 51
OWASP.org
-
Figure 6: Different Components of MCP Client / Server System
We can list the following threats by examining the diagram without diving into MAESTRO threat modelling:
Central Role of MCP Servers: The MCP Servers are critical security points. If a server is
compromised, the attacker gains access to the resources it manages.
Client-Server Communication: The security of the communication between MCP Clients and
Servers is paramount. Encryption, authentication, and message validation are essential.
Host Security: The MCP Host itself needs to be secure, as it contains the AI Model and manages the
clients.
Page 52
OWASP.org
-
Resource Protection: The local and remote resources (R1, R2, R3) must be protected with
appropriate access controls.
Multi-Client Scenario: The presence of multiple clients suggests the potential for cross-client
interference if the servers are not properly designed to isolate client interactions.
Internet Exposure: Server 3 is exposed to the internet.
We recognize that the previous approach may not have identified all potential threats. Therefore, to leverage
the full capabilities of the MAESTRO framework for threat modeling, we will begin by mapping MCP
components to its seven architectural layers. See table below:
MAESTRO Layer Mapping for Anthropic MCP
MAESTRO
Layer
Anthropic MCP Components & Features
Notes
1. Foundation
Models
- Model-agnostic design: MCP can work with
various AI models (not just Anthropic). - Model
Processing: AI model determines if external data
is needed.
MCP interfaces with foundation
models but doesn't define them. The
choice of model, and its inherent
vulnerabilities, is external to MCP
itself. However, MCP's design
enables the use of these models,
making their security relevant.
2. Data
Operations
(RAG)
- Resources: One of the three MCP primitives is
specifically for sending data context. This
strongly implies RAG-like functionality. - MCP
Servers can provide access to various data
resources.
MCP facilitates data retrieval for the
agent, making it highly relevant to
RAG-based systems. The specific
data sources and implementation
details are up to the MCP server
developer.
Page 53
OWASP.org
-
3. Agent
Frameworks
- MCP itself (specification and SDKs). - MCP
Client (intermediary managing connections). -
Communication Flow (User Input -> Model
Processing -> Client Request -> Server
Execution -> Response Generation). - JSON
messages for communication. - Tools: One of
the three MCP primitives, for function-like calls.
- Prompts: One of the three MCP primitives.
MCP is a framework for connecting
agents to external resources (data
and tools). Layer 3 is the most
directly relevant layer for analysing
MCP's core functionality and
security properties.
4. Deployment
Infrastructure
- MCP Server (HTTP listener, uses JSON-RPC and
SSE). - Initial focus on local deployments. - Plans
for remote/cloud connections.
MCP servers are deployable
components, and their deployment
environment (local, cloud, etc.) has
security implications. The
communication protocols (HTTP,
JSON-RPC, SSE) are also relevant to
this layer.
5. Evaluation &
Observability
- Logging and Monitoring: Mentioned as a
feature, but details are sparse.
MCP includes some provision for
observability, but the specifics need
to be examined further.
6. Security &
Compliance
- Controlled AI access: Emphasized in MCP's
design. - Explicit permissions for each server. -
Server runs with given privileges. - Humans "in
the loop" for sensitive operations (governance).
MCP has some built-in security
concepts (permissions, controlled
access), but their effectiveness
depends on proper implementation
and configuration.
Page 54
OWASP.org
-
7. Agent
Ecosystem
- MCP generalizes and standardizes tool
integration across different AI models and
platforms. - Designed to be model-agnostic. -
Future implications for autonomous agents and
context gathering.
MCP aims to be a standard for
connecting agents to a wide range
of resources, making it a key enabler
of a broader agent ecosystem. The
"model-agnostic" design promotes
interoperability.
For the above table, we can summarize the following key points:
Layer 3 (Agent Frameworks) is Central: Like ElizaOS, MCP is the framework, so this layer is the
primary focus for understanding its inherent security properties.
Layers 1 and 2 are Indirectly but Critically Impacted: MCP doesn't define the foundation models or
the specific RAG implementations, b ut it enables their use and provides the mechanisms for agents
to interact with them. Therefore, vulnerabilities in these layers can be amplified by MCP.
Layer 4 is about Server Deployment: The security of MCP servers (where they are deployed, how
they are configured, what network they are on) is crucial.
Layer 6 Highlights Design Principles: MCP emphasizes controlled access and permissions, but the
actual implementation details of these controls are critical.
Layer 7 Positions MCP as an Ecosystem Enabler: MCP's design goals (model-agnostic, standardized
tool integration) position it as a key component in a broader agent ecosystem.
With this layer mapping in place, we can now proceed with a detailed threat analysis, focusing on
vulnerabilities within each layer and, crucially, on cross-layer threats that exploit the interactions between
layers. The four agentic factors (non-determinism, autonomy, identity management, and agent-to-agent
communication) will be central to this analysis.
MAESTRO Vulnerability Identification: Anthropic MCP
Layer 1: Foundation Models (Indirect Impact)
T5 - Cascading Hallucination Attacks: The LLM connected via MCP hallucinates, leading to
incorrect tool use or data requests via MCP.
T11 - Unexpected RCE and Code Attacks: Attackers exploit the LLM connected via MCP
Non-Determinism (T26 - Model Instability Leading to Inconsistent MCP Requests): The LLM's
instability causes the MCP client to send inconsistent or erratic requests to the MCP server,
disrupting operations.
T1- Memory Poisoning: Attack training data.
Page 55
OWASP.org
-
Layer 2: Data Operations (RAG - Indirect but Relevant)
T17 - Semantic Drift in Connected Data Sources: Changes in the meaning of data accessed via MCP
resources lead to incorrect retrievals and agent actions. (This applies if MCP is used to access RAG
resources).
T18 - Input Manipulation for Deceptive Retrieval: An attacker crafts inputs to exploit the RAG
system accessed through an MCP server, retrieving information that supports a malicious goal.
(Again, assuming MCP connects to RAG).
T12 - Agent Communication Poisoning: Agents communicate with each other by sharing data and
knowledge.
T28 - RAG Data Exfiltration: An attacker gains access to the vector database used by the RAG, via
MCP server.
Layer 3: Agent Frameworks (MCP Itself)
T2 - Tool Misuse via MCP: An agent (or attacker) exploits the "Tools" primitive of MCP to execute
unauthorized or harmful functions on an MCP server. This is a direct application of the existing "Tool
Misuse" threat to the MCP context.
T20-Framework vulnerability via MCP : SBOM in conjunction with MCP for framework security in
agentic AI is crucial for ensuring security and integrity of AI agents. Implement SBOM for framework
security to improve the security posture of the Agent, transparency, maintenance and secure
integration.
T30 - Insecure Communication in MCP Implementation: The MCP client-server communication
(using JSON-RPC and SSE) might be insecurely implemented, lacking proper encryption or
authentication, leading to eavesdropping or message manipulation. This isn't inherent to the
protocol itself, but a potential flaw in specific implementations.
Autonomy (T39 - Unintended Resource Consumption via MCP): An agent, acting autonomously,
uses MCP to repeatedly access resources or invoke tools, leading to excessive resource
consumption (CPU, memory, network bandwidth, API calls) on the MCP server or connected
systems. This is a form of denial-of-service, but driven by the agent's autonomous behaviour.
o Example: An agent, designed to monitor a website for changes, uses MCP to repeatedly
fetch the website's content. Due to a bug or a misconfiguration, the agent enters a loop,
fetching the content far more frequently than intended, consuming excessive bandwidth
and potentially overloading the target website.
Identity Management (T40 - MCP Client Impersonation): An attacker impersonates a legitimate
MCP client to gain unauthorized access to an MCP server and its resources. This could involve
stealing client credentials or exploiting vulnerabilities in the client authentication mechanism.
o Example: An attacker obtains the credentials used by a legitimate MCP client to connect to
a server that provides access to sensitive financial data. The attacker then uses these
credentials to impersonate the client and retrieve the data.
Page 56
OWASP.org
-
Non-Determinism (T41 - Schema Mismatch Leading to Errors): The MCP schema, used to define
the structure of resources and interactions, is either ambiguous or inconsistently implemented
between the client and server. This leads to misinterpretations of data, incorrect tool invocations,
or other errors. This is a form of non-determinism arising from protocol ambiguity.
o Example: An MCP server defines a "date" field in its schema as a string, but doesn't specify
the expected format. Different MCP clients might send dates in different formats (e.g. ,
"YYYY-MM-DD", "MM/DD/YYYY"), leading to parsing errors or incorrect data interpretation.
Agent-to-Agent Communication (T42 - Cross-Client Interference via Shared Server): Multiple
MCP clients (potentially belonging to different agents or users) connect to the same MCP server. A
vulnerability in the server's implementation allows one client to interfere with the operations of
other clients, for example, by modifying shared data, hijacking sessions, or triggering unintended
actions. This is not direct agent-to-agent communication, but indirect interference via a shared
resource.
o Example: An MCP server manages access to a shared database. A vulnerability in the server
allows one MCP client to overwrite data being used by another client, leading to data
corruption or incorrect agent behaviour.
Risk of integrating a malicious MCP server that can manipulate the agent into executing unintended
actions (see: Invariant Labs - MCP Securi ty Notifi cation Tool Poisoning Atta cks).
Layer 4: Deployment Infrastructure
T4 - Resource Overload: Attackers flood the MCP server with requests, causing a denial of service.
T21 - Service Account Exposure: The MCP server's service account credentials are exposed.
T34 - Wallet Key Compromise:
T43 - Network Exposure of MCP Server: An MCP server is deployed without proper network
security controls (e.g., firewalls, access control lists), making it accessible to unauthorized clients or
attackers on the network. This is a deployment vulnerability.
o Example: An MCP server providing access to internal company data is accidentally deployed
on a public-facing network without a firewall, allowing anyone on the internet to connect to
it.
Layer 5: Evaluation & Observability
T8 - Repudiation & Untraceability: Actions performed through MCP are not properly logged or
audited.
T22 - Selective Log Manipulation: As defined before.
T44 - Insufficient Logging in MCP Server/Client: The MCP server or client implementations lack
sufficient logging, making it difficult to detect or investigate security incidents, performance
issues, or errors. This hinders observability.
Page 57
OWASP.org
-
o Example: An MCP server is compromised, and the attacker uses it to exfiltrate data.
However, because the server doesn't log the details of client requests or data access, there
is no record of the attacker's actions.
Layer 6: Security & Compliance (Vertical Layer)
T3 - Privilege Compromise: Weaknesses in access control mechanisms within the MCP server or
connected resources.
Identity Management (T45 - Insufficient Isolation of MCP Server Permissions): The MCP server
itself is granted excessive permissions on the host system or network. This means that if the server
is compromised, the attacker gains access to a wider range of resources than necessary. This is a
violation of the principle of least privilege at the server level.
o Example: An MCP server is running on a server with full administrative access to the
operating system. If the MCP server is compromised, the attacker gains full control of the
server, not just access to the resources managed by the MCP server.
T46 - Data Residency/Compliance Violation via MCP Server: An MCP server transfers data across
geographical boundaries or processes data in a way that violates data privacy regulations or compliance
requirements.
Layer 7: Agent Ecosystem
T12 - Agent Communication Poisoning: Although there's no direct agent-to-agent communication
defined in MCP itself, agents using MCP could be part of a larger system where they do
communicate. This threat would then apply.
T13 - Rogue Agents in Multi-Agent Systems: If MCP is used in a multi-agent system, a rogue agent
could exploit MCP.
Agent Identity Management (T47 - Rogue MCP Server in Ecosystem): An attacker deploys a
malicious MCP server that masquerades as a legitimate server, providing seemingly valid but
actually harmful services or data. Agents connecting to this rogue server are then compromised.
This is an ecosystem-level attack targeting the trust model of MCP.
Example: An attacker sets up an MCP server that claims to provide access to a valuable financial data feed.
Agents, believing this server to be legitimate, connect to it and send requests. The rogue server, however,
returns manipulated data or steals the agents' credentials.
Threat Model Summary
The table in the following figure summarizes what we have discovered (Figure 7).
Page 58
OWASP.org
-
Figure 7: Threat Summary of Anthropic MCP Applications using MAESTRO Threat Modelling
This detailed threat model for Anthropic's MCP, using the MAESTRO framework, identifies a wide range of
potential vulnerabilities. It highlights the importance of secure server implementations, robust client-server
communication, and careful consideration of the agentic factors (especially autonomy and identity
management) when using MCP to build and deploy AI agents. The focus on both the protocol itself and its
potential misuse provides a comprehensive view of the security landscape.
The addition of new threat IDs, starting from T39, ensures no overlap with previously defined threats.
Page 59
OWASP.org
-
References
Agentic AI Threats and Mitigations: https://genai.owasp.org/resource/agentic-ai-threats-
and-mitigations/
NIST AI Risk Management Framework: https://www.nist.gov/itl/ai-risk-management-
framework
OWASP Top 10 for LLMs: https://genai.owasp.org/llm-top-10/
Eliza Agent Framework: https://www.elizaos.ai/
Eliza Agent Framework GitHub: https://github.com/elizaOS/eliza
Agentic AI Threat Modelling Framework: MAESTRO,
https://cloudsecurityalliance.org/blog/2025/02/06/agentic-ai-threat-modeling-framework-
maestro
Page 60
OWASP.org
-
Acknowledgements
.
Contributors
Ken Huang (ASI Threat Modelling Co-Lead)
DistributedApps.ai, CSA
Akram Sheriff (ASI Threat Modelling Co-lead),
Cisco Systems
John Sotiropoulos (ASI co-lead), Kainos
Ron F. Del Rosario (ASI co-lead), SAP
Kayla Underkoffler, Zenity
Idan Habler, PhD, Intuit
Josh Devon, Sublime Strategy
Vinod Vasudevan, Ackuity.ai
Rajat Mohanty, Ackuity.ai
Murali Krishna Kandula, Apple
Mohit Yadav, Actualize
Sana Zia Hassan, EY
Vineeth Sai Narajala, AWS
Evgeniy Kokuykin, RAFT
Bill Glennon, Amentum
Riggs Goodman III, AWS
Angus Chen, Binary Defense
Ying-Jung Chen PhD, Independent Consultant
Abhaar Gupta, Anchorage Digital
Yotam Barak. Mint Security
Victor Lu
ASI Review Board
Alejandro Saucedo - Chair of ML Security Project at
Linux Foundation, UN AI Expert, AI Expert for Tech
Policy, European Commission
Apostol Vassilev - Adversarial AI Lead, NIST
Chris Hughes - CEO, Aquia
Hyrum Anderson - CTO, Robust Intelligence
Steve Wilson - OWASP Top 10 for LLM Applications
and Generative AI Project Lead and Chief Product
Officer, Exabeam
Scott Clinton - OWASP Top 10 for LLM Applications
and Generative AI Project Co-Lead
Vasilios Mavroudis- Principal Research Scientist
and Theme Lead, the Alan Turing Institute
Josh Collyer, Principal Security Researcher, Them e
Lead
Egor Pushkin, Chief Architect, Data and AI at Oracle
Cloud
Page 61
OWASP.org
-
OWASP Top 10 for LLM Project
Sponsors
We appreciate our Project Sponsors, funding contributions to help support the objectives of the project and
help to cover operational and outreach costs augmenting the resources provided by the OWASP.org
foundation. The OWASP Top 10 for LLM and Generative AI Project continues to maintain a vendor neutral and
unbiased approach. Sponsors do not receive special governance considerations as part of their support.
Sponsors do receive recognition for their contributions in our materials and web properties.
All materials the project generates are community developed, driven and released under open source and
creative commons licenses. For more information on becoming a sponsor, visit the Sponsorship Section on
our Website to learn more about helping to sustain the project through sponsorship.
Project Sponsors:
Sponsor list, as of publication date. Find the full sponsor list here.
Page 62
OWASP.org
-
Project Supporters
Project supporters lend their resources and expertise to support the goals of the project.
Accenture
AddValueMachine Inc
Aeye Security Lab Inc.
AI informatics GmbH
AI Village
aigos
Aon
Aqua Security
Astra Security
AVID
AWARE7 GmbH
AWS
BBVA
Bearer
BeDisruptive
Bit79
Blue Yonder
BroadBand Security, Inc.
BuddoBot
Bugcrowd
Cadea
Check Point
Cisco
Cloud Security Podcast
Cloudflare
Cloudsec.ai
Coalfire
Cobalt
Cohere
Comcast
Complex Technologies
Credal.ai
Databook
DistributedApps.ai
DreadNode
DSI
EPAM
Exabeam
EY Italy
F5
FedEx
Forescout
GE HealthCare
Giskard
GitHub
Google
GuidePoint Security
HackerOne
HADESS
IBM
iFood
IriusRisk
IronCore Labs
IT University Copenhagen
Kainos
KLAVAN
Klavan Security Group
KPMG Germany FS
Kudelski Security
Lakera
Lasso Security
Layerup
Legato
Linkfire
LLM Guard
LOGIC PLUS
MaibornWolff
Mend.io
Microsoft
Modus Create
Nexus
Nightfall AI
Nordic Venture Family
Normalyze
NuBinary
Palo Alto Networks
Palosade
Praetorian
Preamble
Precize
Prompt Security
PromptArmor
Pynt
Quiq
Red Hat
RHITE
SAFE Security
Salesforce
SAP
Securiti
See-Docs & Thenavigo
ServiceTitan
SHI
Smiling Prophet
Snyk
Sourcetoad
Sprinklr
stackArmor
Tietoevry
Trellix
Trustwave SpiderLabs
U Washington
University of Illinois
VE3
WhyLabs
Yahoo
Zenity
Sponsor list, as of publication date. Find the full sponsor list here.
... However, these interoperability mechanisms introduce novel security challenges [3]. Of particular concern is "tool squatting," where malicious actors deceptively register tools or misrepresent their capabilities to gain unauthorized access or impersonate legitimate services. ...
... This work is not related to the author's position at Amazon Web Services.2 This work is not related to the author's position at DistributedApp.ai3 This work is not related to the author's position at Intuit ...
Preprint
Full-text available
The rise of generative AI (GenAI) multi-agent systems (MAS) necessitates standardized protocols enabling agents to discover and interact with external tools. However, these protocols introduce new security challenges, particularly; tool squatting; the deceptive registration or representation of tools. This paper analyzes tool squatting threats within the context of emerging interoperability standards, such as Model Context Protocol (MCP) or seamless communication between agents protocols. It introduces a comprehensive Tool Registry system designed to mitigate these risks. We propose a security-focused architecture featuring admin-controlled registration, centralized tool discovery, fine grained access policies enforced via dedicated Agent and Tool Registry services, a dynamic trust scoring mechanism based on tool versioning and known vulnerabilities, and just in time credential provisioning. Based on its design principles, the proposed registry framework aims to effectively prevent common tool squatting vectors while preserving the flexibility and power of multi-agent systems. This work addresses a critical security gap in the rapidly evolving GenAI ecosystem and provides a foundation for secure tool integration in production environments.
ResearchGate has not been able to resolve any references for this publication.