In previous parts of this series on Rogue AI, we briefly explored what organisations can do to better manage risk across their AI attack surface. And we touched on ways to mitigate threats by creating trusted AI identities. We’ve also cited the great work that MIT is doing to collect AI risks and that OWASP is doing to suggest effective mitigations for LLM vulnerabilities.
Now it’s time to fill in the missing pieces of the puzzle by describing how Zero Trust and layered defences can secure against Rogue AI threats.
Rogue AI Causal Factors
| LLM Vulnerability / Type of Rogue | Accidental | Subverted | Malicious |
| --- | --- | --- | --- |
| Excessive Functionality | Misconfiguration of capability or guardrails | Capabilities modified or added directly, or guardrails evaded | Functionality required for malicious goals |
| Excessive Permissions | Misconfiguration of authorisation | Privileges escalated | Must acquire all privileges; none to start |
| Excessive Autonomy | Misconfiguration of tasks requiring human review | Human removed from the loop | Not under defender control |
The causal factors above can be used to identify and mitigate the risk associated with Rogue AI services. The first step is to configure the relevant AI services properly, which provides a foundation of safety against all types of Rogue AI by specifying allowed behaviours. Protecting and sanitising the points where known AI services touch data or use tools primarily prevents Subverted Rogues, but can also address other ways in which accidents happen. Restricting AI systems to allowed data and tool use, and verifying the content of inputs to and outputs from AI systems, form the core of safe use.
Malicious Rogues can attack your organisation from the outside or act as AI malware within your environment. Many patterns used to detect malicious activities by cyber attackers can also be used to detect the activities of Malicious Rogues. But as new capabilities enhance the evasiveness of Rogues, learning patterns for detection will not cover the unknown unknowns. In this case, machine behaviours need to be identified on devices, in workloads and in network activity. In some cases, this is the only way to catch Malicious Rogues.
Behavioural analysis can also detect other instances of excessive functionality, permissions or autonomy. Anomalous activity across devices, workloads, and network can be a leading indicator for Rogue AI activity, no matter how it was caused.
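As a minimal sketch of the behavioural baseline idea above, the following compares a new observation against a history of normal readings using a z-score. The metric (hourly GPU utilisation), the sample values and the threshold of three standard deviations are illustrative assumptions, not recommendations from this series.

```python
from statistics import mean, stdev


def is_anomalous(baseline, observed, threshold=3.0):
    """Return True when `observed` lies more than `threshold` standard
    deviations away from the mean of the `baseline` samples."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is unusual.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical hourly GPU utilisation (%) for one workload.
gpu_baseline = [12.0, 15.0, 11.0, 14.0, 13.0, 12.5, 14.5]

print(is_anomalous(gpu_baseline, 13.5))  # close to the usual range
print(is_anomalous(gpu_baseline, 92.0))  # far outside the usual range
```

A single-metric check like this is prone to false positives on its own; it is most useful as one signal among several.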
Comprehensive defence across the OSI communications stack
However, for a more comprehensive approach, we must consider defence in depth at every layer of the OSI model, as follows:
Physical: Monitor processor use (CPU, GPU, TPU, NPU, DPU) in cloud, endpoint and edge devices. This applies to AI-specific workload patterns, querying AI models (inference), and loading model parameters into memory close to AI-specific processing.
Data layer: Use MLOps/LLMOps versioning and verification to ensure models are not poisoned or replaced, recording hashes to identify models. Use software and AI model bills of materials (SBoMs/MBoMs) to ensure the AI service software and model can be trusted.
Network: Limit AI services that can be reached externally as well as the tools and APIs that AI services can reach. Detect anomalous communicators such as human-to-machine transitions and novel machine activity.
Transport: Consider rate limiting for external AI services and scanning for anomalous packets.
Session: Insert verification processes such as human-in-the-loop checks, especially when instantiating AI services. Use timeouts to mitigate session hijacking. Analyse user-context authentications and detect anomalous sessions.
Application and Presentation layers: Identify misconfiguration of functionality, permissions and autonomy (as per the table above). Use guardrails on AI inputs and outputs, such as scrubbing of personal (PII) and other sensitive information, offensive content, and prompt injections or system jailbreaks. Restrict LLM agent tools according to an allow list which limits APIs and plugins and only allows well-defined use of well-known websites.
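The allow-list restriction on agent tools described for the Application and Presentation layers could be sketched as below. The tool names, the host names and the `is_tool_call_allowed` helper are hypothetical and exist only for illustration; a real agent framework would supply its own hook for vetting tool calls.

```python
from urllib.parse import urlparse

# Hypothetical policy: the only tools and web hosts this agent may use.
ALLOWED_TOOLS = {"search_docs", "fetch_url"}
ALLOWED_HOSTS = {"docs.example.com", "api.example.com"}


def is_tool_call_allowed(tool_name, arguments):
    """Deny by default: permit a call only if the tool is on the allow
    list and, for web fetches, the target is an allowed HTTPS host."""
    if tool_name not in ALLOWED_TOOLS:
        return False
    if tool_name == "fetch_url":
        parsed = urlparse(arguments.get("url", ""))
        # Exact host match, so look-alike domains are rejected.
        return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS
    return True


print(is_tool_call_allowed("fetch_url", {"url": "https://docs.example.com/guide"}))
print(is_tool_call_allowed("fetch_url", {"url": "https://docs.example.com.evil.net/"}))
print(is_tool_call_allowed("run_shell", {"cmd": "ls"}))
```

Matching the host name exactly, rather than by substring or prefix, is a deliberate choice here: it prevents a look-alike domain from slipping through.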
Rogue AI and the Zero Trust Maturity Model
Zero Trust security architecture provides many tools to mitigate Rogue AI risk. The Zero Trust Maturity Model was created by the US Cybersecurity and Infrastructure Security Agency (CISA) to support federal agency efforts to comply with Executive Order (EO) 14028: Improving the Nation’s Cybersecurity. It reflects the seven tenets of zero trust as outlined in NIST SP 800-207:
- All data sources and computing services are considered resources.
- All communication is secured regardless of network location.
- Access to individual enterprise resources is granted on a per-session basis.
- Access to resources is determined by dynamic policy.
- The enterprise monitors and measures the integrity and security posture of all owned and associated assets.
- All resource authentication and authorisation are dynamic and strictly enforced before access is allowed.
- The enterprise collects as much information as possible about the current state of assets, network infrastructure, and communications and uses it to improve its security posture.
Effective risk mitigation in a Rogue AI context requires organisations to reach the “advanced” stage described in the CISA document:
“Wherever applicable, automated controls for lifecycle and assignment of configurations and policies with cross-pillar coordination; centralised visibility and identity control; policy enforcement integrated across pillars; response to pre-defined mitigations; changes to least privilege based on risk and posture assessments; and building toward enterprise-wide awareness (including externally hosted resources).”
The pillars of Zero Trust loosely map to layers of the OSI model. The OSI model offers the more structured approach of the two, because it specifies levels in a communication stack rather than pillars that need protection. Governance of allowed actions for each pillar is the most basic protection against Rogue AI: only do what is specifically allowed. Automation and orchestration of controls, access, reputation, and pattern detection provide the next level of defence. Visibility of normal and anomalous behaviour across the Zero Trust Maturity Model pillars allows detection of unknown unknowns. Rogue AI may be mitigated first by strict policy, then by rule-based and ML-based activity detection, and finally by visibility into actions that were not normal, not intended and therefore misaligned.
Layered defence and defence across layers
The first step in the long road to AI safety is policy compliance: allow only what is specified, ensure AI is authorised within a strictly controlled sandbox, and limit the creation of new AI service instances, new types of data access and new tool use. Also ensure that AI has its own authorisation, limited by both the identity of the AI and that of the user. It must be well defined, so that all inputs and outputs are well understood.
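One way to read "limited by both the identity of the AI and of the user" is that an action is permitted only when both identities hold the permission, that is, the intersection of the two permission sets. The sketch below assumes that reading; the permission strings and function names are hypothetical.

```python
def effective_permissions(ai_permissions, user_permissions):
    """Permissions the AI may exercise on behalf of this user:
    only those granted to BOTH the AI identity and the user."""
    return frozenset(ai_permissions) & frozenset(user_permissions)


def authorise(action, ai_permissions, user_permissions):
    """Return True only if the action is in the intersection."""
    return action in effective_permissions(ai_permissions, user_permissions)


# Hypothetical grants for an AI assistant and the user invoking it.
ai_perms = {"read:tickets", "write:tickets", "read:kb"}
user_perms = {"read:tickets", "read:hr"}

print(authorise("read:tickets", ai_perms, user_perms))   # both hold it
print(authorise("write:tickets", ai_perms, user_perms))  # user lacks it
print(authorise("read:hr", ai_perms, user_perms))        # AI lacks it
```

Under this design, the AI can never exceed the privileges of the user it acts for, and the user cannot use the AI to reach anything the AI itself was not granted.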
Reputation is also important; in fact, it is the fastest way to detect known bad or good things. This requires tracking AI identity. Different identities will have different capabilities and reputations related to how they may go rogue. Effort will need to be made to understand whether they are likely to be deceptive or provide dangerous instructions.
Rule-based and ML pattern-based detection provides the next layer of defence. Data content identification prevents sensitive information disclosure, prompt injections, and model poisoning. Network-based detections indicate AI service traffic. Simple rules on device usage and processes alert on unintended use.
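A very small sketch of rule-based content identification follows: one pattern that redacts email addresses and one that flags a couple of well-known prompt-injection phrasings. Both patterns are illustrative assumptions and far from exhaustive; production guardrails would combine many more rules with ML classifiers.

```python
import re

# Illustrative patterns only; real deployments need far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
INJECTION_RE = re.compile(
    r"ignore (all |any )?(previous|prior) instructions"
    r"|disregard the system prompt",
    re.IGNORECASE,
)


def scrub_pii(text):
    """Replace anything that looks like an email address with a marker."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def looks_like_prompt_injection(text):
    """Return True if the text contains a known injection phrasing."""
    return INJECTION_RE.search(text) is not None


print(scrub_pii("Contact alice@example.com today"))
print(looks_like_prompt_injection("Please ignore all previous instructions."))
print(looks_like_prompt_injection("What is the weather in Leeds?"))
```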
Zero Trust means that anything known shall be verified; anything unknown is distrusted. Behavioural anomaly detection has long been challenging in security, often producing false positives. Providing corroborating information increases certainty in anomaly detections, whether that information comes from reputation, rules, patterns or other behavioural anomalies. Anomaly detection is the final layer of protection, enabling identification of previously unknown activities. Rogue AI may act without precedent. Consider setting up automated AI behaviour monitoring for each of the five Zero Trust pillars:
- Identity: Human and machine identity and behaviour.
- Devices: Physical and virtual asset awareness.
- Networks: Allowed, prohibited and anomalous communications.
- Applications and Workloads: Trusted and untrusted API/compute.
- Data: Content-aware protections such as data loss prevention, data security posture management (DSPM) and data detection and response (DDR).
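To illustrate how corroboration across the pillars above can cut false positives, the sketch below raises an alert for an entity only when anomaly signals arrive from at least two distinct pillars. The signal format, the entity names, the shortened pillar labels and the two-pillar threshold are all assumptions made for this example.

```python
from collections import defaultdict

# Shortened labels for the five pillars listed above.
PILLARS = ("identity", "devices", "networks", "applications", "data")


def corroborated_alerts(signals, min_pillars=2):
    """Group (entity, pillar) anomaly signals by entity and keep only
    entities flagged in at least `min_pillars` distinct pillars."""
    pillars_by_entity = defaultdict(set)
    for entity, pillar in signals:
        if pillar not in PILLARS:
            raise ValueError(f"unknown pillar: {pillar}")
        pillars_by_entity[entity].add(pillar)
    return {
        entity: sorted(pillars)
        for entity, pillars in pillars_by_entity.items()
        if len(pillars) >= min_pillars
    }


# Hypothetical anomaly signals from separate detectors.
signals = [
    ("svc-llm-01", "networks"),
    ("svc-llm-01", "data"),
    ("laptop-17", "devices"),
    ("svc-llm-01", "networks"),  # repeat from the same pillar
]

print(corroborated_alerts(signals))
```

Repeated signals from the same pillar do not count as corroboration here; only agreement between different pillars raises confidence.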
Consider the above in addition to, rather than as a replacement for, the OWASP mitigations. With a rigorous, systematic approach to mitigating risk across OSI layers and Zero Trust pillars, organisations can start to build out their AI programmes with confidence that the AI they use is aligned with their goals.
To read more about Rogue AI: