Threat Intelligence Briefing — Part 2 of 2

Your TLPT Scope Is Drawn Around the Wrong Thing

AI and operational resilience. Threat-led testing.

In Part 1 we argued that DORA does not have an AI problem — its technology-neutrality is sound, and the real gap is the quality of threat assessment feeding the ICT risk-management framework. This part makes the same argument one layer down, at the sharp end: threat-led penetration testing.

TIBER-EU was substantially rewritten in February 2025 to align with DORA’s regulatory technical standards on threat-led penetration testing. The update was real and material: required deliverables, tighter timelines, mandatory purple-teaming, and stricter procurement standards for threat-intelligence and red-team providers. It says nothing specific about AI.

The expected complaint follows: the framework has not kept pace. We will argue the opposite. The latitude to test AI properly is already in the framework. The problem is that the test is being scoped, intelligence-gathered, and crewed as though it were a few years ago, and what is potentially the most exposed part of the critical function now sits outside the scope specification entirely.

Key Judgements
  • High Confidence TIBER-EU’s February 2025 update aligned the framework to DORA TLPT but added nothing AI-specific, and it did not need to. The framework’s existing scoping, threat-intelligence, and scenario phases already accommodate AI components — the constraint is practitioner behaviour, not the framework text.
  • Moderate Confidence The first failure point is scoping, and it is inconsistent rather than universal. TLPT scope specifications are still commonly drawn around systems and services, while the critical or important functions they support increasingly depend on models, agents, and pipelines. Where AI components are scoped in, it tends to be driven by an individual practitioner rather than the framework — which makes the coverage uneven and the gap, where it exists, predictable.
  • High Confidence The threat-intelligence phase is the highest-leverage place to fix this. A Targeted Threat Intelligence report which does not characterise AI on both axes — adversary capability and the entity’s own AI as a target surface — is an incomplete intelligence product in 2026.
  • Moderate Confidence Live testing of production AI components creates rules-of-engagement problems the framework has not yet solved. Some AI-component attacks cannot be safely run against production the way conventional TLPT actions are, and a controlled or replica carve-out will be needed.
  • Moderate–High Confidence A red team certified for conventional TLPT may have no adversarial-ML capability whatsoever, and current procurement standards do not reliably surface that gap.

The framework is not behind

It is worth being precise, because we suspect the “TIBER-EU needs an AI update” line will be everywhere within the year.

TIBER-EU does not tell a red team which technologies to attack. It defines a process: scoping around critical or important functions, a threat-intelligence phase that produces targeted intelligence and threat scenarios, a red-team phase that executes those scenarios against live targets, and a closure phase. It sets the deliverables and the rules of engagement for that process. Nothing in that process prohibits an AI component from being inside the scope, being characterised in the intelligence product, or being the target of a scenario.

So the framework is not the blocker. The way the framework is being used is the blocker. Three places, specifically.

Failure 1. The scope is drawn around the wrong object

A TLPT scope specification is built around critical or important functions and then decomposed into the systems and services that support them. That decomposition is a habit formed in an era when “the system that supports the function” was a well-defined thing with a hostname and an owner.

It is no longer that clean. The critical function increasingly is, or critically depends on, a model: a customer-facing agent handling servicing requests, a model in a credit or fraud decision path, a RAG pipeline feeding a knowledge-driven process. When the scoping workshop decomposes the function into supporting systems, the model frequently does not appear as a discrete, attackable component. It is folded into “the application,” or treated as a vendor black box and waved past, or simply not recognised as a thing that can be attacked on its own terms.

The result, where this happens, is a test that is scoped with full rigour around the perimeter, the identity stack, and the infrastructure, and that steps around the component most likely to fail in a novel way. The castle is tested. The crown jewels have been quietly reclassified as furniture. We are not claiming this is universal (better-resourced firms are starting to catch it), but it is common enough, and predictable enough in its cause, to be worth designing out rather than hoping against.

This is fixable today, with no framework change, by one discipline: when decomposing a critical or important function (CIF), explicitly enumerate its AI components (models, agents, orchestration, retrieval pipelines, training and fine-tuning surfaces) as named, separately attackable elements, and force the question of whether each is in or out of scope on the record.
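The enumeration discipline can be sketched as a small scoping register. This is an illustrative sketch only, not a TIBER-EU deliverable format: the class names, fields, and the example function and component names are all hypothetical, and the component kinds are simply the ones listed above.

```python
from dataclasses import dataclass, field
from enum import Enum


class ScopeStatus(Enum):
    UNDECIDED = "undecided"
    IN_SCOPE = "in scope"
    OUT_OF_SCOPE = "out of scope"


# Component kinds taken from the list in the text; extend as needed.
AI_COMPONENT_KINDS = (
    "model",
    "agent",
    "orchestration",
    "retrieval pipeline",
    "training or fine-tuning surface",
)


@dataclass
class AIComponent:
    name: str
    kind: str
    status: ScopeStatus = ScopeStatus.UNDECIDED
    rationale: str = ""  # why it is in or out; this is the "on the record" part

    def __post_init__(self):
        if self.kind not in AI_COMPONENT_KINDS:
            raise ValueError(f"unknown AI component kind: {self.kind!r}")


@dataclass
class CriticalFunction:
    name: str
    ai_components: list[AIComponent] = field(default_factory=list)

    def unresolved(self) -> list[str]:
        """Names of components lacking a scope decision or a recorded rationale."""
        return [
            c.name
            for c in self.ai_components
            if c.status is ScopeStatus.UNDECIDED or not c.rationale.strip()
        ]


# Hypothetical example: one component decided, one never discussed.
cif = CriticalFunction(
    "Customer servicing",
    [
        AIComponent("servicing-agent", "agent", ScopeStatus.IN_SCOPE,
                    "customer-facing and holds tool access"),
        AIComponent("kb-retrieval", "retrieval pipeline"),
    ],
)
print(cif.unresolved())  # ['kb-retrieval']
```

The useful property is that a component cannot silently drop out: anything named but undecided, or decided without a reason, is reported before the scope specification is signed off.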

Failure 2. The threat-intelligence phase is single-axis

This is the part of the engagement where the gap is largest and the leverage is highest, and it is squarely an intelligence problem rather than a red-team one.

The Targeted Threat Intelligence report drives the whole test — it justifies the threat actors selected, it shapes the scenarios, it is the analytical spine of the engagement. A TTI report produced in 2026 that does not address AI is not neutral. It is incomplete, and a competent test lead should treat it as a deficient deliverable.

“Addressing AI” in a TTI report means two axes, not one.

The first axis is AI as adversary capability — the uplift AI gives the threat actors you are already modelling. Deepfake voice and video against your authentication and authorisation flows. AI-accelerated reconnaissance, target development, and exploit production compressing the timeline of the intrusion you are simulating. This axis is an evolution of work the TI phase already does; most reports treat it thinly, if at all.

The second axis is the entity’s own AI as a target surface, and this one is usually absent entirely. It asks what an adversary would do to your AI systems: prompt-inject the customer-facing agent into exfiltrating data or misusing its tools; poison a retrieval corpus or a fine-tuning set; extract or abuse a model endpoint; subvert an agent’s tool-authorisation so that legitimate-looking actions carry illegitimate intent. This is not exotic. The techniques are documented, the tooling exists, and the taxonomies (MITRE ATLAS, the OWASP work on LLM and agentic risk) are mature enough to anchor a scenario the way ATT&CK already anchors conventional TLPT scenarios.

A TI provider can write both axes into a TTI report today. The framework asks for targeted threat intelligence and threat scenarios; it does not constrain what the threat is. The reason most reports do not include this is not framework text. It is that the intelligence work is harder and the analyst pool that can do it credibly is thin. That thinness is a market gap, not a regulatory one.
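A test lead could apply the two-axis check mechanically when reviewing a draft TTI report. The sketch below is a minimal illustration under stated assumptions: the `Scenario` structure, the `missing_axes` helper, and the example scenario titles are hypothetical, and the taxonomy field is a free-text label rather than a validated technique identifier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Axis(Enum):
    ADVERSARY_CAPABILITY = "AI as adversary capability"
    TARGET_SURFACE = "entity's own AI as a target surface"


@dataclass(frozen=True)
class Scenario:
    title: str
    axis: Optional[Axis]  # None means the scenario has no AI dimension
    taxonomy: str         # free-text anchor, e.g. "MITRE ATT&CK" or "MITRE ATLAS"


def missing_axes(scenarios: list[Scenario]) -> list[Axis]:
    """Return the AI axes that no scenario in the report addresses."""
    covered = {s.axis for s in scenarios if s.axis is not None}
    return [axis for axis in Axis if axis not in covered]


# Hypothetical draft report: conventional intrusion plus deepfake uplift,
# but nothing aimed at the entity's own AI systems.
draft = [
    Scenario("Ransomware via third-party access", None, "MITRE ATT&CK"),
    Scenario("Deepfake voice against payment authorisation",
             Axis.ADVERSARY_CAPABILITY, "MITRE ATT&CK"),
]

gaps = missing_axes(draft)
print([g.value for g in gaps])  # ["entity's own AI as a target surface"]
```

A non-empty result is the signal to send the report back as deficient, not evidence that the missing axis is irrelevant to the entity.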

Failure 3. Scenarios, rules of engagement, and crew

Once AI components are in scope and the intelligence supports it, the red-team phase has to be able to actually execute against them — and here the framework genuinely will need supplementary guidance.

TLPT is a live test against production. That model works for conventional actions. It works less well for some AI-component attacks. Adversarial inputs against a production model can degrade service for real customers. A poisoning attempt against a live retrieval corpus is not cleanly reversible. The framework’s rules of engagement assume actions that are controlled, contained, and recoverable; a subset of meaningful AI attacks are not. The likely resolution is a controlled or replica carve-out — testing destructive or non-reversible AI techniques against a faithful replica while keeping non-destructive techniques live — and that distinction needs to be written into engagement guidance rather than improvised per test.
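The carve-out can be written down as an explicit rule before the red-team phase starts. The following is a sketch under assumptions, not engagement guidance: the two attributes, the decision rule, and the classification of each example technique are illustrative judgements that the control team would have to make for its own environment.

```python
from dataclasses import dataclass
from enum import Enum


class Environment(Enum):
    LIVE = "live production"
    REPLICA = "controlled replica"


@dataclass(frozen=True)
class AITechnique:
    name: str
    destructive: bool  # could degrade service or corrupt data for real customers
    reversible: bool   # effects can be cleanly rolled back after the test


def assign_environment(technique: AITechnique) -> Environment:
    """Destructive or non-reversible techniques go to the replica; the rest run live."""
    if technique.destructive or not technique.reversible:
        return Environment.REPLICA
    return Environment.LIVE


# Hypothetical classifications, agreed in advance rather than during the test.
techniques = [
    AITechnique("Prompt-injection probe of servicing agent",
                destructive=False, reversible=True),
    AITechnique("Poisoning of retrieval corpus",
                destructive=True, reversible=False),
    AITechnique("Adversarial input flood against fraud model",
                destructive=True, reversible=True),
]

plan = {t.name: assign_environment(t) for t in techniques}
for name, env in plan.items():
    print(f"{name}: {env.value}")
```

Keeping the rule this simple is deliberate: the argument in the text is that the live-versus-replica split should be decided and recorded per technique up front, and a two-attribute rule is easy to audit afterwards.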

There is also a crewing problem the procurement standards do not catch. The February 2025 update tightened the requirements for threat-intelligence and red-team providers — proven expertise, financial-sector experience, conflict-of-interest controls. None of those criteria surface whether the red team can actually attack a model. Adversarial machine learning is a distinct specialism. A team with an exemplary conventional TLPT record can have precisely zero capability against an LLM agent, and the current procurement framing would not reveal it. Recognising adversarial-ML competence as a named specialism required when AI components are in scope is a sensible Level 3 addition.

The 2025 update made purple-teaming mandatory, and purple-teaming is the natural vehicle for exactly this. Detection engineering for AI-targeted attacks is still in its infancy: prompt injection, agent abuse, and model misuse are new signals for which detection use cases have to be built. Running AI attack paths as purple-team exercises does double duty: it tests the control and it builds the detection capability in the same motion. If you do nothing else from this piece, route your first AI scenarios through the purple-team channel.

What to actually do

  1. Fix the scope first. Enumerate AI components when decomposing every CIF and decide their scope status on the record. A model that is never named is never tested.
  2. Demand a two-axis TTI report. Adversary AI capability and your AI as a target surface. Treat a single-axis report as deficient.
  3. Pre-agree the rules of engagement for AI components. Decide what runs live and what runs against a replica before the red-team phase, not during it.
  4. Check the crew, not just the credential. If AI components are in scope, confirm adversarial-ML capability explicitly. The standard procurement criteria will not do it for you.
  5. Use the purple team. It is mandatory now, it is the right vehicle, and it builds the detection capability you currently lack.

The point, across both parts

DORA and TIBER-EU were both built to be technology-neutral, and both succeeded. Neither has an AI gap in the sense the market is selling. What both have is a quieter and more uncomfortable problem: the regulation and the framework give you all the room you need to address AI properly, and the binding constraint is the intelligence discipline applied inside that room.

The firms that treat this as a Brussels problem will wait for an update, tick a box when it arrives, and remain exactly as exposed. The firms that treat it as an intelligence problem will scope the model, characterise the adversary on both axes, and test the thing most likely to fail. The framework will not tell you which firm to be. That decision, like the threat assessment itself, sits with you.

ThreatInsights provides cyber threat intelligence to financial-services firms, with a focus on DORA operational resilience and TIBER-EU threat-led testing. If your next TLPT scope does not yet name your AI components, that is the conversation worth having before the scoping workshop, not after it.
