What is the difference between a business continuity exercise and a test?

An exercise validates whether people, plans, and dependencies hold together under a simulated disruption, producing findings to act on. A test checks whether one specific component meets a defined threshold, such as restoring from backup within the recovery time objective, and resolves to pass or fail. ISO 22398:2013 formalises this distinction; most programs need both.

What is the difference between a tabletop exercise and a full simulation?

A tabletop is discussion-based: the team talks through a scenario with no live systems activated, which is ideal for testing decisions, escalation, and coordination cheaply. A full-scale simulation activates real systems, people, and timing under realistic pressure, proving recovery rather than discussing it. Simulations cost far more and are reserved for material services and mature programs.

How often should we conduct business continuity exercises?

DORA sets a floor of at least yearly testing for ICT systems supporting critical functions, but an annual exercise is usually too infrequent to build genuine readiness. Mature programs run more frequent, lighter-touch exercises and reserve full-scale simulations for material services, because cadence and follow-through matter more than the polish of any single event.

What are the regulatory requirements for BC exercises under ISO 22301 and DORA?

ISO 22301:2019 Clause 8.5 requires a regular, evaluated exercise programme with defined objectives and criteria. DORA Articles 24-27 require financial entities to maintain a digital operational resilience testing programme with at least yearly testing of ICT supporting critical functions, plus threat-led penetration testing for those identified as critical. FCA/PRA and APRA CPS 230 add severe-but-plausible scenario testing obligations.

How do I design a realistic tabletop exercise scenario?

Start from your BIA's critical activities and their dependencies rather than a generic threat list, then choose a scenario that stresses your specific single points of failure. Build injects that contradict and escalate rather than just advancing time, and define evaluation criteria before the exercise runs. A scenario everyone survives comfortably was probably not severe enough.

What should be included in an after-action report following an exercise?

An After-Action Report should capture what was observed against the predefined evaluation criteria, the decisions made and where they faltered, and the specific gaps surfaced. It pairs with an Improvement Plan that assigns every finding a named owner and a deadline. The improvement plan, not the report, is the real deliverable, because an unclosed finding is an open risk.

How do we involve third-party vendors in our exercise program?

Exercise the scenario where a critical vendor goes dark, since concentration risk (one provider underpinning many services) is what regulators most fear. APRA CPS 230 and DORA both require material service provider disruption to be tested explicitly. Where possible, involve key vendors directly so dependencies between their response and yours are validated rather than assumed.

How do we progress from basic exercises to full-scale simulations?

Move new arrangements up the spectrum in steps: walkthrough for familiarity, tabletop for decisions and coordination, functional to activate part of the response, then full-scale to prove recovery under real pressure. Justify each step by what the previous one surfaced. Reserve the high-cost full-scale format for material services and reasonably mature programs.

Exercise & Simulation for Business Continuity: The Complete Guide

The exercise went perfectly. Everyone knew their role, the plan worked exactly as written, and the room felt reassured. Which is precisely why it taught nobody anything about how the organization would actually behave when a real crisis arrived and refused to follow the script.

That scene plays out in boardrooms and ops centres every quarter. A facilitator reads a scenario off a slide, the team recites the plan, an After-Action Report records zero findings, and the audit box gets ticked. The program looks healthy on paper while operational readiness stays exactly where it was. The rest of this guide unpacks why most exercise programs drift into theater, what makes a genuine stress test different from a recitation, and how to build a program that finds your weaknesses before an incident does.

Why most exercises teach nothing

Most exercise programs don't fail because teams skip them. They fail because the events are designed, consciously or not, to confirm what everyone already believes rather than to break it. A scripted, low-pressure walkthrough produces a clean report and a comfortable room, and both are mistaken for readiness.

The exercise that goes perfectly and proves nothing

When every participant knows the answer before the question is asked, you are not testing the response. You are rehearsing a performance. The scenario advances on rails, each team member delivers their line, and the facilitator nudges the conversation back on track whenever it threatens to get interesting.

Compliance-driven exercises optimise for the artefact, not the insight. The goal becomes a tidy After-Action Report with no awkward gaps to explain to an examiner. So the design quietly removes anything that might generate an awkward gap. That is how a program can run four exercises a year, pass every audit, and still discover during a live incident that nobody had authority to invoke the third-party recovery contract.

A "successful" exercise that surfaces zero findings is usually a design failure, not a sign of maturity. The real measure is whether decisions and dependencies were stressed hard enough to break. If nothing broke, you tuned the difficulty down too far. Most programs know this implicitly. It rarely gets named out loud, because naming it means admitting the last three exercises proved very little, and that the next audit cycle is going to need a different design.

What is exercise & simulation in business continuity?

Exercise and simulation is the practice of subjecting people, plans, and dependencies to a simulated disruption to validate whether they hold together, and to find the weaknesses that would otherwise surface only during a real incident. An exercise validates arrangements end to end; a test checks whether one specific component works. Both belong inside a mature continuity program.

Business continuity itself is a holistic process. It begins with a Business Impact Analysis (BIA), which informs the development of business continuity plans and recovery strategies, and it helps an organization maintain critical operations during and after disruption. Exercising is the validation step that closes that loop. Without it, the plans produced from a BIA remain assumptions nobody has pressure-tested. Exercising sits within the broader discipline of enterprise resilience, alongside business continuity, disaster recovery, and crisis management.

Exercise vs. test: a distinction that matters

The terms get used interchangeably, and the imprecision causes real problems when a regulator asks what your program actually validates. ISO 22398:2013, the international guideline for exercises, formalises the distinction: an exercise is a process to train, assess, and improve performance, while a test introduces a pass/fail expectation against a defined component.

Attribute	Exercise	Test
What it validates	People, plans, and dependencies together	A single component or capability
Outcome	Findings and improvement actions	Pass or fail against a threshold
Example	Tabletop on a ransomware scenario	Restore from backup within the RTO
Failure meaning	Surfaces a gap to close	Component does not meet spec

NIST SP 800-84, the US guide to test, training, and exercise programs, draws the same line and supplies clean definitions for tabletop versus functional exercises that the rest of this guide leans on. Both terms matter. Confusing them is how a program ends up claiming it "tested" its continuity arrangements when all it really did was confirm a backup job completed.
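To make the pass/fail nature of a test concrete, here is a minimal sketch of the "restore from backup within the RTO" example from the table. The function name, the timings, and the 4-hour RTO are illustrative assumptions, not values from any standard.

```python
from datetime import datetime, timedelta


def restore_test_passes(started: datetime, restored: datetime, rto: timedelta) -> bool:
    """A test resolves to pass or fail: was the component restored within the RTO?"""
    return (restored - started) <= rto


# Hypothetical timings for a backup-restore test against an assumed 4-hour RTO.
started = datetime(2025, 3, 1, 9, 0)
restored = datetime(2025, 3, 1, 12, 30)

print(restore_test_passes(started, restored, timedelta(hours=4)))  # True
```

The single boolean is the point: a test tells you whether one component met its threshold, and nothing about whether people, plans, and dependencies held together around it.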

What an exercise is actually for

The primary purpose is unglamorous: find weaknesses before a real incident exploits them. The secondary purpose is building muscle memory and clarifying who decides what under pressure, which is exactly the knowledge that evaporates when an incident starts and adrenaline takes over.

Exercises validate the plans produced from a BIA. They do not replace the analysis underneath. ISO 22301 Clause 8.5 requires organizations to establish an exercise programme that validates business continuity arrangements over time, with defined objectives and evaluation criteria. An exercise that generates no follow-through findings is incomplete by the standard's own logic, because it has validated nothing and improved nothing.

The exercise spectrum: from walkthrough to full-scale simulation

Exercises range from a low-cost discussion around a table to a high-fidelity simulation that activates real systems under real time pressure. Each format answers a different question, and matching the format to the question is the first design decision a lead makes. Run a walkthrough when you need familiarity. Run a full-scale simulation when you need proof.

Orientation and walkthroughs

A walkthrough is a low-pressure familiarisation: the team reads through a plan, confirms roles, and checks that contact lists and call trees are current. It is genuinely useful for onboarding new responders and for a first-pass review of a freshly written plan.

It is a weak stress test, and treating it as one is a common mistake. A walkthrough confirms people can find the plan and read it. It says nothing about whether they can execute it when the building is dark and the named decision-maker is on a plane. Use it as a precursor to a tabletop, not a substitute for one.

Tabletop exercises

A tabletop exercise is discussion-based and scenario-driven. No live systems are activated. The team works through an escalating situation, narrating what they would do, while a facilitator introduces complications. Its strength is testing decision-making, escalation paths, and cross-team coordination cheaply and quickly.

The quality of a tabletop depends almost entirely on inject design and facilitation, not on the production values of the slide deck. A beautifully formatted scenario that never contradicts the team's first assumption teaches less than a rough one that does.

Functional and full-scale simulations

A functional exercise simulates an operational environment and activates part of the response: the crisis team stands up, systems are partially invoked, and timing starts to matter. A full-scale crisis simulation goes further, testing real people, real systems, and real timing under realistic pressure. It is the only format that proves recovery rather than discusses it.

These are the highest-cost, highest-realism formats, and you reserve them for material services and reasonably mature programs. The payoff is that they surface the things discussion never reaches. When DP World Australia was forced into manual operations across four ports after a November 2023 cyberattack, roughly 30,000 containers were stranded while the operator ran the docks by hand for days. That is precisely the manual-fallback scenario a functional simulation is built to rehearse: can the operation continue when the systems that normally run it are gone? FEMA's HSEEP doctrine provides the design and conduct methodology for exercises at this scale.

The four design choices that make an exercise real

The format matters less than four design choices: pressure, ambiguity, dependency stress, and unscripted injects. This is the argumentative core of the guide. Get these four right in a modest tabletop and you will learn more than from an expensive simulation that runs on rails.

Unscripted injects that break assumptions

An inject is new information delivered mid-exercise. Used well, a single inject that invalidates the team's planned approach reveals more than the entire scripted timeline before it. The team has committed to a course of action; now the supplier they were relying on has just declared its own outage. What happens next is the real exercise.

Injects should escalate and contradict, not merely advance the clock. The weakest injects move time forward ("it is now hour four"). The strongest ones remove an option the team assumed was available. HSEEP's evaluation and improvement methodology depends on injects that force genuine decisions, because the After-Action Report can only be as good as the decisions there were to observe. If your injects never make anyone uncomfortable, your report will never tell you anything new.
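One way to review a draft timeline against this rule is to tag each inject by what it does to the team. The sketch below is illustrative only: the `InjectKind` categories, the `runs_on_rails` helper, and the sample timeline are assumptions made for this example, not part of HSEEP or any other methodology.

```python
from dataclasses import dataclass
from enum import Enum


class InjectKind(Enum):
    ADVANCE_CLOCK = "advance_clock"    # only moves time forward
    ESCALATE = "escalate"              # makes the situation worse
    REMOVE_OPTION = "remove_option"    # takes away something the team assumed


@dataclass
class Inject:
    minute: int        # minutes after exercise start
    kind: InjectKind
    text: str


def runs_on_rails(injects):
    """True when no inject escalates or removes an option."""
    return all(i.kind is InjectKind.ADVANCE_CLOCK for i in injects)


# Hypothetical timeline for a supplier-outage tabletop.
timeline = [
    Inject(0, InjectKind.ADVANCE_CLOCK, "Monitoring reports the order system is down."),
    Inject(40, InjectKind.ESCALATE, "Customer complaints reach social media."),
    Inject(75, InjectKind.REMOVE_OPTION, "The recovery supplier declares its own outage."),
]

print(runs_on_rails(timeline))  # False
```

A timeline for which `runs_on_rails` returns `True` is a candidate for redesign before the exercise runs.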

Decision authority under ambiguity

Most plans name a decision-maker and quietly assume that person is reachable. Crises rarely cooperate. The single most revealing inject in a continuity exercise is to declare a key person "unavailable" halfway through and watch what happens to the decision.

Who invokes the plan when the named invoker is unreachable? Who signs off on a recovery spend at 2am? Who talks to the regulator? Ambiguity over who decides what is the failure mode that almost never appears in a plan review and almost always appears in a real incident. This creates a dangerous false confidence: the documented chain of command looks complete right up until the moment it depends on someone who can't be reached.
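The same question can be asked of the documented chain of command before the exercise. The following is a rough sketch, assuming each decision has an ordered list of people authorised to make it; the role names and the `resolve_decider` helper are hypothetical.

```python
from typing import Optional


def resolve_decider(chain, unavailable) -> Optional[str]:
    """Return the first reachable person in the chain, or None if nobody is left."""
    for person in chain:
        if person not in unavailable:
            return person
    return None


# Hypothetical authority map: decision -> ordered list of who may take it.
authority = {
    "invoke_plan": ["Head of Operations", "Deputy Head of Operations"],
    "approve_recovery_spend": ["CFO"],
    "notify_regulator": ["Chief Risk Officer", "Head of Compliance"],
}

# The inject: declare two key people unavailable.
unavailable = {"Head of Operations", "CFO"}

gaps = [
    decision
    for decision, chain in authority.items()
    if resolve_decider(chain, unavailable) is None
]
print(gaps)  # ['approve_recovery_spend']
```

Any decision that resolves to `None` marks a chain with no stand-in. That makes it a natural target for an "unavailable" inject.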

Stressing real dependencies

A good exercise surfaces dependencies the team did not know it had, especially third-party and system dependencies buried two layers deep. The instruction that matters is simple: test the dependency, not the assumption that the dependency works.

The July 2024 CrowdStrike outage is the cleanest recent illustration of why. A faulty Channel File 291 content update to the CrowdStrike Falcon sensor, pushed at 04:09 UTC on 19 July 2024, sent Windows endpoints into boot loops worldwide. CrowdStrike reverted the file within roughly 78 minutes, but recovery required manual, machine-by-machine intervention. Around 8.5 million Windows systems crashed; airlines grounded fleets, hospitals reverted to paper, and payment systems failed. Fortune 500 firms absorbed an estimated $5.4 billion in direct losses. The cascade ran precisely through dependencies almost nobody had exercised, because the security agent that bricked the machines was the same kind of trusted, ubiquitous component continuity plans rarely think to question. The takeaway for exercise design is blunt: the dependency you never simulate is the one that takes you down.

Choosing scenarios that matter for your organization

Scenario selection should be driven by your actual risk profile and critical dependencies, not a generic catalogue of disasters. A scenario the whole industry runs is rarely the scenario that would break your specific operation. The discipline is to start from your own single points of failure and work outward.

From generic threat lists to your critical dependencies

Start from the BIA's critical activities and the dependencies that support them, then pick scenarios that stress those specific points. If a single vendor underpins three important services, that concentration is your scenario, not a generic "cyberattack." The critical dependency mapping work that feeds your BIA is where realistic scenarios come from.
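If the dependency map exists as data, concentration can be found mechanically. The sketch below assumes a simple map in which each service or component lists what it depends on; the service names, the vendor names, and the threshold of three are invented for illustration.

```python
from collections import defaultdict

# Hypothetical dependency map: each key depends on the items in its list.
depends_on = {
    "payments": ["payment-gateway"],
    "online-banking": ["auth-service", "core-ledger"],
    "customer-support": ["crm"],
    "payment-gateway": ["cloud-provider-a"],
    "auth-service": ["cloud-provider-a"],
    "crm": ["cloud-provider-a"],
    "core-ledger": ["vendor-b"],
}
critical_services = ["payments", "online-banking", "customer-support"]


def reachable(node, graph):
    """All direct and indirect dependencies of a node."""
    seen, stack = set(), [node]
    while stack:
        for dep in graph.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def concentration(services, graph, threshold=3):
    """Dependencies that sit underneath at least `threshold` critical services."""
    used_by = defaultdict(set)
    for service in services:
        for dep in reachable(service, graph):
            used_by[dep].add(service)
    return {dep: sorted(svcs) for dep, svcs in used_by.items() if len(svcs) >= threshold}


print(concentration(critical_services, depends_on))
# {'cloud-provider-a': ['customer-support', 'online-banking', 'payments']}
```

Here the provider sits two layers below every critical service and never appears in any service's direct dependency list. That is why the traversal follows indirect dependencies rather than stopping at the first hop.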

The realistic categories most programs return to are ransomware, key supplier failure, facility loss, and prolonged technology outage. Each maps to a different recovery pathway. The BCI Horizon Scan Report 2025 tracks which of these dominate disruption experience year to year, which is a useful sense-check against your own assumptions. It is no substitute for knowing where your own operation is brittle.

Severe but plausible: meeting the regulatory bar

UK and Australian regulators have settled on the same phrase: severe but plausible. PRA SS1/21 Chapter 6 requires firms to test their ability to remain within impact tolerances under severe-but-plausible disruption, and FCA PS21/3 sets the same expectation for important business services.

The operational implication is uncomfortable and correct: a scenario everyone survives comfortably probably wasn't severe enough. If the team reaches the end of the exercise with the impact tolerance intact and no hard choices made, the scenario validated nothing about the boundary you actually need to defend. Severe-but-plausible means pushing right up to the edge of the tolerance and seeing whether the response holds.

Regulatory requirements for exercising across jurisdictions

For regulated entities, exercising is now a defined obligation rather than good practice, and the requirements differ in ways that matter for how you design the program. A financial services lead operating across the EU, UK, and Australia is answering to three different testing regimes at once. Mapping the obligations by clause is how you defend the program's design when an examiner asks.

ISO 22301 and the baseline exercise programme

The ISO 22301 evaluation requirement under Clause 8.5 sets the baseline: a regular, evaluated exercise programme that involves relevant stakeholders, exercises against scenarios consistent with continuity objectives, and produces documented results that drive improvement. It is deliberately outcome-focused rather than prescriptive about format.

ISO 22398 supplies the planning and improvement methodology underneath that requirement, and the DRI Professional Practices align exercising with the broader continuity lifecycle so that findings feed back into plans rather than sitting in a report. Together they give a non-regulated organization a defensible structure even without a regulator forcing the issue.

DORA, FCA/PRA, and APRA: financial services obligations

Financial services carries the heaviest testing load, and the three major regimes do not align neatly.

Regime	Core testing obligation	Frequency / deadline
EU DORA	Digital operational resilience testing programme; threat-led penetration testing (TLPT) for critical functions	At least yearly for ICT supporting critical functions
FCA / PRA (UK)	Test ability to stay within impact tolerances for important business services under severe-but-plausible scenarios	Full mapping and testing compliance by 31 March 2025
APRA CPS 230 (AU)	Test BCPs against severe-but-plausible scenarios, including material service provider disruption	Effective 1 July 2025

DORA Articles 24-27 require financial entities to maintain a testing programme and, for those identified as critical, to undertake threat-led penetration testing; the EBA's interactive rulebook details the Chapter IV testing expectations. The Australian CPS 230 standard is explicit at paragraphs 40-46 that the testing programme must include scenarios involving disruption to material service providers, a direct response to the concentration risk regulators most fear. The FCA's own operational resilience insights make clear that examiners want to see testing plans that evolve, not a single annual event repeated unchanged.

The exercise lifecycle: design, facilitate and capture

A rigorous exercise follows a disciplined lifecycle with defined inputs, analytic steps, artefacts, and decisions, the same discipline you would apply to any high-stakes analysis. Making the methodology explicit is what turns a program from a sequence of one-off heroics into something repeatable.

Inputs and analytic steps

The inputs are concrete: BIA outputs, a current critical dependency map, prior After-Action Report findings, and the organization's risk profile. Without these, scenario design defaults to guesswork dressed as judgement.

The analytic steps follow in order:

1. Set exercise objectives tied to specific continuity arrangements you need to validate.
2. Build the scenario and the injects, designing the injects to contradict and escalate.
3. Define evaluation criteria before the exercise runs, not after.
4. Brief facilitators on tempo, inject timing, and what "good" looks like for each objective.
5. Confirm participants and their stand-ins, so an absence becomes a deliberate inject rather than an accident.

The evaluation criteria point is not optional polish. The Clause 8.5 requirement is that exercises be evaluated against defined criteria, which means you decide what success and failure look like before the room can talk you into grading on a curve.
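One way to hold that line is to write the criteria down as fixed data before the exercise and grade observations against them afterwards. This is a minimal sketch under that assumption; the objectives, measure names, and minute thresholds are hypothetical, and Clause 8.5 does not prescribe any particular format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: criteria cannot be edited once the exercise starts
class Criterion:
    objective: str
    measure: str
    max_minutes: int


# Defined before the exercise runs (illustrative values).
criteria = (
    Criterion("Invoke the plan", "minutes_to_invocation", 30),
    Criterion("Notify the regulator", "minutes_to_regulator_notice", 120),
)


def evaluate(criteria, observed):
    """Grade each objective; a measure that was never observed counts as not met."""
    results = {}
    for c in criteria:
        value = observed.get(c.measure)
        results[c.objective] = value is not None and value <= c.max_minutes
    return results


# Captured by the scribe during the exercise (illustrative values).
observed = {"minutes_to_invocation": 45}

print(evaluate(criteria, observed))
# {'Invoke the plan': False, 'Notify the regulator': False}
```

Treating an unobserved measure as not met is a deliberate choice in this sketch: if the room never reached the regulator notification, the objective was not validated.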

Facilitation under pressure

The facilitator's job is to control tempo, deliver injects on time, and resist the powerful urge to rescue the room. When the team stalls, the instinct is to drop a hint. The discipline is to let the silence run, because the silence is data.

Good facilitation provokes real decisions instead of keeping the timeline tidy. Capture observations live, with a dedicated scribe, because memory degrades within hours and the specific phrasing of a flawed decision is exactly what the After-Action Report needs. The HSEEP conduct guidance is built around this separation of facilitation from evaluation for a reason.

Artefacts and decisions: the After-Action Review

The artefacts are an After-Action Report and an Improvement Plan, the latter with named owners and hard deadlines against every finding. A finding without an owner is a wish.

The decisions are where the value is realised: which findings change plans now, which require investment and a business case, and which feed the design of the next exercise. The same HSEEP AAR/IP methodology treats the improvement plan, not the report, as the real deliverable. An unclosed finding is not a record of diligence. It is an open risk you have documented and chosen to carry.
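A simple tracker can keep an Improvement Plan honest by flagging findings that lack an owner or deadline, and those past their deadline. The sketch below is illustrative: the `Finding` fields, the sample findings, and the dates are assumptions, not an HSEEP template.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Finding:
    summary: str
    owner: Optional[str]
    due: Optional[date]
    closed: bool = False


def open_risks(findings, today):
    """Return (unowned, overdue) lists of findings that are still open."""
    still_open = [f for f in findings if not f.closed]
    unowned = [f for f in still_open if f.owner is None or f.due is None]
    overdue = [f for f in still_open if f.owner and f.due and f.due < today]
    return unowned, overdue


# Hypothetical findings from an After-Action Report.
findings = [
    Finding("No authority to invoke the vendor recovery contract", "Head of Procurement", date(2025, 4, 30)),
    Finding("Call tree out of date", None, None),
    Finding("Backup restore exceeded the RTO", "Infrastructure Lead", date(2025, 6, 30)),
]

unowned, overdue = open_risks(findings, today=date(2025, 5, 15))
print([f.summary for f in unowned])  # ['Call tree out of date']
print([f.summary for f in overdue])  # ['No authority to invoke the vendor recovery contract']
```

Both lists are open risks in the sense described above: the first has no one accountable, the second has someone accountable who has missed the date.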

Building a maturing exercise program over time

A single good exercise is luck. A program is design. Maturity comes from cadence, deliberate progression up the format spectrum, and a closed loop back into plans, not from the polish of any one event. The programs that improve are the ones that treat each exercise as an input to the next.

Cadence and progression

DORA Article 24 sets a floor of at least yearly testing for ICT systems supporting critical functions, and for many programs that annual cadence is exactly the problem. An annual tabletop is enough to satisfy an examiner and far too little to build readiness.

A maturing program moves new arrangements up the format spectrum in deliberate steps:

1. Walkthrough to confirm the plan exists, roles are understood, and contact data is current.
2. Tabletop to test decisions, escalation, and coordination on a contested scenario.
3. Functional exercise to activate part of the response and introduce real timing pressure.
4. Full-scale simulation to prove recovery on material services under realistic conditions.

Each step has to be justified by what the previous one surfaced. Frequency and follow-through beat production values every time, a point developed further in the analysis of why exercise programs measure activity rather than readiness.

What inadequate testing actually costs

The price of skipping rigorous testing is documented in regulatory enforcement. TSB Bank's April 2018 migration of 5.2 million customers onto a new core banking platform failed on day one and disrupted services into December. The FCA and PRA found the root cause to be an overly ambitious timetable and inadequate testing. The episode cost TSB around £330 million and 80,000 customers, and the regulators jointly fined the bank £48.65 million for the operational and governance failings.

British Airways tells the same story from the recovery angle. In May 2017, a power supply at a Heathrow data centre was disconnected and then reconnected in a way that sent an uncontrolled power surge through the servers, physically damaging them. Backup systems did not pick up the load. Over 700 flights were cancelled, 75,000 passengers were stranded, and IAG put the cost at £80 million. Untested recovery is not recovery. Alongside CrowdStrike's 2024 cascade, these older cases carry the same modern lesson: the failover you never exercise is a failover you do not have.

Industry-specific nuance: financial services and manufacturing

Exercise priorities diverge sharply by sector. Financial services is driven by impact-tolerance regulation and third-party concentration; manufacturing centres on physical operations and supply chain continuity. The general methodology holds in both, but the scenarios that matter look nothing alike.

Financial services: impact tolerances and third-party concentration

In financial services, the exercise has to validate one specific thing above all: that the firm can stay within its impact tolerances for important business services under a severe-but-plausible scenario. The PRA's testing chapter makes that the test, and the broader picture of operational resilience for financial services pulls DORA, FCA, and CPS 230 obligations into a single program.

Third-party disruption has to be exercised explicitly, not assumed away. The Australian prudential standard on operational risk requires scenarios involving material service provider failure precisely because concentration is the systemic worry: one cloud provider, one payment processor, or one core banking vendor sitting underneath many critical services. The scenario regulators want exercised is the one where that single dependency goes dark.

Manufacturing: physical operations and supply chain failure

Manufacturing scenarios cluster around facility loss, key supplier failure, and prolonged operational outage. The defining question is whether production can continue when the systems that normally run it are unavailable: the manual-fallback problem.

DP World Australia's November 2023 forced reversion to manual port operations is the reference case. Stevedores ran four ports by hand while roughly 30,000 containers backed up, and the operation held only because manual processes existed and people knew them. Most manufacturers assume that fallback works without ever simulating it. The BCI Horizon Scan consistently ranks supply chain disruption among the threats operators most expect, which makes the unsimulated manual fallback an expensive assumption to leave untested.

Where intelligent tooling fits

Spreadsheets and slide decks make exercises one-off and findings easy to lose. Tooling earns its place where it makes scenarios more dynamic and findings impossible to drop, not where it adds another dashboard nobody opens. The test for any platform is whether it improves the two things that actually matter.

From static decks to dynamic, tracked exercises

The first thing good tooling enables is unscripted injects and live finding capture, so the facilitator can introduce a contradiction on the fly and the scribe can record decisions as they happen rather than reconstructing them later. The second, and more important, is closing the loop: findings tracked to named owners and pushed back into the plans they affect, so an exercise changes the documents instead of just generating a report.

This is also where AI tabletop exercises are starting to change the economics of frequency, by generating tailored scenarios and injects fast enough to run light-touch exercises more often. The value is the same one this guide has argued throughout: surface real gaps, capture them, and close them before a real incident forces the issue.