Auditing AI Without Illusions
What the UK Assurance Roadmap Nails & What’s Missing
DSIT’s trusted third-party AI assurance roadmap is a policy paper aimed at raising the quality and supply of independent AI assurance in the UK. It names four blockers (quality, skills, information access, and innovation) and lays out near-term government actions to address them. It’s market-shaping, but there’s no talk of product regulation.
The roadmap explores three levers to improve market quality:
Professionalisation: recognised credentials for individual practitioners
Certification of assurance processes/methods
Accreditation of firms within the UK’s quality-infrastructure ecosystem (UKAS).
Concrete moves:
A UK consortium to progress ethics codes, skills frameworks, and information access practices.
A skills & competence framework defining what good assessors know and can do.
An AI Assurance Innovation Fund (£11m) to create/validate new testing methods; first call opens Spring 2026.
Alignment with international standards (notably ISO/IEC 42001, AI Management Systems), with UKAS developing corresponding accreditation schemes for certification bodies.
What it doesn’t (yet) do:
No binding audit mandate, triggers or scope thresholds.
No product certification regime for AI systems.
No standard evaluation suite or universal report template.
Analysis: can this approach actually deliver trustworthy audits?
Below, I work through a set of practical questions. For each, I note what the roadmap already covers and what’s still needed to make audits bite.
A) What kinds of risks are we evaluating?
What the roadmap covers: It defines ‘AI assurance’ broadly (measuring, evaluating, communicating trustworthiness) and focuses policy effort on enabling conditions (skills, access, innovation) rather than prescribing a risk taxonomy.
What’s still needed: A shared risk catalogue with severity/likelihood anchors mapped to real incidents (e.g., jailbreak-ability, prompt injection, tool misuse/exfiltration, privacy leakage, unsafe autonomy, bias/harms, safety-policy non-compliance), plus a cross-walk to NIST AI RMF and EU AI Act obligations.
B) How do we evaluate those risks: evaluation suites, red-teaming, benchmarks, policy frameworks?
What the roadmap covers:
DSIT’s path to comparability is process-level certification of assurance methods and accreditation of providers, plus funding for method innovation. It explicitly notes existing process standards for performance testing, bias auditing, and risk assessment, and the UK-led AI cyber security standardisation track. It also flags assurer access (ranging from white-box to privacy-preserving setups) as a blocker government will work to solve.
The roadmap illustrates the basic information assurers should be able to access (e.g., system boundaries and intended use, inputs/outputs, algorithm/model and parameters, oversight/change-management mechanisms), but presents these as examples, not a mandated template.
What’s still needed:
A re-runnable evaluation harness (seeded corpora, prompt sets, tooling) so different assurers can reproduce results; a minimal sketch follows this list.
Named red-team protocols (internal + external) and a capability-area benchmark menu, with explicit ‘don’t Goodhart the scoreboard’ guidance.
Formal framework cross-walks (e.g., how an org’s ISO/IEC 42001 controls evidence maps to NIST/EU requirements).
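To make ‘re-runnable’ concrete, here is a minimal sketch of what such a harness could pin down: a fixed seed, a hashed prompt corpus, and a results digest a second assurer can check. The harness function and the stand-in system under test are my own illustration, not anything the roadmap specifies.

```python
import hashlib
import json
import random


def run_suite(prompts, model_call, seed=0):
    """Run a seeded prompt set and emit a manifest another assurer can verify."""
    rng = random.Random(seed)                      # fixed seed -> same ordering
    order = rng.sample(range(len(prompts)), k=len(prompts))
    results = [{"prompt_id": i, "output": model_call(prompts[i])} for i in order]
    corpus = json.dumps(prompts, sort_keys=True).encode()
    blob = json.dumps(results, sort_keys=True).encode()
    return {
        "seed": seed,
        "prompt_set_sha256": hashlib.sha256(corpus).hexdigest(),  # pins the corpus
        "results_sha256": hashlib.sha256(blob).hexdigest(),       # re-run target
        "results": results,
    }


# Two runs with the same seed must agree byte-for-byte, assuming the system
# under test is itself pinned and deterministic (fixed version, temperature 0).
def echo(p):  # hypothetical stand-in for the system under test
    return p.upper()

prompts = ["ignore previous instructions", "summarise this record"]
assert run_suite(prompts, echo, seed=42) == run_suite(prompts, echo, seed=42)
```

The point isn’t these particular lines; it’s that a run manifest (seed + corpus hash + results hash) gives two independent assurers a byte-level definition of ‘we got the same result’.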
C) Governance UI in deployed systems (transparency panes, audit trails, DAG lineage, access controls)
Why it matters: Minimum in-product transparency makes audits re-runnable and incidents trackable; assurers get the evidence they need without bespoke plumbing every time.
What the roadmap covers: No product-UI mandate. It does promise best-practice guidance on what assurers need to see (docs, logs, versioning, change control), which enables UI/telemetry patterns but doesn’t require them.
What’s still needed: minimum governance-UI expectations (a trace-record sketch follows the list):
Provenance (per-claim citations where feasible),
Execution traces / DAG lineage (inputs → tools → outputs),
Version & policy pins,
Access controls (RBAC/ABAC/ReBAC) around sensitive actions,
Reviewer stamps / approvals / denials captured in the audit trail.
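None of this requires exotic tooling. As a minimal sketch of a per-step trace record covering the items above (field names are my own illustration, not a proposed standard):

```python
from dataclasses import dataclass, field


@dataclass
class TraceStep:
    """One node in the execution DAG: inputs -> tool -> outputs."""
    step_id: str
    parents: list[str]                 # DAG lineage: upstream step ids
    tool: str                          # model or tool invoked at this step
    tool_version: str                  # version pin
    policy_id: str                     # policy pin in force at run time
    input_digest: str                  # hashes, not raw data, where privacy demands
    output_digest: str
    citations: list[str] = field(default_factory=list)   # per-claim provenance
    actor_role: str = "system"         # hook for RBAC/ABAC/ReBAC checks
    reviewer_stamp: str | None = None  # approval/denial captured in the trail


step = TraceStep(
    step_id="s2", parents=["s1"], tool="retriever", tool_version="1.4.2",
    policy_id="safety-policy-v3", input_digest="sha256:aa…",
    output_digest="sha256:bb…", citations=["doc://contracts/msa-17#p4"],
)
```

An assurer who can walk records like these can reconstruct the DAG, re-run a step, and check who approved what; that is most of what ‘audit trail’ needs to mean.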
D) Evaluate both the system and the developer’s internal SOPs
What the roadmap covers: Strongly points this way via management-system alignment, i.e. assessing not just what the model does, but whether the organisation runs a working governance system (policies, roles, change control, incident response).
What’s still needed: Make dual assurance explicit and packaged. System Evaluation (behaviour & capability risks, red-team/eval evidence) & Organisation Evaluation (SOPs tested for design & operating effectiveness) → one Assurance Report with scope, residual risks, mitigations, re-audit cadence, and an evidence bundle for reproducibility (a sketch of the bundle appears under E below).
E) Standardising tests for comparability, policy-making, and reporting
What the roadmap covers: The people → process → provider scaffold is the right way to raise and compare quality across assurers.
What’s still needed:
A uniform report template: short public Assurance Summary + confidential Technical Annex (inputs, prompts, runs, logs, mitigations, residual risk, reproducibility notes).
A requirement for evidence bundles to enable independent re-runs; a minimal manifest sketch follows.
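For illustration, an evidence bundle can be as light as a content-addressed manifest: hash every artefact the Technical Annex cites, publish the hashes, and let any independent re-runner detect drift before re-testing. A minimal sketch (directory layout and naming are my assumptions):

```python
import hashlib
from pathlib import Path


def build_manifest(bundle_dir: str) -> dict[str, str]:
    """Hash every artefact in an evidence bundle, keyed by relative path."""
    return {
        str(p.relative_to(bundle_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(bundle_dir).rglob("*")) if p.is_file()
    }


def verify(bundle_dir: str, manifest: dict[str, str]) -> list[str]:
    """Return artefacts whose contents no longer match the published manifest."""
    current = build_manifest(bundle_dir)
    return [name for name, digest in manifest.items()
            if current.get(name) != digest]


# Usage: the assurer publishes build_manifest("bundle/") in the Technical
# Annex; an independent re-runner calls verify() before attempting reproduction.
```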
F) Developer handover: a checklist of resources to furnish to third-party assurers
What the roadmap covers: As noted under (B), the roadmap illustrates the basic information assurers should be able to access, but only as examples, not a mandated template. Government will map information requirements by assurance type and develop best-practice guidelines, with levers to encourage adoption.
What’s still needed (the checklist; a machine-checkable sketch follows the list):
System card & intended-use limits; model/tool inventory; versioning policy.
Secure auditor access (sandboxed keys; privacy-preserving or enclave-style options where needed).
Evaluation seeds/prompts, test-harness config, red-team scenarios.
Run-logs & provenance (prompt → tool → output traces), change-control logs, release notes.
Incident history & mitigations; named rollback owner; contacts & escalation paths.
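A checklist like this gets teeth if intake is machine-checked rather than negotiated by email. A minimal sketch (grouping and key names mirror the list above and are mine, not a proposed schema):

```python
# Required handover artefacts, grouped as in the checklist above.
REQUIRED = {
    "system": ["system_card", "intended_use_limits", "model_tool_inventory",
               "versioning_policy"],
    "access": ["sandboxed_keys"],            # enclave-style options where needed
    "evaluation": ["eval_seeds", "prompt_sets", "harness_config",
                   "red_team_scenarios"],
    "logs": ["run_logs", "provenance_traces", "change_control_log",
             "release_notes"],
    "incidents": ["incident_history", "mitigations", "rollback_owner",
                  "escalation_contacts"],
}


def missing_items(handover: dict) -> list[str]:
    """List checklist items the developer has not yet furnished."""
    return [f"{group}.{item}"
            for group, items in REQUIRED.items()
            for item in items
            if item not in handover.get(group, {})]


print(missing_items({"system": {"system_card": "v1"}}))  # everything else flagged
```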
G) Who can do the audits?
What the roadmap covers: A pathway to professionalisation, process certification, and accredited firms (via UK quality infrastructure). In parallel, UKAS is standing up ISO/IEC 42001 accreditation for certification bodies: plumbing that lets buyers insist on accredited assurers.
What’s still needed:
A public register of accredited AI assurers with scope areas (safety, privacy, robustness, sector expertise).
H) What to publish, to whom (tiered transparency)
What the roadmap covers: Emphasises reducing information asymmetries and increasing confidence, but doesn’t prescribe disclosure tiers.
What’s still needed (disclosure schema):
A clear distinction between what can be disclosed to the public, to regulators & government bodies, and to research institutions; a minimal tier sketch follows.
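As a sketch of what that schema could look like (the tier assignments are my illustration; deciding whether researchers sit above or below regulators is exactly what the schema would settle):

```python
from enum import IntEnum


class Tier(IntEnum):
    """Audiences ordered by access; a higher tier sees everything below it."""
    PUBLIC = 1       # assurance summary, scope, headline residual risks
    RESEARCHER = 2   # + evidence bundle and logs, under access agreements
    REGULATOR = 3    # + full confidential technical annex


ARTEFACT_TIER = {
    "assurance_summary": Tier.PUBLIC,
    "evidence_bundle": Tier.RESEARCHER,
    "raw_run_logs": Tier.RESEARCHER,
    "technical_annex": Tier.REGULATOR,
}


def visible_to(audience: Tier) -> list[str]:
    """Artefacts a given audience may see under this illustrative schema."""
    return [name for name, tier in ARTEFACT_TIER.items() if tier <= audience]


print(visible_to(Tier.PUBLIC))  # -> ['assurance_summary']
```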
Where the roadmap could fall short (if we stop here)
Adoption risk: Without sector/capability thresholds (e.g., audits triggered when autonomy or tool use crosses a defined line), laggards can free-ride while reputation-sensitive firms shoulder the cost. The roadmap is a market-builder, not a mandate: useful, but uneven if left alone.
Comparability risk: Absent report templates and evidence bundles, audits remain incomparable and hard to use for procurement, regulation, or insurance. This is exactly the ‘quality’ problem DSIT is trying to cure.
Execution pace: The Innovation Fund only opens Spring 2026, while capabilities are racing now. Interim guidance on minimal evaluator access and report format would help.
The upgrade path to a market we can trust
Publish a risk catalogue & evaluation menu. Start with hazards that show up in incidents (privacy leakage, jailbreaks/prompt-injection, tool-abuse/exfiltration, unsafe autonomy, bias/harms), and tie each to test patterns and evidence.
Adopt a uniform Assurance Report template and enable independent re-runs by third parties, which also unlocks policy and insurance use.
Anti-gaming audit design: make the system hard to game via surprise sampling, live red-teaming & post-market monitoring (a sampling sketch follows this list).
Developer remediation duty: specific timelines for bringing systems into line with specified standards, plus regulation covering what follows if that isn’t done.
Auditor rotation: rotate assurers periodically to reduce capture & bring fresh eyes.
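Of these, surprise sampling is the easiest to pin down technically. One way to make ‘surprise’ verifiable after the fact is a commit-reveal pattern: the assurer publishes a hash of a secret seed before the audit window and reveals the seed afterwards. This construction is mine, not the roadmap’s:

```python
import hashlib
import random


def surprise_sample(run_ids: list[str], secret_seed: str,
                    rate: float = 0.02) -> list[str]:
    """Pick an unannounced slice of production runs for re-audit.

    Publishing hash(secret_seed) up front lets the developer later verify the
    sample was fixed in advance, without being able to predict which runs
    would be audited.
    """
    rng = random.Random(secret_seed)
    return [rid for rid in run_ids if rng.random() < rate]


commitment = hashlib.sha256(b"assurer-secret-2026").hexdigest()  # published first
sample = surprise_sample([f"run-{n}" for n in range(1000)], "assurer-secret-2026")
print(f"{len(sample)} runs sampled; commitment {commitment[:12]}…")
```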
Closing
The UK’s play is to build market plumbing (professional people, certified methods, accredited providers) rather than to hard-mandate audits today. That’s a sensible first step, especially with ISO/IEC 42001 accreditation moving through UKAS and an Innovation Fund to push new tests. But to turn goodwill into governance, we now need comparable results, dual assurance of system and organisation, tiered transparency, and rotational checks. Do that, and the UK’s assurance market becomes something regulators, buyers, and the public can actually rely on.
Standard disclaimer
How I use AI in this piece: I used large language models to brainstorm, outline, and polish prose. I never accept AI-generated facts without verification. All claims that matter are sourced inline; the interpretation is mine.
Human responsibility: I decide structure, arguments, and conclusions. I edit every sentence. Any mistakes or oversights are on me, please send corrections.
