
Responsible AI Maturity Models

MIT CISR · Microsoft RAI MM · CSIRO · RAI Institute

A maturity model gives organisations — and the consultants who advise them — a shared language for describing where they are on the responsible AI journey, where they need to get to, and what the path looks like. It turns "we need to do better on AI governance" into a structured, measurable conversation.

What it is

A structured benchmark, not a judgement

A responsible AI maturity model describes an organisation's AI governance capability across five progressive levels — from ad hoc and reactive at Level 1, through to continuously optimised and industry-leading at Level 5. The levels are cumulative: each builds on the last. You can't skip levels — an organisation claiming Level 4 governance that hasn't implemented Level 2 controls is misrepresenting its maturity. Crucially, being at Level 1 or 2 isn't "bad" — it's an honest starting point that enables targeted improvement.

Why it matters

The governance gap is costing organisations real money

74% of companies have yet to see measurable returns from their AI investments, according to BCG research. A 2024 Gartner survey found that while 80% of large organisations claim to have AI governance initiatives, fewer than half can demonstrate measurable maturity. MIT CISR research across 721 companies found that organisations at Levels 1–2 of AI maturity perform below industry average financially, while those at Levels 3–4 perform well above average. Maturity is directly correlated with financial performance — not just compliance.

McKinsey reports organisations embedding responsible AI governance see up to 40% higher ROI from AI investments due to reduced rework and audit costs.

The key insight

Maturity ≠ policy coverage. Maturity = evidence quality.

The most common mistake organisations make is confusing having a policy with having mature governance. A responsible AI policy published on a website is not evidence of maturity. Evidence of maturity is: model cards for deployed systems, completed impact assessments with documented decisions, audit logs that regulators could review, bias remediation records with timelines, and incident response playbooks that have been tested. Governance maturity is measured by what you can prove, not what you've written.

Five dimensions assessed

What the model measures

Leading maturity models assess five dimensions simultaneously:
  • Policy — what is documented and approved
  • Lifecycle controls — what governance gates exist at build, test, deploy, and change stages
  • Data and lineage — provenance, quality standards, and traceability
  • Documentation — model cards, impact assessments, audit logs
  • Monitoring — drift detection, incident response, and continuous improvement
An organisation can be at different maturity levels across these dimensions — and usually is.

Five levels, each with distinct characteristics in how AI governance is practised, evidenced, and embedded. The question is not which level sounds aspirational — it's which level honestly describes your current state.

Level 1 — Initial (Ad Hoc)
Governance by accident

Governance is informal, inconsistent, or nonexistent. AI systems are deployed based on technical capability rather than governance readiness. Risk management depends on individual champions — if that person leaves, the control goes with them. There is no AI inventory, risk classification is inconsistent across teams, and documentation is sparse. Most organisations deploying AI for the first time are here.

What you typically see
  • No central AI inventory or register
  • Teams deploying AI without governance review
  • No AI risk appetite statement
  • Incident response is reactive and unplanned
  • Documentation is project-specific, if it exists
  • Compliance depends on individuals, not processes
First priority moves
  • Build an AI system inventory — know what you have
  • Appoint an AI governance owner (even part-time)
  • Draft an AI use policy — even a simple one
  • Identify your highest-risk AI systems
  • Establish basic incident reporting channels
Level 2 — Repeatable (Emerging)
Governance in silos

Basic governance processes exist but are inconsistently applied. Some teams follow AI lifecycle controls while others bypass them. Risk management is starting to formalise but still depends on individual champions rather than institutional processes. Incident response protocols may exist in draft form but haven't been tested. This is where many organisations that have "done something" on AI governance actually sit.

What you typically see
  • AI ethics principles or policy published
  • Some teams following governance; others bypassing it
  • Governance reviews happening for major projects only
  • Risk assessment is manual and inconsistent
  • No standard model documentation format
  • "Marketing does AI their way; Engineering does it theirs"
Priority moves to reach Level 3
  • Form a cross-functional AI Governance Council
  • Publish a unified AI Policy with mandatory scope
  • Mandate standard tooling (MLOps) for all teams
  • Create a defined model approval process
  • Classify all AI systems by risk tier
Level 3 — Defined (Enterprise)
The governance tipping point

This is the tipping point. The organisation establishes a unified governance framework applied consistently across the enterprise. An AI Governance Council exists with cross-functional representation. A central AI Policy is published. Standard tooling is mandated. The organisation has visibility into its AI inventory and a defined process for approving models before deployment. MIT CISR research identifies this stage as where financial performance shifts from below-average to above-average.

What you typically see
  • AI Governance Council with real authority
  • Complete AI system inventory with risk classifications
  • Standard model documentation (model cards) in use
  • Impact assessments conducted for high-risk systems
  • Bias testing included in deployment gates
  • "We have a standard playbook for responsible AI"
Priority moves to reach Level 4
  • Move from manual to automated monitoring
  • Build governance dashboards for executive visibility
  • Formalise cross-functional reviews pre-deployment
  • Tie governance metrics to executive reporting
  • Implement data lineage capture and model registry
Level 4 — Managed (Systemic)
Governance as a performance driver

Governance is no longer a gate — it's embedded into how AI is built and operated. Monitoring is automated. Metrics are quantified and reported at board level. The organisation can demonstrate regulatory compliance with evidence, not just policy documents. Incident response has been tested. AI governance has moved from a cost centre to a capability that accelerates responsible AI deployment. MIT CISR identifies this stage as where AI becomes embedded across the enterprise and aligned with strategy.

What you typically see
  • Automated drift detection and performance alerts
  • Governance dashboards tracking coverage and risk
  • Board receives regular AI metrics and incident reports
  • Incident response playbooks tested and refined
  • Dynamic risk scoring based on live telemetry
  • Governance enables faster, not slower, AI deployment
Priority moves to reach Level 5
  • Implement post-incident learning loops updating policy
  • Expand red-teaming and adversarial testing for GenAI
  • Pursue voluntary audits and external certification
  • Publish transparency reports for stakeholders
  • Contribute to industry governance standards
Level 5 — Optimising (Transformative)
Industry-leading, continuously improving

RAI practices are statistically measured, evaluated, monitored, and consistently applied across the entire organisation. The organisation leads the industry — conducting voluntary audits, publishing transparency reports, contributing to external standards, and operating federated governance across subsidiaries and supply chains. The culture of responsible AI is self-sustaining: employees surface concerns proactively, and governance is seen as a professional standard, not a compliance burden. Very few organisations genuinely sit here.

What you typically see
  • Voluntary external audits and certifications (e.g. ISO 42001)
  • Published AI transparency and impact reports
  • Federated governance across subsidiaries and supply chain
  • Continuous improvement driven by real-world telemetry
  • Active contribution to industry standards and regulation
  • Governance embedded in organisational culture, not just process
Characteristics of Level 5 culture
  • Transparency is celebrated, not just required
  • Documentation is second nature across all teams
  • Employees escalate AI concerns proactively
  • Responsible AI is a recruitment and retention advantage
  • Governance quality is part of brand identity
The honest truth

Most large enterprises are at Level 2–3

Despite widespread claims of "mature AI governance," the reality is that most large enterprises sit between Level 2 and Level 3. They have policies and some processes — but governance is inconsistently applied, evidence is patchy, and monitoring is largely manual. The gap between stated maturity and actual maturity is one of the most consistent findings across responsible AI benchmark surveys. Honest assessment of current state is the precondition for meaningful improvement.

A quick diagnostic. For each area, select the level that honestly describes your current practice — not your aspiration or your policy, but what is actually happening today. Your score indicates your overall maturity level.

AI Governance Self-Assessment — rate current practice in each area on a five-point scale: 1 Ad hoc · 2 Silo · 3 Defined · 4 Managed · 5 Leading.
  • AI system inventory and risk classification
  • AI policy — documented, approved, and enforced
  • Model documentation (model cards, technical docs)
  • Impact assessments for high-risk AI systems
  • Bias testing and fairness evaluation at deployment
  • Post-deployment monitoring and drift detection
  • Incident response — documented, tested, and rehearsed
  • Board-level AI metrics and reporting
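
The scoring rule behind the interactive version of this assessment isn't spelled out here, so the sketch below makes an assumption: take the median of the eight ratings as the headline level, and surface the lowest-rated areas alongside it, since (as noted above) maturity is rarely uniform across dimensions.

```python
from statistics import median

AREAS = [
    "AI system inventory and risk classification",
    "AI policy",
    "Model documentation",
    "Impact assessments",
    "Bias testing and fairness evaluation",
    "Post-deployment monitoring",
    "Incident response",
    "Board-level AI metrics",
]

def maturity_summary(ratings: dict[str, int]) -> tuple[int, list[str]]:
    """Headline level = median of the 1-5 ratings; also return the areas
    at the lowest rating, which are the first candidates for the roadmap."""
    assert all(1 <= r <= 5 for r in ratings.values())
    headline = round(median(ratings.values()))
    floor = min(ratings.values())
    return headline, [area for area, r in ratings.items() if r == floor]

ratings = dict(zip(AREAS, [3, 3, 2, 2, 3, 1, 1, 2]))
print(maturity_summary(ratings))
# (2, ['Post-deployment monitoring', 'Incident response'])
```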
Using this tool with clients

The assessment is a conversation starter, not a verdict

Use this diagnostic at the start of an engagement to create a shared, honest picture of current state. The value is not in the score — it's in the conversation that happens when different stakeholders disagree about which level describes their organisation. Those disagreements are the governance gaps. An organisation where the CISO thinks they're at Level 4 and the data science team thinks they're at Level 2 has discovered something important before the assessment is even complete.

Moving from one level to the next requires deliberate investment in the right areas. The roadmap is not a checklist — it's a prioritised set of capabilities to build, anchored in what the evidence shows makes the biggest difference.

What to focus on at each transition
Level 1 → Level 2

From invisible to visible

  • Build your AI system inventory — every deployed model, its purpose, its data sources
  • Classify systems by risk (use EU AI Act tiers as a starting framework; see the sketch after this list)
  • Appoint a named AI governance owner with clear mandate
  • Draft an AI use policy — even a page is better than nothing
  • Establish a basic incident reporting channel
  • Start requiring impact assessments for new high-risk deployments
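
A minimal sketch of what an inventory entry could capture, using the EU AI Act's four risk categories as the classification scheme; the record fields and example values are illustrative, not a standard:

```python
from dataclasses import dataclass, field

# EU AI Act risk categories, used as a starting classification scheme
RISK_TIERS = ("prohibited", "high", "limited", "minimal")

@dataclass
class InventoryEntry:
    name: str                  # e.g. "resume-screening-v2" (hypothetical)
    purpose: str
    owner: str                 # the named accountable person
    data_sources: list[str] = field(default_factory=list)
    risk_tier: str = "minimal"

    def __post_init__(self) -> None:
        if self.risk_tier not in RISK_TIERS:
            raise ValueError(f"unknown risk tier: {self.risk_tier}")

entry = InventoryEntry(
    name="resume-screening-v2",
    purpose="shortlisting job applicants",
    owner="Head of Talent Acquisition",
    data_sources=["ATS applications 2021-2024"],
    risk_tier="high",   # employment use cases are high-risk under the Act
)
```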
Level 2 → Level 3

From silo to enterprise

  • Form a cross-functional AI Governance Council (Legal, Risk, Tech, Ethics, Operations)
  • Publish a unified AI Policy that applies to all teams — with teeth
  • Mandate standard model documentation (model cards) across all deployments
  • Create a defined model approval process with clear go/no-go criteria
  • Standardise bias testing as a deployment gate
  • Build a governance dashboard visible to senior leadership
Level 3 → Level 4

From defined to measured

  • Automate drift detection and performance monitoring (a minimal sketch follows this list)
  • Tie AI governance metrics to executive and board reporting
  • Test and refine incident response playbooks through tabletop exercises
  • Implement dynamic risk scoring that updates based on live data
  • Build post-incident learning loops that update policy automatically
  • Integrate governance evidence into regulatory filings and procurement responses
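
The roadmap doesn't prescribe a drift method; a common choice is the Population Stability Index (PSI). A minimal sketch, assuming numeric features or model scores and the conventional alert threshold of 0.25:

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a reference sample (e.g. the
    training distribution) and a live sample of the same feature or score."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf       # catch out-of-range values
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)          # guard against log(0)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

reference = np.random.normal(0.0, 1.0, 5000)
live = np.random.normal(0.5, 1.0, 5000)         # shifted: should trigger
if psi(reference, live) > 0.25:                 # common rule of thumb
    print("drift alert: open an incident and notify the model owner")
```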
Governance metrics worth tracking
  • AI Inventory Coverage — % of deployed AI systems that are documented and risk-classified. Target: 100% for production systems.
  • Impact Assessment Rate — % of high-risk AI deployments with a completed, current impact assessment. Target: 100% for Annex III systems.
  • Bias Remediation Time — average time from bias detection to documented remediation. Level 3: days; Level 4: hours.
  • Governance Throughput — number of AI deployment decisions processed through the governance council per month. A bottleneck here is a governance-immaturity signal.
  • Explainability Ratio — % of AI decisions that can be traced to documented lineage and rationale. Target varies by risk tier.
  • Incident Rate & Trend — number of AI-related incidents per period and the year-on-year trend. An increasing trend signals a governance gap.
  • Policy Compliance Rate — % of AI projects following mandated governance processes end-to-end. Below 80% is Level 2 at best.
  • Data Integrity Index — % of deployed models using certified, quality-assured datasets. The foundation for all other governance metrics.
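
A minimal sketch of how the first few of these metrics could be computed from an AI system inventory. The record fields here (documented, impact_assessment_current, followed_process) are hypothetical; real inventories will carry richer metadata.

```python
from dataclasses import dataclass

@dataclass
class AISystem:
    name: str
    in_production: bool
    documented: bool                 # model card and risk classification on file
    risk_tier: str                   # e.g. "high", "limited", "minimal"
    impact_assessment_current: bool  # completed and not stale
    followed_process: bool           # passed the mandated governance gates

def pct(num: int, denom: int) -> float:
    return 100.0 * num / denom if denom else 0.0

def governance_metrics(inventory: list[AISystem]) -> dict[str, float]:
    prod = [s for s in inventory if s.in_production]
    high = [s for s in prod if s.risk_tier == "high"]
    return {
        "inventory_coverage_pct": pct(sum(s.documented for s in prod), len(prod)),
        "impact_assessment_rate_pct": pct(
            sum(s.impact_assessment_current for s in high), len(high)),
        "policy_compliance_rate_pct": pct(
            sum(s.followed_process for s in prod), len(prod)),
    }
```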
The sequencing trap

Don't try to move two levels at once

The most common roadmap failure is setting a Level 4 target before Level 3 is solid. Automated monitoring built on top of inconsistent manual processes doesn't produce reliable governance — it produces the appearance of governance. Each level must be genuinely embedded before the next is attempted. The signal that a level is genuinely achieved: the governance controls work without the champion who built them. If your AI governance depends on one or two key people, you are at most at Level 2.

The vocabulary of AI maturity — the terms that turn "we need to improve our AI governance" into a structured, credible conversation.

Responsible AI Maturity Model (RAI MM)
The framework itself
A structured framework describing an organisation's AI governance capability across progressive levels. Microsoft Research published a RAI MM with 24 empirically derived dimensions across five levels (Latent to Leading). The RAI Institute's model covers five stages (Initial/Latent to Transformative) based on seven years of research. Multiple models exist — the common thread is five levels from ad hoc to optimised.
Governance Debt
The cost of deferred governance
The accumulation of governance shortcuts, undefined processes, and unmanaged risks that builds up when technology adoption outpaces governance development. Like technical debt, governance debt eventually comes due — manifesting as regulatory fines, reputational damage from AI failures, or inability to scale successful pilots because compliance processes are too slow or fragmented.
Model Card
Standard model documentation
A structured document accompanying an AI model that describes its intended use cases, performance metrics (including disaggregated results for different population groups), known limitations, ethical considerations, and training data characteristics. Model cards are a key evidence artefact distinguishing Level 2 from Level 3 maturity. First proposed by Google researchers in 2018, now expected by most regulatory frameworks.
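
For illustration, a minimal model card rendered as data — the system, numbers, and field names are hypothetical, and real schemas vary by organisation and regulator:

```python
model_card = {
    "model": "credit-risk-scorer-v3",
    "intended_use": "pre-screening of consumer credit applications",
    "out_of_scope": ["employment decisions", "insurance pricing"],
    "performance": {
        "overall_auc": 0.84,
        "disaggregated_auc": {"age<25": 0.79, "age>=25": 0.85},  # subgroup results
    },
    "limitations": ["trained on pre-2023 data; monitor for drift"],
    "ethical_considerations": ["income proxies may correlate with protected attributes"],
    "training_data": {"source": "internal loan book, 2015-2022", "rows": 1_200_000},
}
```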
AI Governance Council
Cross-functional oversight body
A cross-functional body with representation from Legal, Risk, Ethics, Technology, and Operations that has formal authority over AI governance decisions — including model approval, policy setting, and incident response. Distinguished from an AI Ethics Committee (advisory) by having actual decision-making authority and accountability. A defining characteristic of Level 3 maturity.
Governance Throughput
A maturity bottleneck signal
The number of AI governance decisions (model approvals, risk assessments, policy reviews) a governance structure can process per unit of time. At Level 2–3, governance throughput becomes a binding constraint: a single committee reviewing every deployment works for five models. At 50 models, it creates a bottleneck that pressure-tests the governance culture. Mature organisations solve this through tiered review processes, not by bypassing governance.
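
To make "tiered review" concrete, a sketch of routing logic; the tier names and rules are illustrative, not drawn from any standard:

```python
def review_route(risk_tier: str, novel_use_case: bool) -> str:
    """Send each deployment request to the lightest review that fits its
    risk, so council capacity is spent where the stakes are highest."""
    if risk_tier == "high" or novel_use_case:
        return "full governance council review"
    if risk_tier == "limited":
        return "delegated review by a trained domain reviewer"
    return "automated checklist plus periodic sampling audit"
```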
ISO/IEC 42001
AI management system standard
The international standard for AI management systems — published 2023. Provides requirements for establishing, implementing, maintaining, and continually improving an AI management system. Complementary to ISO/IEC 42005 (impact assessment) and ISO/IEC 23894 (risk management). Certification against ISO 42001 is the most credible external validation of Level 4–5 governance maturity currently available.
MLOps
Operational infrastructure for AI governance
Machine Learning Operations — the set of practices and tools for automating the deployment, monitoring, and management of machine learning models in production. Mandating standard MLOps platforms across an organisation is a key Level 2→3 transition move. Without MLOps, governance controls are manual and inconsistent; with it, they can be embedded in the deployment pipeline itself.
Federated Governance
Level 5 governance architecture
A governance model where AI oversight responsibilities are distributed across subsidiaries, business units, or supply chain partners — each with local accountability — connected by shared standards, interoperable monitoring systems, and central oversight. The hallmark of Level 5 maturity in large, complex organisations. Federated governance enables scale without central bottlenecks; it fails without strong shared standards and trust.
Red-teaming (for AI)
Adversarial testing of AI systems
Structured adversarial testing of an AI system by a team tasked with finding failure modes, vulnerabilities, and misuse scenarios — analogous to penetration testing in cybersecurity. The EU AI Act requires adversarial testing for GPAI models posing systemic risk. A Level 4+ capability: most organisations at Level 3 conduct bias and performance testing but not structured red-teaming.
Total AI Effectiveness (MIT CISR)
The performance measurement behind the model
MIT CISR's composite measure used to classify AI maturity, combining three equally weighted factors: effectiveness of AI to improve operations, improve customer experience, and support and develop ecosystems. Score of 0–49% = Stage 1; 50–74% = Stage 2; 75–99% = Stage 3; 100% = Stage 4. Stages 1–2 correlate with below-average financial performance; Stages 3–4 with significantly above-average financial performance.
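
The stage thresholds quoted above translate directly into code. A sketch, assuming each factor is scored as a percentage — the actual MIT CISR survey instrument is more involved:

```python
def total_ai_effectiveness(operations: float, customer_experience: float,
                           ecosystems: float) -> tuple[float, int]:
    """Equally weighted composite of the three factors (each 0-100),
    mapped to a stage using the thresholds quoted above."""
    score = (operations + customer_experience + ecosystems) / 3
    if score >= 100:
        stage = 4
    elif score >= 75:
        stage = 3
    elif score >= 50:
        stage = 2
    else:
        stage = 1
    return score, stage

print(total_ai_effectiveness(80, 70, 60))  # (70.0, 2) -> Stage 2
```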

How to use maturity models in client conversations — framing that moves clients from "interesting concept" to "I need to understand where we sit."

When they ask: "Where do most organisations sit on this model?"
The honest answer: most large enterprises are at Level 2 moving toward Level 3. They have policies and some processes — but governance is inconsistently applied across business units, documentation is patchy, and monitoring is largely manual and reactive. Gartner found that while 80% of large organisations claim AI governance initiatives, fewer than half can demonstrate measurable maturity. The gap between stated maturity and actual maturity is one of the most consistent findings in responsible AI research. Level 1 is also more common than people admit — organisations that have deployed AI without any structured governance process, relying on individual judgement and hoping nothing goes wrong.
When they ask: "What does moving from Level 2 to Level 3 actually involve?"
Three things above everything else. First: a cross-functional AI Governance Council with real authority — not an advisory body, but one that can say no to a deployment. Second: a unified AI policy that applies to all teams, with mandated tooling and a defined approval process. Without this, different parts of the organisation will keep making incompatible governance decisions. Third: standard model documentation — model cards — for every deployed system. That documentation is the evidence base that regulators, auditors, and clients will ask to see. The Level 2 to 3 transition is fundamentally about moving from individual judgement to institutional process. That's a cultural change as much as a technical one, and it typically takes 12–18 months to embed genuinely.
When they ask: "Why does maturity level affect financial performance?"
MIT CISR's research across 721 companies found a clear correlation: organisations at Levels 1–2 of AI maturity perform below industry average financially; those at Levels 3–4 perform well above average. The mechanism is fairly intuitive. At low maturity, AI projects fail more often, take longer to debug, get blocked by regulatory concerns, and generate incidents that consume management time and create liability. At high maturity, deployment is faster because the evidence trail is clean, incidents are caught earlier and resolved faster, and regulatory approval is predictable. McKinsey quantifies this: organisations with embedded responsible AI governance see up to 40% higher ROI from AI investments. Governance isn't a cost — it's the operating system for extracting value reliably.
When they push back: "We're at Level 4 already — we have all of this in place."
The most revealing follow-up question is: "Can you show me the evidence?" Level 4 maturity means automated monitoring with documented thresholds, incident response playbooks that have been tested and refined through actual use, board-level AI metrics received regularly, and governance controls that function when the people who built them aren't in the room. If any of those elements are missing or hypothetical, the honest assessment is Level 2–3. Organisations consistently overestimate their maturity because they conflate policy creation with policy enforcement. A maturity model is only useful if it reflects reality — not aspiration.
When they ask: "How do we use this as a consulting entry point?"
The self-assessment is your entry point. Run it with a cross-functional group — CISO, General Counsel, Chief Data Officer, and a business unit AI lead. The score matters less than what happens when they disagree. When the CISO thinks the organisation is at Level 3 on incident response and the data science team thinks it's Level 1, you've identified a governance gap that has real operational and regulatory consequences. Your value as an advisor is in facilitating that honest conversation, translating the gap into business risk terms, and building a credible roadmap from current state to target state. The maturity model is the diagnostic; the roadmap is the engagement.

The framing that opens doors

Maturity is the bridge between frameworks and reality

NIST RMF tells you what good governance looks like. The EU AI Act tells you what's legally required. A maturity model tells you where you are relative to both — and what the next step looks like in your specific context. It's the tool that turns abstract governance frameworks into a concrete, prioritised action plan. That's why it belongs in every enterprise AI advisory conversation.

Maturity models are most useful as a diagnostic tool in conversation — these questions help you use them that way.

Name the five maturity levels and one defining characteristic of each.
Level 1 — Initial (Ad Hoc): No AI inventory. Governance depends on individuals, not processes. Deployment decisions based on technical capability alone.

Level 2 — Repeatable (Emerging): Basic governance exists but applied inconsistently across teams. "Marketing does AI their way; Engineering does it theirs."

Level 3 — Defined (Enterprise): Unified governance framework applied across the org. AI Governance Council with real authority. Standard model documentation in use. The financial performance tipping point.

Level 4 — Managed (Systemic): Automated monitoring. Board receives regular AI metrics. Governance accelerates — rather than slows — responsible AI deployment.

Level 5 — Optimising (Transformative): Voluntary external audits. Published transparency reports. Federated governance across subsidiaries. Culture of responsible AI is self-sustaining.
What does MIT CISR research show about the relationship between AI maturity and financial performance?
Based on a survey of 721 companies: organisations at Levels 1–2 perform below industry average financially. Organisations at Levels 3–4 perform well above industry average. The mechanism: low maturity means AI projects fail more often, take longer to debug, get blocked by regulatory concerns, and generate incidents that create liability. High maturity means faster deployment, cleaner evidence trails for regulators, and incidents caught and resolved before they escalate. McKinsey quantifies this as up to 40% higher ROI from AI investments for organisations with embedded responsible AI governance.
What is "governance debt" — and what does it look like in practice?
The accumulation of governance shortcuts, undefined processes, and unmanaged risks that builds up when technology adoption outpaces governance development. Like technical debt, it eventually comes due — but the cost is regulatory fines, reputational damage, or the inability to scale AI because compliance processes are too slow or fragmented.

In practice: a company that has deployed 50 AI models under ad-hoc governance has accumulated 50 instances of undocumented risk, undefined accountability, and untested incident response. When the EU AI Act compliance deadline arrives, remediating 50 systems is significantly more expensive and disruptive than having governed them properly from the start.
A client's CISO says they are at Level 4 maturity. Their data science lead says Level 2. Who do you believe — and how do you find out?
Different parts of the organisation have different visibility into AI governance — and different incentives to report positively.
Believe the lower number until proven otherwise — organisations consistently overestimate maturity by confusing policy creation with enforcement. The diagnostic question: "Can you show me the evidence?" Level 4 means: automated drift monitoring with documented thresholds, incident response playbooks that have been tested (not just written), board-level AI metrics received in the last quarter, and governance controls that function when the people who built them are not in the room. Ask for each of those specifically. If the evidence isn't there, the level isn't there. The disagreement between CISO and data science lead is itself a signal — it suggests governance visibility is inconsistent, which is a Level 2 characteristic.
A company wants to jump from Level 2 directly to Level 4. What do you tell them?
Level 4 automated monitoring built on top of Level 2 inconsistent processes doesn't produce mature governance — it produces the appearance of governance.
You can't skip levels — each requires the foundations of the previous one to be genuinely embedded. Level 4 automated monitoring only works if Level 3 unified policies and standard tooling are already in place across the whole organisation. If they're not, you're automating chaos. The signal that a level is genuinely achieved: the governance controls work without the champion who built them. The fastest path to Level 4 is building Level 3 properly — which typically takes 12–18 months — not trying to shortcut to Level 4 with technology investments that sit on top of an inconsistent foundation.

The research, models, and frameworks behind this guide — drawn from MIT, Microsoft, CSIRO, BCG, and the RAI Institute, representing the leading thinking on responsible AI maturity.
