
NIST AI Risk Management Framework

NIST AI 100-1 · Version 1.0 · January 2023

The NIST AI RMF is the US government's voluntary framework for managing AI risk responsibly — a structured playbook that helps any organisation build, deploy, or use AI systems it can genuinely stand behind.

Core idea

What problem does it solve?

AI risks aren't like traditional software risks. Models can drift, hallucinate, discriminate, or behave unpredictably in ways that are hard to detect and harder to explain. The RMF gives organisations a shared language and structured approach to get ahead of those risks — rather than reacting when something goes wrong.

Structure

Two parts, one goal

Part 1 — Foundations: Why AI risk is uniquely hard, who it applies to, and what "trustworthy AI" means across seven characteristics.

Part 2 — The Core: Four functions (GOVERN, MAP, MEASURE, MANAGE) with categories and subcategories you can actually act on.

Key framing

What "risk" means here

Risk = probability × magnitude of harm. Harms can land on individuals, groups, organisations, society, or the environment. The framework asks you to think about both the likelihood something goes wrong and how bad it would be if it did.
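A minimal, purely illustrative sketch of that composite in code; the RMF doesn't prescribe numeric scales, so the 0-to-1 likelihood, the 1-to-5 severity scale, and the example harms below are assumptions:

```python
# Illustrative only: the RMF defines risk as a composite of probability and
# magnitude of harm but does not prescribe numeric scales. The 0-1 likelihood,
# 1-5 severity scale, and example harms below are assumptions.

def risk_score(probability: float, magnitude: int) -> float:
    """Composite risk = likelihood of harm x magnitude of that harm."""
    return probability * magnitude

# Two hypothetical harms for the same AI system:
frequent_low_impact = risk_score(probability=0.30, magnitude=2)  # e.g. minor hallucinations
rare_severe_impact  = risk_score(probability=0.05, magnitude=5)  # e.g. an unfair loan denial

print(frequent_low_impact)  # 0.6
print(rare_severe_impact)   # 0.25
```

The number matters less than the discipline: both likelihood and severity have to be estimated explicitly, for harms to individuals, groups, organisations, society, or the environment.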

Who it's for

AI actors across the lifecycle

The framework uses the term AI actor for anyone with a hand in an AI system: designers, developers, deployers, evaluators, procurement teams, end users, and even affected communities. Risk responsibility is shared — it doesn't sit with one team.

The heart of the framework. Four functions that work together in a continuous loop — not a one-time process. GOVERN wraps everything; MAP → MEASURE → MANAGE is the iterative cycle for each AI system.

Function 01
GOVERN

The culture layer. Sets policies, roles, accountability structures, and incentives so that risk management is embedded across the org — not siloed in one team. Cross-cutting: runs through all other functions.

Key outputs
  • AI risk policies and documentation
  • Clear roles and responsibilities
  • Org-wide risk culture and oversight
  • Processes for workforce training
  • Third-party / supply chain risk governance
In practice
When Microsoft rolled out Copilot to 300,000+ employees, GOVERN was their first challenge: who owned the AI policy? Which teams could approve new use cases? What happened when an employee used it in a way nobody had anticipated? They had to build accountability structures before the technology — because without them, every deployment decision defaulted to whoever happened to be in the room.
Function 02
MAP

The context layer. Before you can manage risk, you need to understand your system's purpose, who it affects, and what could go wrong. MAP is about identifying risks in the specific deployment context — not in the abstract.

Key outputs
  • Use-case and context documentation
  • Stakeholder impact analysis
  • Risk identification and categorisation
  • AI system scope and boundary definition
  • Understanding of AI actor roles
In practice
Amazon built a CV screening tool without doing MAP properly. They didn't fully map who the system would affect (female applicants, non-traditional career paths), what data it would learn from (a decade of historically male-dominated hiring decisions), or what "a good candidate" actually meant in different roles. The system learned to discriminate. A proper MAP exercise would have surfaced those contextual risks before a line of training code was written.
Function 03
MEASURE

The evidence layer. Analyse and test the AI system against the trustworthiness characteristics. This is where metrics, evaluations, and benchmarks come in — including the hard-to-quantify ones like fairness and explainability.

Key outputs
  • Risk measurement methodologies
  • TEVV: Test, Evaluation, Verification, Validation
  • Bias and fairness assessments
  • Performance metrics with disaggregated data
  • Documentation of residual risk
In practice
ProPublica's investigation into COMPAS — the criminal risk-scoring algorithm — was essentially a public MEASURE exercise that the company hadn't done itself. They measured false positive rates disaggregated by race and found Black defendants were nearly twice as likely to be incorrectly flagged as high risk. The system had been evaluated on overall accuracy. Nobody had measured it on the dimension that mattered most.
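A minimal sketch of what disaggregated measurement looks like in code, assuming pandas and a hypothetical evaluation table; the column names and values are invented, not drawn from the COMPAS data:

```python
import pandas as pd

# Hypothetical evaluation data: one row per person, with the model's high-risk
# flag, the observed outcome, and a demographic group label. Invented values.
df = pd.DataFrame({
    "group":     ["A", "A", "A", "A", "B", "B", "B", "B"],
    "predicted": [1, 0, 1, 0, 1, 1, 0, 1],   # 1 = flagged high risk
    "actual":    [0, 0, 1, 0, 0, 0, 0, 1],   # 1 = the predicted harm occurred
})

# Overall accuracy can look fine while the error burden falls unevenly.
for group, g in df.groupby("group"):
    negatives = g[g["actual"] == 0]              # people the model should not flag
    fpr = (negatives["predicted"] == 1).mean()   # ...but flagged anyway
    print(f"group {group}: false positive rate = {fpr:.2f}")
```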
Function 04
MANAGE

The action layer. Based on what you've measured, allocate resources, treat identified risks, and build in ongoing monitoring. Includes the ability to stop or pause deployment when risk becomes unacceptable.

Key outputs
  • Risk treatment plans (mitigate, transfer, accept)
  • Ongoing monitoring and feedback loops
  • Incident response plans
  • Go / no-go deployment decisions
  • Residual risk documentation for users
In practice
In January 2025, Apple paused its AI news summarisation feature after it generated fabricated headlines falsely attributed to the BBC and other outlets. MANAGE done well means having a go/no-go mechanism that can pull a system within hours. Apple had it. The cost was a news cycle. The alternative — no MANAGE process, no clear owner, no kill switch — is a legal and reputational disaster.
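A hedged sketch of what a pre-agreed go/no-go gate might look like; the metric names and thresholds are hypothetical, and any real gate sits on top of the escalation paths and named owners that GOVERN establishes:

```python
# Illustrative only: metric names and thresholds are assumptions, not framework
# values. The point is that "pause" is a pre-agreed outcome with a named owner,
# not a decision improvised in the middle of an incident.

THRESHOLDS = {
    "hallucination_rate": 0.02,   # max share of outputs flagged as fabricated
    "fpr_gap":            0.05,   # max false-positive-rate gap between groups
    "drift_score":        0.20,   # max distribution-shift score vs. training data
}

def deployment_decision(metrics: dict) -> str:
    """Return 'go' only if every monitored metric is within its threshold."""
    breaches = [name for name, limit in THRESHOLDS.items()
                if metrics.get(name, float("inf")) > limit]   # missing metric = breach
    if breaches:
        return "pause: escalate to the system owner (" + ", ".join(breaches) + ")"
    return "go"

print(deployment_decision({"hallucination_rate": 0.01, "fpr_gap": 0.03, "drift_score": 0.12}))
print(deployment_decision({"hallucination_rate": 0.07, "fpr_gap": 0.03, "drift_score": 0.12}))
```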
How they connect

The loop never stops

GOVERN wraps everything. MAP → MEASURE → MANAGE is the iterative loop for each AI system. You can enter at any point — but you should revisit continuously. Risk changes as systems, data, and contexts evolve. The framework is designed to be a living process, not a launch checklist.

For an AI system to be trustworthy, it needs to balance all seven of these characteristics — not just nail one or two. They trade off against each other, and that's where the genuinely hard decisions live.

Foundation

Valid & Reliable

The baseline requirement. The system does what it's supposed to do, consistently, across expected conditions. Without this, nothing else matters. Includes accuracy, robustness, and generalisability.

Critical

Safe

The system doesn't endanger life, health, property, or the environment. Safety must be baked in from the design stage — not patched in later. Highest priority when serious injury or death is possible.

Protection

Secure & Resilient

Security = defending against attacks. Resilience = recovering gracefully. Related but distinct: security includes resilience, but resilience also covers non-adversarial unexpected events.

Cross-cutting

Accountable & Transparent

Transparency = information about how the system works is available. Accountability = clear ownership of outcomes. Transparency is a prerequisite for accountability. Both require ongoing effort, not a one-time declaration.

Comprehension

Explainable & Interpretable

Explainability = how it works (mechanism). Interpretability = what the output means in context (significance). Transparency answers "what happened", explainability answers "how", interpretability answers "why it matters".

Rights

Privacy-Enhanced

Protecting human autonomy, identity, and dignity. Goes beyond data protection — includes anonymity, consent, and controlling how personal information is inferred. AI creates new privacy risks not present in traditional software.

Equity

Fair — with Harmful Bias Managed

Three categories of AI bias: systemic (embedded in datasets and society), computational/statistical (in algorithms and non-representative samples), and human-cognitive (in how people interpret outputs and make design decisions). Mitigating bias ≠ achieving fairness — they're related but distinct goals that require different interventions.


The tradeoffs are real — and the point

More interpretability can mean less accuracy. Better privacy can reduce fairness (data sparsity). Maximum security can reduce transparency. These tradeoffs don't have universal answers — they depend on context, values, and stakes. The framework asks you to make these tradeoffs deliberately and document your reasoning. That's the work.

The vocabulary you'll need to use fluently. These are the words that signal genuine depth of knowledge in a client conversation.

Core framework terms
AI Actor
Anyone with a hand in the AI lifecycle
Anyone who plays an active role in the AI system lifecycle — designers, developers, deployers, evaluators, operators, end users, and affected communities. Risk responsibility is distributed across all AI actors, not held by one team.
AI System
The formal definition
An engineered or machine-based system that generates outputs — predictions, recommendations, or decisions — influencing real or virtual environments. Designed to operate with varying degrees of autonomy.
Risk
Probability × magnitude
The composite measure of an event's probability of occurring and the magnitude of its consequences. Risk = likelihood of harm × magnitude of that harm. Can be positive or negative; the RMF focuses on managing negative impacts while enabling positive ones.
Risk Tolerance
You define it — the RMF doesn't
An organisation's readiness to accept risk in pursuit of its objectives. Highly contextual — shaped by regulation, industry norms, community expectations, and severity of potential harms. The RMF doesn't prescribe tolerance; you determine what's acceptable for your context.
Residual Risk
What remains after treatment
Risk that remains after risk treatment has been applied. Must be documented and communicated to end users so they understand potential negative impacts. Residual risk is unavoidable — the goal is to make it explicit and manageable.
TEVV
Test, Eval, Verify, Validate
Test, Evaluation, Verification & Validation. The ongoing technical process of checking that an AI system performs as intended, meets requirements, and doesn't create unexpected harms. Should happen throughout the AI lifecycle — not just at launch.
Socio-technical
AI is never just technical
A key framing throughout the RMF. AI systems are shaped by and embedded in social contexts, human behaviour, and organisational dynamics. Risk and trust can't be assessed by looking at the model alone — context and people are inseparable from the technology.
AI RMF Profile
Your customised application
A customised selection of functions, categories, and subcategories tailored to an organisation's specific context, risk tolerance, and goals. Profiles can represent current state ("where we are") or target state ("where we want to be") — a minimal sketch of what a profile can look like follows at the end of this list of core terms.
Inscrutability
Opacity that blocks risk measurement
When an AI system is so opaque that it's difficult or impossible to understand why it produces certain outputs. Driven by model complexity, limited documentation, or inherent uncertainty. Makes risk measurement fundamentally harder — you can't properly assess what you can't see into.
AI RMF Playbook
The practical companion resource
A companion resource (available at nist.gov) that provides specific tactical actions for each category and subcategory of the framework. Voluntary and customisable. Think of the RMF as the "what" and the Playbook as the "how".
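One way to make the current-state / target-state idea from the AI RMF Profile entry concrete is to sketch a profile as a simple data structure. This is purely illustrative: the subcategory IDs and maturity labels below are invented, not the official framework identifiers.

```python
# Hypothetical sketch of an AI RMF profile: a selection of subcategories with a
# current-state and target-state assessment for each. The subcategory IDs and
# maturity labels are invented for illustration; real work would use the
# identifiers from the framework and its Playbook.

profile = {
    "context": "customer-service chatbot, retail banking",
    "subcategories": [
        {"id": "GOVERN-1.x",  "current": "ad hoc",   "target": "documented"},
        {"id": "MAP-3.x",     "current": "not done", "target": "documented"},
        {"id": "MEASURE-2.x", "current": "partial",  "target": "continuous"},
        {"id": "MANAGE-4.x",  "current": "not done", "target": "documented"},
    ],
}

# The gap between current and target state becomes the improvement roadmap.
gaps = [s["id"] for s in profile["subcategories"] if s["current"] != s["target"]]
print(gaps)   # every subcategory where work is still needed
```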
Trustworthiness terms
Validity
Does it do what it's supposed to?
Confirmation that requirements for a specific intended use have been fulfilled. A valid system does what it was designed to do in the context it's deployed in — not just in lab conditions.
Reliability
Does it keep doing it?
The ability to perform as required, without failure, over a given time period under expected conditions. Reliability is about consistent performance over the lifetime of the system — not just at launch.
Robustness
Does it hold up under pressure?
The ability to maintain performance across a variety of circumstances — including unexpected inputs and use cases. A robust system degrades safely when pushed outside its intended parameters rather than failing dangerously.
Explainability
How did it get there?
A representation of the mechanisms underlying an AI system's operation. Answers "how did it get to that output?" — the process and logic, not just the result. Distinct from interpretability, which is about meaning.
Interpretability
What does the output actually mean?
The meaning of an AI system's output in the context of its designed purpose. Answers "what does that output actually mean for this situation?" You need both explainability and interpretability for meaningful human oversight.
Systemic Bias
Baked into data and society
Bias embedded in datasets, organisational norms, and broader society — not necessarily from anyone's discriminatory intent. AI systems can learn and amplify existing systemic biases at scale and at speed.
Computational Bias
Errors in algorithms or data
Bias arising from systematic errors in algorithms or datasets — often from non-representative samples. Can produce systematically skewed outputs even when the underlying data appears "clean" on the surface.
Human-Cognitive Bias
How humans shape and read AI
Bias in how humans perceive and interpret AI outputs, or how human assumptions shape AI design decisions. Omnipresent throughout the lifecycle — from what features engineers prioritise to how operators act on AI recommendations.
Data Provenance
Where did the training data come from?
Tracking the origin and history of training data. Essential for transparency (understanding what the model learned from) and accountability (attributing decisions to data sources). Also relevant for copyright and intellectual property compliance.
PETs
Privacy-Enhancing Technologies
Technologies that reduce privacy risks — e.g., differential privacy, federated learning, data minimisation, de-identification. Can trade off with accuracy, particularly when training data is sparse. A key tool in privacy-enhanced AI design.
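As a concrete taste of one PET, here is a minimal sketch of the Laplace mechanism from differential privacy, using numpy; the epsilon values and records are illustrative assumptions:

```python
import numpy as np

# Minimal sketch of one PET: the Laplace mechanism from differential privacy.
# A count query has sensitivity 1 (adding or removing one person changes the
# count by at most 1), so Laplace noise with scale = sensitivity / epsilon
# gives epsilon-differential privacy for that single query. The epsilon values
# and records below are illustrative assumptions.

rng = np.random.default_rng(0)

def private_count(records, epsilon: float, sensitivity: float = 1.0) -> float:
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return len(records) + noise

flagged_applicants = ["a", "b", "c", "d", "e"]         # hypothetical records
print(private_count(flagged_applicants, epsilon=1.0))  # noisy answer near 5
print(private_count(flagged_applicants, epsilon=0.1))  # stronger privacy, much noisier
```

It also shows the accuracy trade-off noted above: the smaller epsilon is (stronger privacy), the noisier the released statistic becomes.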

Ready-to-use language for the questions clients actually ask. Conversational, credible, and specific enough to demonstrate real depth.

When they ask: "What even is the NIST AI RMF?"
It's the most credible, widely adopted framework for managing AI risk responsibly. Think of what ISO 27001 did for information security — it gave organisations a shared language and structured process to prove they're taking it seriously. The RMF does that for AI. It's voluntary today, but it's increasingly what regulators, enterprise clients, and boards are pointing to when they ask "how are you managing AI risk?"
When they ask: "Why can't we just use our existing risk management processes?"
Great question — and the RMF addresses this directly. AI risks are categorically different from traditional software risks. Models are trained on data that changes over time. They can behave differently in production than they did in testing. They can amplify existing biases at scale, in ways that are hard to detect and even harder to explain. Your existing frameworks weren't designed for that. The RMF fills those specific gaps.
When they ask: "What's the business case?"
Trustworthiness is a competitive advantage. Organisations that can demonstrate they govern AI responsibly — with documented processes, clear accountability, and measurable outcomes — will win procurement, avoid regulatory friction, and build the kind of public trust that's hard to earn back once it's lost. The RMF gives you the receipts to show clients and regulators that your AI isn't just powerful, it's responsible.
When they ask: "Which teams need to be involved?"
Everyone with a hand in the system — and that's deliberate. The framework uses the term "AI actors" because risk doesn't belong to just the data science team. Procurement, legal, operations, product, compliance, and even the communities affected by the system all have a role. The most common failure mode is treating AI risk as a purely technical problem. The RMF explicitly counters that.
When they ask: "How does this relate to the EU AI Act?"
The RMF was designed to be regulation-agnostic and sector-agnostic — but it maps well to regulatory requirements globally. Think of it as the foundational practice layer. If you're doing the RMF well, you're already doing the substantive work that most AI regulations require. It's the "responsible AI" muscle that makes compliance achievable rather than a last-minute scramble.
When they push back: "This seems like a lot of overhead for a small team"
The framework is explicitly designed to scale. Smaller organisations can select the categories and subcategories most relevant to their context — you don't implement all of it at once. And the cost of not doing it tends to show up later, at the worst possible time: when something goes wrong and you have no documented process to point to. Starting with GOVERN and a basic MAP assessment is a credible, proportionate starting point for any size organisation.

Key phrase to remember

The framing that lands

The RMF isn't about slowing down AI — it's about building AI you can defend. Every organisation deploying AI is going to face scrutiny at some point: from clients, regulators, the press, or their own board. The RMF is how you prepare for that scrutiny before it arrives.

Read each question and try to answer it in your head before checking the answer. The scenario questions at the bottom are the most useful — they mirror how you'll actually use this knowledge in a client conversation.

What are the four functions of the NIST AI RMF — and what is each one's role?
GOVERN — the culture layer. Policies, roles, accountability, and incentives. Cross-cutting: it runs through all other functions.

MAP — the context layer. Understand the system's purpose, who it affects, and what could go wrong before measuring anything.

MEASURE — the evidence layer. Test and evaluate the system against trustworthiness characteristics using metrics and TEVV.

MANAGE — the action layer. Allocate resources, treat risks, monitor continuously, and make go/no-go decisions.
Which function is cross-cutting — and what does that mean in practice?
GOVERN is cross-cutting. It doesn't happen once at the start — it runs through and informs all the other functions continuously. In practice: the policies and accountability structures set by GOVERN determine how MAP is conducted, what MEASURE looks for, and what MANAGE is empowered to do. Without GOVERN, the other three functions are just activities without authority or direction.
What does TEVV stand for — and at which function does it appear?
Test, Evaluation, Verification, and Validation — and it appears in the MEASURE function. TEVV is the ongoing technical process of confirming that an AI system performs as intended, meets its requirements, and doesn't create unexpected harms. Critically: it should happen throughout the AI lifecycle, not just at launch. A system that passes TEVV at deployment may fail it six months later due to model drift or changing data.
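A minimal sketch of one post-deployment TEVV check, assuming scipy is available: compare a production feature's distribution against its training-time distribution and flag possible drift. The synthetic data and cut-off are assumptions, not framework values:

```python
import numpy as np
from scipy.stats import ks_2samp

# Hypothetical post-deployment TEVV check: compare a production feature's
# distribution against the training-time distribution. Synthetic data and the
# 0.01 cut-off are assumptions chosen for illustration.
rng = np.random.default_rng(7)
training_scores   = rng.normal(loc=0.0, scale=1.0, size=5000)
production_scores = rng.normal(loc=0.4, scale=1.0, size=5000)   # shifted: drift

result = ks_2samp(training_scores, production_scores)
if result.pvalue < 0.01:
    print(f"possible drift (KS statistic {result.statistic:.3f}): rerun the evaluation suite")
```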
What is "residual risk" and which function requires it to be documented?
Residual risk is the risk that remains after all risk treatments have been applied. Every deployed system will have some — the goal is not to eliminate it but to make it explicit and acceptable. It appears in the MANAGE function, which requires residual risk to be documented and communicated to end users. The practical significance: if something goes wrong, documented residual risk is evidence of due diligence. Undocumented residual risk is a liability.
What does "AI actor" mean in the framework — and why does it matter?
Anyone with an active role in the AI system lifecycle — designers, developers, deployers, evaluators, end users, and even affected communities. It matters because risk responsibility is distributed, not held by a single team. The developer who trains a model and the enterprise that deploys it in a new context are both AI actors with distinct responsibilities. Treating AI risk as someone else's problem — the data science team's, the vendor's — is the most common governance failure the framework is designed to prevent.
Why can't you just do MAP → MEASURE → MANAGE once and be done with it?
Because AI systems change — and so do their contexts. The data distribution shifts. The use case expands beyond the original scope. New regulations come into force. Affected populations change. A system that was mapped, measured, and managed responsibly at launch may be creating new risks six months later that nobody has assessed. The framework is designed as a continuous loop, not a launch checklist. GOVERN exists to ensure that loop keeps running even when there's no incident forcing a review.
What's the difference between risk tolerance and risk appetite — and which does the AI RMF set?
Risk appetite is the amount of risk an organisation is willing to accept in pursuit of its objectives — a strategic choice. Risk tolerance is the acceptable variation around that appetite — the operational guardrails. The AI RMF deliberately does not set either. It provides the framework for managing risk once you've defined your tolerance — but the values decision about how much risk is acceptable is left to organisations, industries, and regulators. This is intentional: what's acceptable in medical AI is fundamentally different from what's acceptable in a music recommendation engine.
Name three characteristics of trustworthy AI — and explain the one key tradeoff between them.
The seven characteristics: Valid & Reliable, Safe, Secure & Resilient, Accountable & Transparent, Explainable & Interpretable, Privacy-Enhanced, Fair with Bias Managed.

The sharpest tradeoff: Explainability vs Accuracy. The most accurate models for many tasks (deep neural networks) tend to be the least explainable, while simpler, interpretable models often give up some predictive accuracy. This isn't a technical problem to be solved — it's a values decision to be made: in a high-stakes context (medical diagnosis, loan decisions), you may need to sacrifice accuracy for explainability so humans can meaningfully oversee and challenge the system.
A bank wants to deploy an AI credit scoring system. Their data science team has built the model and it achieves 94% accuracy. The CRO says "we're ready to go live." Which function has almost certainly been skipped?
The model is accurate overall. But has anyone asked: accurate for whom? What does 94% mean disaggregated by demographic group? Who conducted stakeholder impact analysis? Has anyone mapped which communities this system will affect, and what the consequences of errors are for those communities?
Almost certainly MAP — and possibly GOVERN. MAP requires understanding the deployment context, affected stakeholders, and potential harms before measuring. GOVERN requires knowing who is accountable when the system makes a wrong decision. A 94% accuracy headline answers neither question. The first thing to ask: "What does performance look like disaggregated by demographic group?" If they don't know, MAP hasn't been done.
A company's AI tool begins producing increasingly unreliable outputs six months after launch. The team that built it has moved on to other projects. Nobody knows whose job it is to investigate. Which GOVERN failure does this represent?
The system is drifting. Performance is degrading. There's no accountability structure, no monitoring owner, no escalation path.
This is a classic GOVERN failure — specifically the absence of defined roles and responsibilities that persist beyond the launch of a system. GOVERN is supposed to establish ongoing accountability, not just launch-time approval. In practice: every deployed AI system needs a named owner responsible for its continued performance and risk management. "The team that built it" is not an owner — it's a project team that has moved on. MANAGE would also be implicated: there should be monitoring and incident response processes that would have caught the degradation.
A client says: "We ran the model through testing before we launched it, so we've done MEASURE." What's wrong with this?
Pre-launch testing is a start. But MEASURE in the AI RMF is continuous, not a one-time gate.
Three problems. First, MEASURE is ongoing — it doesn't stop at launch. Post-deployment monitoring for drift, bias emergence, and performance degradation is part of MEASURE. Second, pre-launch testing in a controlled environment often misses risks that only emerge in real-world conditions — different user behaviour, data distribution shifts, edge cases that didn't appear in test sets. Third, MEASURE should cover all seven trustworthiness characteristics — not just accuracy. Did they measure fairness across demographic groups? Explainability? Resilience to adversarial inputs? If the answer is "we tested it for accuracy," that's partial MEASURE, not complete MEASURE.
A client asks: "We already follow ISO 27001 for cybersecurity — do we still need the AI RMF?" What do you say?
ISO 27001 is excellent for what it does. But it was designed for traditional software and information systems — not for AI.
ISO 27001 covers confidentiality, integrity, and availability of information systems. The AI RMF covers a different and broader set of risks: model drift, hallucination, bias amplification, explainability, fairness, and the socio-technical risks that emerge from how AI interacts with people and society. These don't appear in ISO 27001. The RMF also integrates with ISO 27001 — it doesn't replace it. Think of it as the framework that fills the AI-specific gaps your existing risk management programme wasn't designed to handle. The organisations using both end up with genuinely integrated risk management rather than parallel frameworks that don't talk to each other.

The authoritative sources behind this guide. Primary documents come directly from NIST and the standards bodies — these are the texts regulators and auditors will reference.

Primary sources
Standards & frameworks
Further reading