
AI Trustworthiness

Seven characteristics · Tradeoffs · Real-world application

Trustworthiness is not a single property an AI system either has or lacks — it is a balance of seven characteristics, each in tension with the others. Understanding these tensions is where genuine expertise lives, and where the most consequential decisions about AI deployment get made.

The foundation

Trustworthiness is a social concept, not a technical one

NIST is explicit: trustworthiness is a social concept that ranges across a spectrum and is only as strong as its weakest characteristic. A system cannot be called trustworthy because it scores well on one dimension — a highly accurate but opaque system, a secure but unfair system, or a transparent but unreliable system are all untrustworthy. The goal is balance, not optimisation of individual properties.

Why it matters beyond compliance

40% of business leaders cite trustworthiness concerns — and consumers notice

A 2023 survey found more than 40% of business leaders cite concerns about AI trustworthiness as a barrier to adoption. A 2024 study found that simply labelling a product as using "artificial intelligence" reduces purchase intent among consumers — a direct measure of the trust deficit AI faces.

The business case for trustworthy AI is not just ethical — it is commercial. Organisations that can demonstrate trustworthiness across all seven characteristics can deploy AI more broadly, win procurement with trust-sensitive clients, and avoid the reputational and regulatory consequences of high-profile AI failures.

The architecture

How the seven characteristics relate to each other

Valid & Reliable is the foundation — without it, no other characteristic matters. You cannot have a trustworthy system that doesn't work.

Accountable & Transparent is the cross-cutting characteristic — it touches all the others. You cannot manage what you cannot see, and you cannot be accountable for what you have not disclosed.

The remaining five — Safe, Secure & Resilient, Explainable & Interpretable, Privacy-Enhanced, and Fair with Bias Managed — must all be balanced against each other and against the first two. Neglecting any one increases both the probability and magnitude of harm.

The hard truth

Every AI deployment involves tradeoffs between these characteristics

This is not a limitation to be solved — it is a reality to be managed. More interpretability typically costs some accuracy. Better privacy can reduce fairness. Maximum security can reduce transparency. These tradeoffs are inherent, not incidental. The mark of sophisticated AI governance is making these tradeoffs deliberately, transparently, and with appropriate justification — not pretending they don't exist.

Seven characteristics, each with distinct meaning, measurement approaches, and tensions with the others. The NIST AI RMF treats all seven as necessary conditions — none is optional, though their relative priority depends on context.

Foundation
Valid & Reliable
Does the system do what it is supposed to do, consistently, in the conditions it is expected to operate in? Without this, nothing else matters.

Validity is confirmation that requirements for the specific intended use have been fulfilled — the system does what it was designed to do in its deployment context, not just in the lab. Reliability is the ability to perform without failure over time and across conditions — not just at launch, but throughout the system's operational lifetime.

Two sub-properties matter most: accuracy (how close outputs are to true values, measured with false positive and false negative rates disaggregated by demographic group) and robustness (the ability to maintain performance across varied, unexpected, and adversarial conditions). A system can be accurate in testing and non-robust in deployment — and that gap is where most real-world AI failures live.

Board & client questions this enables
  • What are the false positive and false negative rates — disaggregated by demographic group?
  • How does performance differ between training conditions and real-world deployment?
  • What is the monitoring process for detecting when performance degrades over time?
  • Has the system been tested in conditions outside its training distribution?
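The first of those board questions, disaggregated error rates, can be sketched in a few lines of Python. The data and group labels below are synthetic and purely illustrative:

```python
# Illustrative sketch: false positive / false negative rates disaggregated
# by demographic group. All data here is made up for demonstration.

def disaggregated_error_rates(y_true, y_pred, groups):
    """Return {group: (false_positive_rate, false_negative_rate)}."""
    rates = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        fp = sum(1 for i in idx if y_pred[i] == 1 and y_true[i] == 0)
        fn = sum(1 for i in idx if y_pred[i] == 0 and y_true[i] == 1)
        negatives = sum(1 for i in idx if y_true[i] == 0)
        positives = sum(1 for i in idx if y_true[i] == 1)
        rates[g] = (fp / negatives if negatives else 0.0,
                    fn / positives if positives else 0.0)
    return rates

y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(disaggregated_error_rates(y_true, y_pred, groups))
```

Even in this tiny example, an aggregate error rate would hide that the two groups experience different false-positive rates — exactly the gap aggregate reporting conceals.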
Critical
Safe
Does the system avoid endangering life, health, property, or the environment — under all defined operating conditions?

Safety must be designed in from the start — it cannot be retrofitted. It takes the highest priority when risk of serious injury or death exists. Safe operation requires responsible design and development, clear guidance to deployers on responsible use, and the ability to shut down, modify, or intervene when systems deviate from expected behaviour.

AI safety draws on established practices from high-stakes industries — aviation, medical devices, nuclear — where safety cases must be constructed and maintained throughout the system lifecycle. For AI, this means rigorous simulation testing, real-time monitoring, human override capabilities, and defined safe failure modes.

What "safe" looks like in practice
  • Defined safe failure modes — the system degrades gracefully rather than failing dangerously
  • Human override mechanisms that are genuinely usable under operational conditions
  • Real-time monitoring with automatic alerts when outputs fall outside safe parameters
  • Documented safety cases reviewed by independent parties for high-stakes deployments
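The first and third practices can be combined in a minimal runtime guardrail. This is a sketch only: the numeric safe envelope, fallback value, and alert string are hypothetical stand-ins for whatever a real deployment defines.

```python
# Hedged sketch of a runtime guardrail: model outputs outside a defined
# safe envelope trigger a conservative fallback and an alert, rather than
# being acted on. Bounds and fallback behaviour are hypothetical.

SAFE_RANGE = (0.0, 100.0)   # e.g. a dosage or actuator limit

def guarded_output(model_output, fallback):
    """Return the model output only if it lies inside the safe envelope."""
    low, high = SAFE_RANGE
    if low <= model_output <= high:
        return model_output, "ok"
    # Safe failure mode: degrade gracefully to the fallback and alert.
    return fallback, "alert: output outside safe envelope"

print(guarded_output(42.0, fallback=0.0))
print(guarded_output(250.0, fallback=0.0))
```

The point of the pattern is that the unsafe path is designed, tested, and observable — the system degrades to a known state instead of acting on an out-of-range output.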
Protection
Secure & Resilient
Can the system withstand attacks and unexpected adverse events, and recover when they occur?

Security and resilience are related but distinct. Security encompasses protocols to avoid, protect against, respond to, and recover from attacks — including adversarial examples, data poisoning, model inversion, and intellectual property theft through system endpoints. Resilience is the ability to return to normal function after unexpected adverse events — including non-adversarial failures, environmental changes, and out-of-distribution inputs.

AI systems face unique security threats not present in traditional software. Data poisoning can corrupt a model's behaviour at the training stage. Adversarial examples — carefully crafted inputs — can cause confident but wrong predictions. Model theft through repeated querying can expose proprietary intellectual property.

Key security considerations for AI
  • Adversarial robustness testing — can the system be fooled by crafted inputs?
  • Training data integrity — has the training pipeline been protected from poisoning?
  • Model access controls — who can query the model and what rate limits apply?
  • Graceful degradation — what happens when the system operates outside its training distribution?
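The adversarial-example threat can be illustrated on a toy model. The FGSM-style sketch below uses a hypothetical two-feature logistic regression with invented weights; real attacks target deep networks, but the gradient principle is the same.

```python
# Toy FGSM-style adversarial perturbation of a logistic-regression
# classifier. Weights, input, and epsilon are invented for illustration.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([2.0, -1.0])      # hypothetical trained weights
b = 0.0
x = np.array([1.0, 0.5])       # input correctly classified as positive
y = 1.0

p_clean = sigmoid(w @ x + b)   # confident positive prediction

# Gradient of the logistic loss w.r.t. the input is (p - y) * w;
# stepping in the *sign* of that gradient increases the loss.
grad_x = (p_clean - y) * w
eps = 0.8
x_adv = x + eps * np.sign(grad_x)

p_adv = sigmoid(w @ x_adv + b)
print(p_clean, p_adv)          # the small perturbation flips the prediction
```

For high-dimensional inputs such as images, the equivalent perturbation can be imperceptible to a human while still flipping the model's confident prediction.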
Cross-cutting
Accountable & Transparent
Is information about the system available to those who need it — and are there people and processes responsible for its outcomes?

Transparency is the extent to which information about an AI system and its outputs is available to those interacting with it — whether or not they know they are interacting with AI. It answers "what happened?" Accountability presupposes transparency: you cannot be accountable for something that has not been disclosed. It answers "who is responsible?"

Meaningful transparency is layered — different stakeholders need different information. A technical team needs model architecture and training data details. A regulator needs documentation of the risk management process. An end user needs to know they are interacting with AI and what their recourse is if they believe a decision was wrong. A system can be transparent to one audience and opaque to another — and both matter.

NIST treats this characteristic as cross-cutting because accountability and transparency are necessary to assess and maintain all other characteristics over time. Without transparency, you cannot determine whether a system is fair, safe, or reliable.

The three transparency questions
  • What happened? — System-level transparency: what did the system do, when, and under what conditions?
  • How? — Explainability: what mechanism produced this particular output?
  • Why does it matter? — Interpretability: what does this output mean in the context of the decision being made?
Comprehension
Explainable & Interpretable
Can humans understand how and why the system produced a particular output — well enough to oversee, challenge, and improve it?

Explainability describes the mechanisms underlying AI operation — the "how." Interpretability describes the meaning of outputs in context — the "why it matters." Both are necessary for genuine human understanding, but they are different things. A technically explainable system may produce outputs that are meaningless without domain expertise. An interpretable output may be based on mechanisms that cannot be adequately explained.

The tension with validity is real and documented: the most accurate models (deep neural networks) are often the least explainable. Simpler, more interpretable models (decision trees, linear regression) sacrifice predictive power for transparency. This is not a problem to be solved — it is a tradeoff to be made deliberately based on the stakes of the application.

Explainability approaches
  • LIME / SHAP — local post-hoc explanation methods for individual predictions
  • Attention mechanisms — showing which input features the model weighted most heavily
  • Counterfactual explanations — "the decision would have been different if X had been Y"
  • Inherently interpretable models — choosing simpler architectures where stakes are high
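Counterfactual explanations are easiest to see on a linear model. The feature names, weights, and approval threshold below are entirely hypothetical; the sketch just shows the "decision would have been different if X had been Y" mechanics.

```python
# Minimal counterfactual-explanation sketch for a linear scoring model.
# Feature names, weights, and threshold are hypothetical.

weights = {"income": 0.4, "debt_ratio": -0.6, "years_employed": 0.2}
threshold = 1.0  # score >= threshold -> approve

def score(applicant):
    return sum(weights[f] * applicant[f] for f in weights)

def counterfactual(applicant, feature):
    """Smallest change to one feature that reaches the decision threshold."""
    w = weights[feature]
    if w == 0:
        return None
    delta = (threshold - score(applicant)) / w
    return {feature: applicant[feature] + delta}

applicant = {"income": 1.0, "debt_ratio": 0.5, "years_employed": 2.0}
print(score(applicant))                           # below threshold -> declined
print(counterfactual(applicant, "years_employed"))
```

The output reads directly as a counterfactual: "the decision would have been approved if years_employed had been 4.5 instead of 2.0". For non-linear models the same idea requires search rather than closed-form algebra.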
Rights
Privacy-Enhanced
Does the system protect human autonomy, identity, and dignity — including from inferences the individual never consented to?

Privacy in AI goes beyond data protection. It encompasses anonymity, confidentiality, and individual control over how personal information is used and inferred. AI creates new privacy risks not present in traditional software: systems can infer sensitive attributes (health conditions, sexual orientation, political views) from seemingly innocuous inputs — browsing history, purchase patterns, social connections — without any individual disclosing that information directly.

Privacy-Enhancing Technologies (PETs) — differential privacy, federated learning, homomorphic encryption, data minimisation — are the technical tools. But privacy by design — embedding privacy considerations into system architecture from the start — is the governance approach. Retrofitting privacy onto a deployed system is significantly harder and less effective.

The privacy-fairness tension

Privacy-enhancing techniques can reduce fairness. Differential privacy adds noise to model outputs to prevent individual identification — but that noise can disproportionately affect predictions for minority groups who are already underrepresented in training data. This is one of the most common and least understood tradeoffs in responsible AI.
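A small simulation makes the tension concrete. Laplace noise of fixed scale, as basic differential privacy applies to counts, produces a far larger relative error for the smaller group. The group sizes and privacy budget below are invented for illustration.

```python
# Sketch of why differential-privacy noise hits small groups harder.
# Counts and epsilon are toy values; the noise scale 1/epsilon is the
# standard Laplace-mechanism scale for a count query (sensitivity 1).
import numpy as np

rng = np.random.default_rng(0)
epsilon = 0.5                        # privacy budget
true_counts = {"majority": 10_000, "minority": 50}

rel_error = {}
for group, count in true_counts.items():
    # Mean absolute noise over many hypothetical noisy releases
    noise = rng.laplace(scale=1.0 / epsilon, size=1_000)
    rel_error[group] = np.mean(np.abs(noise)) / count

print(rel_error)  # relative error is orders of magnitude larger for minority
```

The absolute noise is identical for both groups; only the relative distortion differs — which is precisely why the accuracy cost of privacy falls hardest on the underrepresented.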

Equity
Fair — with Harmful Bias Managed
Does the system treat people equitably — and are the three categories of AI bias actively managed across the entire lifecycle?

Fairness and bias management are related but not identical. A system with managed bias is not necessarily fair — systems in which predictions are somewhat balanced across demographic groups may still be inaccessible to people with disabilities, may exacerbate existing disparities, or may apply equal treatment to historically unequal groups in ways that perpetuate inequality.

NIST identifies three bias categories that must all be managed: systemic (embedded in data and society), computational/statistical (in algorithms and processing), and human-cognitive (in how people design and interpret AI). All three can arise in the absence of discriminatory intent — which is why bias management requires structural processes, not just good intentions.

Standards of fairness are complex and culturally variable. Perceptions of fairness differ across cultures and shift depending on application — what is considered fair in employment may be contested in criminal justice. This context-dependence is not an excuse to avoid the question, but a reason to engage stakeholders in defining fairness criteria for each specific use case.

Questions to ask about any AI system
  • What fairness metric is being used — and is it the right one for this context and these stakes?
  • Are performance metrics disaggregated by demographic group, or reported only in aggregate?
  • Has the system been assessed for all three categories of bias — not just the data?
  • Have affected communities been involved in defining what fairness means in this context?
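The first question, which fairness metric, is itself consequential: the same predictions can satisfy one metric and fail another. A toy sketch with synthetic data and hypothetical groups:

```python
# Two common fairness metrics on identical predictions can disagree,
# which is why the metric choice must be justified per context.
# All data is synthetic.

def selection_rate(preds):
    return sum(preds) / len(preds)

def true_positive_rate(y_true, y_pred):
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(pos) / len(pos)

y_true_a, y_pred_a = [1, 1, 0, 0], [1, 1, 1, 0]
y_true_b, y_pred_b = [1, 0, 0, 0], [1, 0, 0, 0]

# Demographic parity compares raw selection rates ...
dp_gap = selection_rate(y_pred_a) - selection_rate(y_pred_b)
# ... equal opportunity compares true positive rates.
eo_gap = true_positive_rate(y_true_a, y_pred_a) - true_positive_rate(y_true_b, y_pred_b)

print(f"demographic parity gap: {dp_gap:.2f}")  # 0.50 -> fails parity
print(f"equal opportunity gap:  {eo_gap:.2f}")  # 0.00 -> satisfies equal opportunity
```

Here the system selects group A three times as often as group B yet treats qualified members of both groups identically — fair by one definition, unfair by another. Which definition governs is a context decision, not a technical one.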

The tradeoffs between trustworthiness characteristics are real, documented, and in many cases mathematically provable. This is where the genuinely hard decisions in AI governance live — and where the most valuable advisory conversations happen.

The core principle

Tradeoffs must be made deliberately — not by default

Every AI deployment involves tradeoffs between the seven characteristics. The mark of mature governance is not avoiding these tradeoffs — it is making them explicitly, with appropriate justification, documented transparently, and reviewed regularly. When tradeoffs are made implicitly (by optimising for accuracy without considering fairness, or by prioritising interpretability without assessing the accuracy cost), the result is governance by accident. NIST is explicit: tradeoffs depend on "the values at play in the relevant context and should be resolved in a manner that is both transparent and appropriately justifiable."

Key tradeoff pairs
Accuracy vs Explainability (high tension)
  The tension: The most accurate models (deep neural networks, large language models) are the least explainable. Simpler, interpretable models sacrifice predictive power.
  How to navigate it: Ask what the cost of a wrong decision is. For high-stakes decisions affecting rights and safety, interpretability may warrant an accuracy sacrifice. For lower-stakes recommendations, accuracy may dominate. The choice must be justified.

Privacy vs Fairness (high tension)
  The tension: Privacy-enhancing techniques add noise to data and outputs. That noise disproportionately degrades prediction quality for underrepresented groups — often the groups most in need of fair outcomes.
  How to navigate it: Assess which groups are most affected by the privacy technique and whether the impact is equitable. Consider whether privacy protections can be applied in ways that distribute the accuracy cost more evenly across groups.

Transparency vs Security (medium tension)
  The tension: Full transparency about model architecture and training data enables adversarial attacks — attackers who know the system can craft inputs to exploit it.
  How to navigate it: Apply layered disclosure: full technical documentation to authorised auditors, operational outputs to affected individuals, high-level methodology to the public. Transparency does not mean publishing everything — it means appropriate disclosure to appropriate audiences.

Safety vs Autonomy (medium tension)
  The tension: Systems designed to maximise safety often restrict the range of outputs and override human autonomy. Systems designed to support human autonomy must tolerate some unsafe outcomes.
  How to navigate it: Define the failure modes and their consequences explicitly. Safety constraints should be calibrated to the actual risk — not applied maximally regardless of context. Human override should be preserved for high-stakes decisions even when it introduces risk.

Fairness vs Accuracy (high tension)
  The tension: Optimising for overall accuracy ignores performance disparities across groups. Adding fairness constraints typically reduces overall accuracy. Both cannot be simultaneously maximised.
  How to navigate it: Choose the fairness metric appropriate to the context and stakes. Document the accuracy cost explicitly and justify the choice. Disaggregate performance metrics to understand who bears the cost of accuracy reductions.

Reliability vs Adaptability (medium tension)
  The tension: Systems that adapt continuously (online learning) may be less reliable — their behaviour can change in unpredictable ways. Systems that are frozen post-training are more predictable but may drift from the real world over time.
  How to navigate it: Define update policies explicitly: when will the model be retrained, under what conditions, and what testing is required before redeployment? Treat model updates as new deployments with their own validation requirements.
Tradeoff intensity by characteristic pair
  • Accuracy / Valid vs Explainable: High
  • Privacy vs Fair / Bias: High
  • Fairness vs Accuracy: High
  • Transparent vs Secure: Medium
  • Safe vs Autonomy: Medium
  • Reliable vs Adaptable: Medium
  • Privacy vs Transparent: Low
The navigation principle

Context determines which characteristics to prioritise

There is no universal ranking of the seven characteristics — priority depends entirely on context. A medical diagnostic AI where errors can kill people must prioritise safety and accuracy above interpretability. A hiring AI where bias perpetuates systemic inequality must prioritise fairness above accuracy. A fraud detection system where adversarial attack is likely must prioritise security above transparency. The skill is knowing which context you are in, which characteristics it demands, and how to justify that prioritisation to regulators, clients, and the public.

The relative priority of the seven characteristics shifts dramatically depending on where an AI system is deployed. The same model used in different sectors may require entirely different governance approaches.

Healthcare

Medical diagnosis & clinical decision support

Systems that influence clinical decisions carry the highest possible stakes — errors can directly harm or kill patients. Regulators (FDA, MHRA) treat clinical AI as medical devices, requiring extensive validation.

Primary: Valid & Reliable, Safe
Critical: Explainable (clinicians must understand recommendations to override safely)
Significant tension: Accuracy vs Explainability — deep learning outperforms interpretable models but cannot be explained to clinicians
Financial services

Credit scoring, fraud detection, trading

Financial AI affects access to credit, insurance, and financial services — with direct impacts on economic inclusion. Also a high-value target for adversarial attack.

Primary: Fair with Bias Managed, Secure & Resilient
Critical: Accountable & Transparent (regulatory requirement in most jurisdictions)
Key requirement: Right to explanation — individuals must be able to understand adverse decisions
Criminal justice

Risk assessment, predictive policing, sentencing

AI in criminal justice directly affects liberty. The COMPAS case demonstrated that high overall accuracy can mask severe racial disparities in error rates. The most ethically contested AI domain.

Primary: Fair with Bias Managed, Accountable & Transparent
Critical: Explainable & Interpretable (defendants must be able to challenge decisions)
Key question: Should AI be used here at all? Some jurisdictions are banning it outright
Employment & HR

CV screening, performance evaluation, promotion

Employment AI is Annex III high-risk under the EU AI Act. Amazon's scrapped hiring tool demonstrated how historical data encodes historical discrimination. High regulatory and reputational exposure.

Primary: Fair with Bias Managed, Valid & Reliable
Critical: Accountable & Transparent (EU AI Act requires human oversight and appeals)
Key requirement: Bias audits before deployment — NYC Local Law 144 mandates this
Critical infrastructure

Energy grids, water systems, transport networks

AI controlling or optimising critical infrastructure must withstand adversarial attack, environmental disruption, and unexpected failure modes — often with no human in the loop for millisecond-level decisions.

Primary: Safe, Secure & Resilient, Valid & Reliable
Critical: Reliability over time — degradation must be detected before failure
Key requirement: Graceful degradation — must fail safely, not catastrophically
Generative AI (enterprise)

LLMs deployed for internal use, customer-facing tools

Generative AI introduces novel trustworthiness challenges — hallucination, prompt injection, copyright infringement, identity disclosure. Different threat model from traditional ML.

Primary: Valid & Reliable (hallucination), Accountable & Transparent (AI disclosure)
Critical: Secure (adversarial prompts, data leakage), Privacy (training data exposure)
Unique challenge: Outputs are probabilistic — traditional accuracy metrics are insufficient
The universal requirement

Regardless of sector — document your prioritisation

Whatever sector you are in, whatever tradeoffs you have made, the governance requirement is the same: document which characteristics you have prioritised, why, what the alternatives were, and how you will monitor the consequences of your choice. Undocumented tradeoff decisions are governance failures waiting to become liability events. The question from a regulator or a judge will be: "How did you decide?" — and "we optimised the model for accuracy" is not an adequate answer.

The precise vocabulary of AI trustworthiness — the terms that distinguish careful, informed analysis from surface familiarity.

Trustworthiness
A social concept, not a technical property
NIST defines trustworthiness as a social concept ranging across a spectrum — only as strong as its weakest characteristic. An AI system is not trustworthy by virtue of technical performance alone; it must demonstrate appropriate balance across all seven characteristics in its specific deployment context. Trustworthiness is perceived and maintained through ongoing engagement with affected stakeholders, not asserted by developers.
Validity
Doing what it's supposed to do
Confirmation that requirements for a specific intended use have been fulfilled. A system is valid if it genuinely does what it was designed to do in its actual deployment context — not just in the conditions under which it was trained and tested. Validity requires external validity (generalisation beyond training conditions) as well as internal accuracy.
Robustness
Performance under pressure and variation
The ability to maintain performance across a variety of conditions — including unexpected inputs, distribution shifts, adversarial examples, and use cases not initially anticipated. Robustness is distinct from accuracy: a system can be highly accurate under expected conditions and fail catastrophically outside them. Robust systems degrade gracefully rather than failing suddenly.
Explainability
How — the mechanism
A representation of the mechanisms underlying an AI system's operation. Answers "how did the system arrive at this output?" Distinct from interpretability (which asks "what does this output mean?"). Explainability enables debugging, auditing, and human oversight of the decision-making process — not just its outcome.
Interpretability
Why — the meaning in context
The meaning of an AI system's output in the context of its designed functional purpose. Answers "what does this output mean for the decision being made?" A technically explainable output may be meaningless without domain expertise to interpret it. Both explainability and interpretability are required for genuine human understanding of AI decisions.
Accuracy-Interpretability Tradeoff
The fundamental tension in ML
The empirically documented tendency for more accurate models to be less interpretable. Deep neural networks and ensemble methods achieve superior predictive performance but operate as black boxes. Decision trees and linear models are fully interpretable but sacrifice accuracy. This tradeoff drives one of the most consequential design decisions in AI development — and must be made explicitly based on the stakes of the application.
Adversarial Examples
Crafted inputs designed to fool AI
Carefully constructed inputs that cause an AI system to produce confident but incorrect outputs — while appearing indistinguishable from normal inputs to a human observer. A classic example: an image of a panda with imperceptible noise added causes an image classifier to confidently identify a gibbon. Adversarial robustness — resistance to such attacks — is a core security requirement for high-stakes AI systems.
Data Poisoning
Corrupting the training process
A security attack in which adversaries inject malicious data into a model's training set to manipulate its behaviour post-deployment. Unlike adversarial examples (which attack deployed models), data poisoning attacks the training pipeline. Particularly concerning for models trained on crowdsourced or web-scraped data, where the training data provenance cannot be fully verified.
Privacy by Design
Privacy embedded from the start
The principle that privacy protections should be built into AI systems during design and architecture — not added as an afterthought at deployment. Proactive rather than reactive: identifying and eliminating privacy risks before they materialise. Mandated by GDPR as a legal requirement for systems processing personal data. Significantly more effective and less costly than retrofitting privacy controls onto deployed systems.
ALTAI
EU self-assessment checklist
Assessment List for Trustworthy AI — the European Commission's self-assessment checklist for organisations building or deploying AI systems. Operationalises the EU's seven requirements for trustworthy AI (human agency, robustness, privacy, transparency, diversity, societal wellbeing, accountability) into a practical evaluation tool. Available free from the EU AI HLEG. A useful complement to the NIST AI RMF for organisations seeking EU alignment.

The conversations clients have when they start thinking seriously about trustworthy AI — and the answers that demonstrate you understand both the depth and the practical implications.

When they ask: "How do we know if our AI is trustworthy?"
Trustworthiness isn't a binary — it's a spectrum across seven characteristics, and you can be strong on some while being weak on others. The honest starting point is a structured assessment against each characteristic: Is it valid and reliable — do you have disaggregated performance metrics, not just overall accuracy? Is it safe — have you defined failure modes and tested them? Is it explainable — can the people who need to understand it actually do so? Is it fair — have you measured disparate impact across demographic groups? Is it private — have you assessed what the system can infer about individuals, not just what data it processes? And crucially — is it transparent and accountable — is there a named person responsible for each system, with documentation that regulators could review? Most organisations find gaps in at least three or four of these when they look honestly.
When they say: "We just need our AI to be accurate — the other stuff is secondary."
Accuracy is the foundation — but it's necessary, not sufficient. Consider what "accurate" means for a credit scoring model that's 95% accurate overall but has a 30% error rate for one demographic group. It's accurate in aggregate and discriminatory in practice. Or a medical AI that performs brilliantly on training data but fails on patients from hospitals with different data recording practices — technically accurate, but not valid in the deployment context. And accuracy alone tells you nothing about what happens when the system is wrong — whether it fails safely, whether there's a human who can catch and correct errors, whether affected individuals have any recourse. Accuracy is where you start, not where you stop.
When they ask: "Our model is a black box — does that mean it can't be trustworthy?"
Not necessarily, but it depends entirely on the stakes of the application. A black-box recommendation engine for a streaming service carries very different implications than a black-box system scoring job applicants. For lower-stakes applications, post-hoc explainability methods — LIME, SHAP — can provide sufficient transparency about individual decisions without needing to open the model itself. For high-stakes applications affecting rights, safety, or significant financial outcomes, the bar is higher: regulators and courts increasingly expect systems whose decisions can be meaningfully explained and challenged. The EU AI Act requires explainability for high-risk systems. The question to ask is: if this system makes a wrong decision, can the affected individual understand why and challenge it? If the answer is no, you may have a rights problem as much as a technical one.
When they ask: "Which characteristic should we focus on first?"
Start with validity and reliability — if the system doesn't work, nothing else is worth discussing. Then assess the characteristics most material to your specific context and risk profile. For healthcare AI, safety and explainability are usually most urgent. For hiring and credit, fairness and transparency. For critical infrastructure, security and resilience. The one characteristic that should always run in parallel regardless of context is accountability and transparency — because without it, you cannot assess or maintain any of the others over time. You need visibility into what the system is doing before you can determine whether it is safe, fair, or reliable in production. The most common mistake is treating trustworthiness as a launch-time checklist rather than an ongoing operational discipline.
When they push back: "This sounds like it will slow everything down."
The evidence says the opposite. Organisations that build trustworthiness in from the start deploy AI faster at scale — because they're not constantly firefighting incidents, reworking deployments that failed regulatory review, or managing the reputational fallout from high-profile failures. McKinsey's research shows organisations with embedded responsible AI governance see up to 40% higher ROI from AI investments, largely due to reduced rework and audit costs. The AI systems that get blocked or reversed — by regulators, by client procurement requirements, by board risk appetite — are almost always the ones where trustworthiness was an afterthought. Building it in is the fast path, not the slow one.

The framing that resonates

Trustworthiness is the licence to scale

Frame it not as a constraint but as the precondition for ambitious AI deployment. Organisations that can demonstrate trustworthiness across all seven characteristics can move faster, deploy more broadly, and access markets where untrustworthy AI is being blocked. Trustworthiness is not the opposite of capability — it is what makes capability sustainable. The most powerful AI systems in enterprise will be the ones people actually trust to make consequential decisions.

Trustworthiness is the concept that ties everything else together. These questions test whether you can apply it — not just define it.

Name all seven trustworthiness characteristics — and identify which is the foundation and which is cross-cutting.
The seven: Valid & Reliable, Safe, Secure & Resilient, Accountable & Transparent, Explainable & Interpretable, Privacy-Enhanced, Fair with Harmful Bias Managed.

Foundation: Valid & Reliable — without this, no other characteristic matters. A system that doesn't work cannot be trustworthy, regardless of how transparent or fair it is.

Cross-cutting: Accountable & Transparent — it touches all the others. You cannot assess or maintain any other characteristic over time without transparency into what the system is doing. Without it, you cannot determine whether the system is fair, safe, or reliable in production.
What is the difference between transparency, explainability, and interpretability? Give the question each one answers.
Transparency answers: "What happened?" — system-level visibility into what the AI did, when, and under what conditions. The audit trail.

Explainability answers: "How?" — the mechanism behind a specific output. How did the system get from this input to this output? LIME and SHAP are explainability tools.

Interpretability answers: "Why does this matter?" — the meaning of an output in context. What does a risk score of 7/10 actually mean for this particular decision, for this particular person?

You need all three for genuine human oversight. Transparency without explainability tells you what happened but not how. Explainability without interpretability tells you the mechanism but not the significance. All three are required for an affected individual to meaningfully understand and challenge a decision.
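The three layers can be made concrete in code. The sketch below is purely illustrative — the model, feature names, weights, and thresholds are all hypothetical — but it shows how one decision record can carry all three: a timestamped audit entry (transparency), per-feature contributions (explainability), and a contextual meaning for the score (interpretability).

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical linear risk model: feature names and weights are illustrative only.
WEIGHTS = {"missed_payments": 2.0, "debt_ratio": 3.0, "account_age_years": -0.5}

@dataclass
class DecisionRecord:
    inputs: dict          # transparency: what the system saw
    score: float
    contributions: dict   # explainability: how each feature moved the score
    meaning: str          # interpretability: what the score means for this decision
    timestamp: str        # transparency: when it happened

def decide(inputs: dict) -> DecisionRecord:
    contributions = {k: WEIGHTS[k] * v for k, v in inputs.items()}
    score = sum(contributions.values())
    # Interpretability layer: translate the raw score into its decision meaning.
    meaning = "refer to human reviewer" if score >= 5 else "auto-approve"
    return DecisionRecord(
        inputs=inputs,
        score=score,
        contributions=contributions,
        meaning=meaning,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

record = decide({"missed_payments": 2, "debt_ratio": 0.6, "account_age_years": 4})
```

A record like this is what lets an affected individual challenge the decision: they can see which inputs were used, which ones drove the score, and what the score meant in their case.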
What is the accuracy-interpretability tradeoff — and when should interpretability win?
The models that achieve the highest accuracy on complex tasks (deep neural networks) are typically the least interpretable. Simpler models (decision trees, linear regression) are fully interpretable but often sacrifice predictive accuracy. This tension is empirically documented — not a temporary technical limitation to be solved, but a recurring property of model complexity.

Interpretability should win when: the cost of an unexplainable wrong decision exceeds the benefit of slightly higher accuracy; when affected individuals have a legal or ethical right to understand decisions made about them; when the domain requires human clinical or professional judgement to be preserved (medical, legal); and when the EU AI Act requires explainability for a high-risk system. A medical diagnosis model that is 2% more accurate but cannot be explained to clinicians may be less trustworthy — and less safe — than one that is 2% less accurate but allows clinicians to meaningfully override it.
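The governance decision described above can be sketched as an explicit gate. This is a sketch under stated assumptions, not a prescription: the 2% tolerated gap and the model labels are hypothetical, and a real policy would weigh more factors than two metrics and a flag.

```python
# Hypothetical governance gate: prefer the interpretable model unless the
# accuracy gain exceeds a tolerated gap AND no explainability obligation applies.

def select_model(accurate_auc: float, interpretable_auc: float,
                 explainability_required: bool,
                 max_tolerated_gap: float = 0.02) -> str:
    gap = accurate_auc - interpretable_auc
    if explainability_required:
        # e.g. a legal or clinical requirement to explain decisions
        return "interpretable"
    return "black-box" if gap > max_tolerated_gap else "interpretable"

# A 2% accuracy edge does not outweigh a binding explainability requirement.
choice = select_model(0.94, 0.92, explainability_required=True)
```

The value of writing the gate down is not the code itself — it is that the threshold, and who set it, becomes a documented governance artefact rather than an ad-hoc engineering call.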
A healthcare provider deploys an AI diagnostic tool. Which trustworthiness characteristics matter most — and what is the key tradeoff they will face?
Medical AI where errors can directly harm or kill patients. Clinicians must be able to understand and override recommendations. Regulators treat it as a medical device.
Primary: Valid & Reliable and Safe — the absolute non-negotiables. A system that misdiagnoses must fail safely, not silently.
Critical: Explainable & Interpretable — clinicians need to understand recommendations well enough to override them when their clinical judgement differs. A black-box diagnostic tool that overrides clinical judgement is dangerous even if technically accurate.
Key tradeoff: The most accurate diagnostic models are often the least explainable. Deep learning models in radiology outperform radiologists on narrow benchmarks but cannot explain their reasoning. The governance decision: is 2% more accuracy worth removing the clinician's ability to meaningfully review the recommendation? In most healthcare contexts, the answer is no — the explainability requirement constrains the accuracy that can be deployed.
A privacy-enhancing technique reduces the AI system's accuracy for one demographic group. Which two trustworthiness characteristics are in tension — and how do you navigate it?
Differential privacy adds noise to model outputs to prevent individual re-identification. That noise disproportionately degrades prediction quality for underrepresented groups.
Privacy-Enhanced vs Fair with Bias Managed — one of the most common and least understood trustworthiness tradeoffs.

Navigation: first, assess which group is most affected by the accuracy reduction and whether the reduction is proportionate — does the privacy benefit justify the fairness cost for this specific group? Second, explore whether the privacy technique can be applied in ways that distribute the accuracy cost more evenly (e.g., applying noise differently to majority and minority group data). Third, document the tradeoff explicitly: who made the decision, what the alternatives were, and what the residual fairness impact is. This documentation is both an ethical requirement and a regulatory one — the EU AI Act requires high-risk AI systems to demonstrate that tradeoffs between characteristics have been considered and justified.
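The mechanism behind this tradeoff is easy to see with a counting query under the Laplace mechanism, the standard differential-privacy construction for counts. The sketch below uses hypothetical group sizes; the point is that the noise scale depends only on epsilon, so the same absolute noise is a much larger relative distortion for a small group than a large one.

```python
import math
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # Inverse-CDF sampling of the Laplace distribution.
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def dp_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    # Laplace mechanism for a counting query (sensitivity = 1):
    # noise scale is 1/epsilon regardless of the group's size.
    return true_count + laplace_noise(1.0 / epsilon, rng)

epsilon = 0.5
scale = 1.0 / epsilon
majority, minority = 100_000, 200   # hypothetical group sizes

# Expected absolute error is the same for both groups, so the expected
# relative error -- scale / group size -- is 500x worse for the minority group.
print(f"expected relative error, majority: {scale / majority:.6f}")
print(f"expected relative error, minority: {scale / minority:.6f}")
```

This is why the navigation steps above start by asking which group absorbs the accuracy cost: the fairness impact is not an implementation bug but a direct consequence of how the noise is calibrated.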

The foundational documents, standards, and research behind this guide on AI trustworthiness.

Primary sources
Research & tools