AssuranceOps: Building Confidence in Evolving AI

Most Machine Learning (ML) teams today can reliably release models using Machine Learning Operations or MLOps ¹: reproducible training, automated deployments, and continuous monitoring for drift and performance regression. These capabilities are essential, but in high-consequence domains, they are not enough. A system that works is not automatically a trustworthy one.

In these settings, stakeholders, whether responsible for safety, performance or regulation, need more than operational reliability. They need justified, structured confidence that the system is appropriate for its intended use, and that it remains appropriate as it evolves. Demonstrating this requires more than metrics or monitoring dashboards. It requires a trust asset: a transparent, evidence-based argument that explains why the system can be relied upon.

This is what assurance provides. And this is where traditional MLOps leaves a gap. MLOps is about how systems are built, validated, deployed, and monitored. Assurance shows whether the system is suitable to use and proves it with a high degree of confidence. Assurance cases are the established method for building justified confidence, but when produced late and maintained manually in the operational phase, they are often updated sporadically and no longer adequately represent the system.

As a result, a gap emerges between the system and its assurance: the assurance case no longer reflects the knowledge needed to justify confidence in how the system behaves. AssuranceOps closes that gap by integrating assurance activities into the automated, continuous operating model of MLOps. It turns assurance into an iterative and interactive process that evolves alongside the system itself. Rather than a one-off document or static review, AssuranceOps produces and maintains the trust asset throughout the AI lifecycle.

Achieving an Adequate Level of Trustworthiness is Harder for AI-enabled Systems than for Traditional Software

AI is reshaping industries, from autonomous shipping to healthcare diagnostics, unlocking major efficiency and innovation. Risk evolves for all systems, including traditional software, but AI introduces distinctive challenges, such as data distribution shifts, frequent retraining cycles, and opaque decision-making, that amplify uncertainty and demand iterative assurance.

Pre-deployment testing alone rarely satisfies stakeholders accountable for safety, regulatory compliance (i.e. ensuring the system meets applicable regulatory requirements), or mission performance. They require evidence that the system is acceptable at deployment and remains so as it evolves.

Why AI Engineers Should Adopt Assurance

If you are developing AI for safety-critical applications, you are likely already doing many of the right things: automated evaluation, canary deployment, drift monitoring, rollback capability, and incident response. MLOps excels at operational reliability but typically does not answer questions stakeholders care about when risk is high:

What are the explicit safety, security, and compliance claims for this system?
What are the risks, and how can they be mitigated?
What evidence supports each claim, and is it still valid after the latest data refresh or model update?
Which changes require re-verification, and what evidence can be reused?
How certain are we that the system is safe?
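One way to make the questions about evidence validity and reuse checkable is to record, for each piece of evidence, the versions of the artifacts it was produced against. The sketch below is a minimal illustration under that assumption. The `Evidence` class, its fields, and the version labels are hypothetical and are not taken from any particular tool or from the Recommended Practice.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Evidence tied to the artifact versions it was produced against (hypothetical schema)."""
    claim: str
    description: str
    produced_with: dict  # artifact name -> version, e.g. {"model": "v1"}

    def is_current(self, current_versions: dict) -> bool:
        # Evidence is treated as reusable only if every artifact it
        # depends on still has the version it was produced against.
        return all(
            current_versions.get(name) == version
            for name, version in self.produced_with.items()
        )


def split_evidence(evidence, current_versions):
    """Partition evidence into reusable items and items needing re-verification."""
    reusable = [e for e in evidence if e.is_current(current_versions)]
    stale = [e for e in evidence if not e.is_current(current_versions)]
    return reusable, stale


evidence = [
    Evidence("Obstacles are detected in fog", "fog test-set recall",
             {"model": "v1", "dataset": "v1"}),
    Evidence("Training data is representative", "dataset quality audit",
             {"dataset": "v1"}),
]

# The model was retrained (v1 -> v2); the dataset is unchanged.
reusable, stale = split_evidence(evidence, {"model": "v2", "dataset": "v1"})
print("reusable:", [e.description for e in reusable])
print("re-verify:", [e.description for e in stale])
```

In this toy case the dataset audit can be reused after retraining, while the recall result has to be regenerated against the new model. A real setup would also need to track dependencies such as operating conditions and test environments.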

This is why building, deploying, and running a model reliably is not the same as demonstrating that the AI-enabled system is safe, secure, compliant, and fit for purpose in its operational context.

Consider a computer vision model used on autonomous ships to support situational awareness and detect potential obstacles. A strong MLOps setup can continuously train, validate, deploy, and monitor the model. Yet system-level hazards, which stem from interactions across the ship, sensors, environment, and operators, can still arise because the real world changes.

These hazards may emerge from a range of operating conditions, such as:

Environmental variation: fog, glare, night operations, or other changes in visibility.
Sensor degradation: gradual wear that reduces image quality.
New operational contexts: new ports introducing unfamiliar objects and edge cases.
Human–AI interaction shifts: operators relying on automation in unexpected or unsafe ways.

Assurance identifies scenarios where these hazards can occur (e.g. reduced visibility due to fog) and makes explicit what the system must be able to handle in each scenario. In the fog example, this may mean explicitly articulating the claim the system must satisfy: "The ship can navigate safely when visibility drops". This claim illustrates the type of system-level expectations that assurance needs to justify. Assurance then helps break this down into the conditions the system must meet to make that possible. For example: "The ship can navigate safely in fog because the vision model handles reduced visibility and the autonomous navigation system remains robust under those conditions."

Finally, assurance defines:

What evidence is needed to stay confident that the system continues to work as expected.
What information teams need to monitor and manage risks as the environment and system evolve.
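The decomposition in the fog example can be sketched as a small claim tree. This is an illustrative simplification, not the structure prescribed by the Recommended Practice. The `Claim` class and the rule that a parent claim holds only when all of its sub-claims are supported are assumptions made for the example, and the evidence label is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    evidence: list = field(default_factory=list)   # labels of supporting evidence
    subclaims: list = field(default_factory=list)  # child Claim objects

    def is_supported(self) -> bool:
        # A parent claim needs every sub-claim to be supported;
        # a leaf claim needs at least one piece of evidence.
        if self.subclaims:
            return all(c.is_supported() for c in self.subclaims)
        return bool(self.evidence)

    def gaps(self) -> list:
        """Return the leaf claims that currently have no evidence."""
        if self.subclaims:
            return [g for c in self.subclaims for g in c.gaps()]
        return [] if self.evidence else [self.text]


top = Claim(
    "The ship can navigate safely in fog",
    subclaims=[
        Claim("The vision model handles reduced visibility",
              evidence=["fog test results"]),
        Claim("The autonomous navigation system remains robust in fog"),
    ],
)

print("supported:", top.is_supported())
print("evidence gaps:", top.gaps())
```

Here the top-level claim is not yet supported, and the gap list points to the sub-claim that still needs evidence. A boolean roll-up is deliberately crude: in practice the argument for why the sub-claims are sufficient, and the strength of each piece of evidence, also have to be assessed.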

Assurance of AI-Enabled Systems

A helpful way to understand assurance is to think of it as presenting a case in a court of law. But instead of proving someone’s innocence, the goal is to prove that an AI‑enabled system is appropriate for its intended use.

Just as a lawyer builds a case, an assurance case builds a structured argument:

It begins with a claim about what the system must be able to do in context (for example: “The ship can navigate safely in fog”).
It provides evidence that the claim holds (test results, simulation scenarios, robustness checks, monitoring logs, risk analysis).
It connects those pieces of evidence through a clear, transparent argument that explains why they justify confidence.

Assurance is not just about showing that the model works; it is about demonstrating, through reasoned arguments and traceable evidence, why the system is trustworthy in real-world operation, and under what conditions that trustworthiness remains valid.

This is precisely what DNV Recommended Practice DNV-RP-0671 ² formalizes. The Recommended Practice sets out a structured approach for assuring AI-enabled systems following these core principles:

Stakeholder focus: Identify and prioritize what matters to stakeholders, such as safety, fairness, and privacy, and translate these into claims.
Evidence-based argumentation: Build structured assurance cases that link claims to evidence, ensuring transparency and traceability.
Systems approach: Model system behaviour as an emergent property arising from the interactions between all elements of a complex system, including the AI component, and its environment.
Risk-based thinking: Focus assurance efforts where uncertainty and potential impact are greatest.
Modularity: Break down the assurance effort into modules for agility, efficiency, and scalability.
Lifecycle perspective: Iterate assurance throughout the system’s lifecycle, both pre-deployment and post-deployment.
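As a small illustration of the risk-based principle, claims can be ranked so that assurance effort goes first to those with the greatest combined uncertainty and potential impact. The scoring rule below (uncertainty multiplied by impact) is a common simple heuristic chosen for this sketch, not a formula from the Recommended Practice, and the claims and numbers are made up.

```python
# Hypothetical claims with an uncertainty estimate (0 to 1)
# and an impact rating (1 = minor, 5 = severe).
claims = [
    {"claim": "Obstacles are detected in fog", "uncertainty": 0.7, "impact": 5},
    {"claim": "Obstacles are detected in clear daylight", "uncertainty": 0.1, "impact": 5},
    {"claim": "Operator alerts are logged", "uncertainty": 0.4, "impact": 2},
]


def priority(item):
    # Assumed heuristic: higher uncertainty and higher impact both raise priority.
    return item["uncertainty"] * item["impact"]


ranked = sorted(claims, key=priority, reverse=True)

for item in ranked:
    print(f'{priority(item):.1f}  {item["claim"]}')
```

With these example numbers, the fog claim ranks first even though the daylight claim has the same impact, because far less is known about fog performance.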

These principles come together in the assurance process described in the Recommended Practice and depicted in Figure 1. The process starts by identifying stakeholder interests and translating them into system requirements and system-level claims. It then defines an assurance strategy, including an argumentation approach and the type of evidence needed to justify the resulting claims. The assurance process includes the following steps, carried out in accordance with the assurance principles:

Define the assurance basis: Capture system description, stakeholder interests, and initial claims.
Plan assurance activities: Select strategies to substantiate claims, considering risk and knowledge gaps.
Substantiate claims: Gather evidence through testing, simulation, and monitoring.
Evaluate risks: Assess whether confidence in claims is justified and adapt the system if needed.

Figure 1: The general assurance process as depicted in the Recommended Practice.

This process is iterative by design. For AI-enabled systems, where data shifts, retraining, and new operating conditions are expected, assurance must be revisited continuously.

AssuranceOps builds on the principles of DevOps and MLOps but adds the structure and discipline needed to maintain justified confidence throughout the full system lifecycle. While MLOps manages the recurring model-centric pipeline, the lifecycle perspective in the Recommended Practice spans the entire AI-enabled system, including context, risks, and interactions beyond the model.

AssuranceOps connects the two by embedding lifecycle assurance practices directly into MLOps workflows. Instead of treating assurance as a one-off or static document, AssuranceOps keeps system expectations, risk analyses, and supporting evidence continuously updated with every change, as illustrated in Figure 2.

Figure 2: AssuranceOps links the MLOps pipeline with assurance by establishing an event-based, bidirectional flow of evidence and system expectations.

AssuranceOps: Bridging Assurance and MLOps

Traditional assurance often happens late, manually, and separately from development and operations. AssuranceOps closes this gap by connecting assurance activities to the MLOps loop. Alongside models and monitoring dashboards, it keeps assurance artifacts up to date with every release through activities that include:

mapping stakeholder expectations (safety, security, compliance) to system behaviour,
analysing risks and linking them to controls,
generating evidence-based justifications tied to specific expectations,
monitoring evidence over time,
assessing impact of change on trustworthiness.

This directly supports lifecycle-oriented expectations that are increasingly explicit for high-risk systems, such as continuous risk management and post-market monitoring ³.

AssuranceOps introduces an iterative, information- and event-driven loop of assurance activities that adhere to the assurance principles and run in parallel with development and operations. This means assurance is revisited whenever relevant events occur, such as system changes or detected problems, and whenever new information (evidence) becomes available that affects confidence in claims, ensuring that both the system and its assurance evolve together. This is not about adding paperwork, but about maintaining a clear, evidence-based assurance argument as part of normal delivery.
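The event-driven idea can be sketched as a small registry that maps pipeline events to the claims they may affect. This is a minimal sketch only. The `AssuranceLog` class, the event names, and the claims are hypothetical, and a real integration would receive these events from the MLOps tooling rather than from direct function calls.

```python
from collections import defaultdict


class AssuranceLog:
    """Tracks which claims must be revisited when pipeline events occur (hypothetical)."""

    def __init__(self):
        self._watchers = defaultdict(list)  # event type -> claims to revisit
        self.needs_review = []              # (claim, reason) pairs

    def watch(self, event_type, claim):
        """Register a claim whose confidence may be affected by an event type."""
        self._watchers[event_type].append(claim)

    def notify(self, event_type, detail):
        """Flag every claim registered for this event type for review."""
        for claim in self._watchers.get(event_type, []):
            self.needs_review.append((claim, f"{event_type}: {detail}"))


log = AssuranceLog()
log.watch("drift_detected", "Obstacles are detected in fog")
log.watch("model_retrained", "Obstacles are detected in fog")
log.watch("new_port", "Unfamiliar objects are handled safely")

# An event emitted by monitoring; only the claims watching it are flagged.
log.notify("drift_detected", "image brightness distribution shifted")

for claim, reason in log.needs_review:
    print(f"review '{claim}' ({reason})")
```

The point of the design is that claims are revisited when something relevant happens, not on a fixed review calendar, and each flagged claim carries the reason it was reopened.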

Here is a comparison of MLOps and assurance showing some integration points and how they complement each other:

Aspect	MLOps	Assurance
Primary goal	Efficient deployment and monitoring of ML models	Establishing and maintaining an adequate level of confidence in the trustworthiness of AI-enabled systems
Focus	Model lifecycle: design, develop, train, test, deploy, monitor	Assurance lifecycle: define, plan, substantiate, evaluate, adapt
Key activities	Data ingestion and preprocessing; Model training and validation; Deployment automation; Performance monitoring	Stakeholder analysis; Risk assessment; Assurance case development; Evidence collection; Risk evaluation
Outputs	Operational ML models; Monitoring metrics	Structured assurance cases; Justified confidence in claims
Integration points	Provides testing and monitoring evidence for assurance; Detects drift and triggers change and retraining	Uses evidence from MLOps; Informs design decisions by providing assurance evidence and insights aligned with risk-based requirements; Adds assurance modules to release pipeline
Tools & frameworks	CI/CD pipelines, model registries, monitoring dashboards	DNV-RP-0671, assurance case template, risk modelling tools
Value proposition	Faster deployment, scalability, reproducibility	Grounds for justified confidence in system trustworthiness; support for lifecycle risk management and regulatory compliance

Benefits of AssuranceOps across the AI Engineering Lifecycle

AssuranceOps is fundamentally about creating confidence by reducing uncertainty where it matters most. It offers benefits across the AI engineering and operations lifecycle:

System trustworthiness: Demonstrate to customers, internal teams, and regulators that an AI-enabled system is trustworthy and responsibly managed.
Agility: Respond quickly to changes in technology, environment, or stakeholders' expectations.
Efficiency: Avoid costly re-certification by localizing assurance to affected modules.
Future-proofing: Align with evolving standards and ethical principles for responsible AI.
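The efficiency benefit depends on knowing which assurance modules a change actually touches. The sketch below assumes that each assurance module declares the system components or other modules it depends on, and it propagates a change through those declarations. The module and component names are invented for illustration.

```python
# Hypothetical dependency map: assurance module -> components or modules it relies on.
depends_on = {
    "vision_module": {"vision_model", "camera"},
    "navigation_module": {"path_planner"},
    "fog_navigation_case": {"vision_module", "navigation_module"},
    "operator_handover_module": {"operator_ui"},
}


def affected_modules(changed, depends_on):
    """Return all assurance modules that directly or indirectly depend on a change."""
    affected = set()
    frontier = set(changed)
    while frontier:
        # Modules that depend on anything in the current frontier.
        hit = {
            module
            for module, deps in depends_on.items()
            if deps & frontier and module not in affected
        }
        affected |= hit
        frontier = hit  # propagate to modules that build on the newly affected ones
    return affected


to_reassess = affected_modules({"vision_model"}, depends_on)
reusable = set(depends_on) - to_reassess

print("re-assess:", sorted(to_reassess))
print("reuse as-is:", sorted(reusable))
```

In this example, retraining the vision model reopens the vision module and the fog navigation case that builds on it, while the navigation and operator handover modules can be reused unchanged.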

Assurance, together with risk assessment and risk evaluation, helps identify system-integration needs, dataset quality requirements, and dependencies for validating system-level performance (e.g. required test conditions, data, or integration checks), enabling AI teams to make better design choices sooner and reduce rework during development.

Conclusion

AI promises transformative benefits, but only if it can be shown to be trustworthy. AssuranceOps offers a practical, scalable way to embed assurance into the AI lifecycle, providing grounds for justified confidence that systems remain safe, secure, and compliant as they adapt to new challenges. Given the importance of system trustworthiness, enduring and interactive assurance is essential.

References

¹ Google. MLOps: Continuous delivery and automation pipelines in machine learning. https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
² DNV-RP-0671. Assurance of AI-enabled systems. https://www.dnv.com/digital-trust/recommended-practices/assurance-of-ai-enabled-systems-dnv-rp-0671/
³ EU AI Act - Article 9: Risk management system. https://ai-act-service-desk.ec.europa.eu/en/ai-act/article-9