MLOps monitoring isn't necessarily model governance

A common source of confusion in data science is how monitoring and governance relate to one another. With the emergence of MLOps as a separate field, and its adoption of DevOps principles for deploying machine learning models, some key principles of model governance have been lost. There is a tendency to believe a model is being governed simply because model drift and feature drift detection are enabled.

Although MLOps monitoring is a crucial part of governance, monitoring is not governance by itself, nor is it valuable in isolation. MLOps also often lacks the nuance and expertise required to deploy modeling systems responsibly and deliver tangible value. Let’s explore those nuances and identify exactly what is missing from MLOps monitoring that is essential for governance.
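
To make the distinction concrete, the sketch below shows roughly what “feature drift detection” often amounts to in practice: a population stability index (PSI) computed between a reference sample and recent production data, compared against a rule-of-thumb threshold. The function, bin count, and 0.2 alert threshold are illustrative assumptions rather than any specific MLOps tool’s API; the point is that a number like this, on its own, says nothing about whether a model is governed.

```python
import numpy as np

def population_stability_index(reference, live, bins=10):
    """Population stability index (PSI) between a reference sample and a live sample.

    Bin edges come from quantiles of the reference distribution; both samples are
    clipped into that range so every value lands in a bin. The name, bin count,
    and threshold used here are illustrative assumptions, not a specific tool's API.
    """
    edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
    ref_counts = np.histogram(np.clip(reference, edges[0], edges[-1]), bins=edges)[0]
    live_counts = np.histogram(np.clip(live, edges[0], edges[-1]), bins=edges)[0]
    ref_frac = ref_counts / len(reference)
    live_frac = live_counts / len(live)
    eps = 1e-6  # avoids log(0) and division by zero in sparsely populated bins
    return float(np.sum((live_frac - ref_frac) * np.log((live_frac + eps) / (ref_frac + eps))))

# Toy data standing in for a training-time feature and its recent production values.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 10_000)
live = rng.normal(0.5, 1.0, 2_000)  # mean shifted relative to the reference sample

psi = population_stability_index(reference, live)
print(f"PSI = {psi:.3f} -> {'drift alert' if psi > 0.2 else 'no alert'}")  # 0.2 is a common rule of thumb
```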

Machine learning and monitoring

In the democratization of data science, the expertise of statisticians, economists, aerospace engineers, actuaries, and other professions has at times been dismissed as outdated in the modern data-driven world. Mainstream outlets and industries have distanced themselves from this more traditional expertise, enticed by the promise of AI-enhanced productivity and of empowering data scientists to experiment and quickly deploy machine learning models while bypassing traditional validation methods and theoretical rigor.

With the increasing number of data scientists building AI solutions driven by machine learning, a dangerous trend has emerged in the pursuit of ethical and accountable AI: a tendency to embrace the “move fast and break things” mentality when developing mission-critical systems that affect end users’ lives, without adhering to modeling best practices or subjecting models to objective review and validation.

It is worth noting that machine learning is about learning from data without a prior understanding of the relationship between inputs and outputs. In data science, practitioners are tasked with extracting insights from data, which can encourage chasing correlations that do not imply causation.

In the same “move fast” spirit, monitoring has become an easy “fix” for claiming accountability for model outcomes without addressing the root issues. This type of monitoring creates a false sense of security: it serves as a checkbox for governance without any critical evaluation of the underlying methodologies and modeling practices that drive those outcomes.

As discussed in previous posts, effective AI governance requires a holistic, lifecycle-driven approach that applies interdisciplinary systems engineering and proven practices from model risk management [1][2][3]. Proper model risk management, and by extension AI governance, necessitates:

  • A deep understanding of the specific business problem to be solved
  • Understanding of the desired outcomes
  • Comprehensive understanding and documentation of the data
  • In-depth knowledge of modeling paradigms and the most suitable algorithms for the given use case
  • Independent, objective, and effective validation to establish the modeling system’s safe and performant operating ranges, which can then be monitored against (a sketch of such a range specification follows this list)
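
To illustrate that last point, validated operating ranges can be captured as a small, reviewable artifact rather than left implicit in a monitoring dashboard. The sketch below is hypothetical: the field names, metrics, values, and report references are assumptions used only to make the idea concrete.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OperatingRange:
    """One metric's acceptable range, as established during independent validation."""
    metric: str
    low: float
    high: float
    rationale: str  # why this range was chosen, with a pointer to the validation evidence

# Hypothetical ranges a validation team might sign off on for a credit-risk model.
VALIDATED_RANGES = [
    OperatingRange("auc", 0.72, 0.80, "Holdout and out-of-time testing (validation report, section 3.2)"),
    OperatingRange("approval_rate", 0.35, 0.55, "Stress testing across economic scenarios (section 4.1)"),
    OperatingRange("psi_income", 0.00, 0.20, "Feature stability analysis on 24 months of data (section 4.3)"),
]
```

Once an artifact like this exists, a monitoring alert can be traced back to a decision that someone reviewed and can defend, which is what makes the monitoring part of governance rather than a checkbox.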

Too often, we seek instant gratification and skip the steps necessary to achieve our goals. Independent, objective, and effective model validation is common practice in other disciplines, but it is practiced infrequently within data science. It is also worth noting that if you are currently using AI models that impact consumers, your AI is already regulated, as the FTC has recently made clear [4].

There’s monitoring, and then there’s model validation

Model validation is the set of objective, independent processes and activities used to verify that models perform as expected and remain aligned with their design objectives and business uses [2][3]. As with other aspects of effective challenge, model validation should be conducted by staff with appropriate incentives, competence, and influence.

A sound model validation process includes:

  • Evaluating the quality and extent of model documentation
  • Reviewing whether the model achieves its intended purpose
  • Comparing alternative model theories and approaches: Is the chosen approach appropriate? Could a simpler model be used?
  • Reviewing key assumptions and variables
  • Reperforming the model, conducting sensitivity analysis, and stress testing: these are crucial for determining what should be monitored (see the sketch after this list)
  • Ongoing monitoring
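
As a minimal illustration of the sensitivity-analysis step, the sketch below perturbs each input one at a time and records the worst-case swing in the model’s output. The toy model, baseline input, and perturbation sizes are assumptions chosen only for demonstration; in practice, the features with the largest swings, and the input regions where the model degrades under stress, are strong candidates for what should be monitored.

```python
import numpy as np

def sensitivity_profile(predict, baseline, deltas=(-0.2, -0.1, 0.1, 0.2)):
    """One-at-a-time sensitivity analysis.

    Perturbs each feature of `baseline` by the relative amounts in `deltas`
    and records the worst-case change in the model's output. `predict` maps a
    1-D feature vector to a score; both are assumptions for this sketch.
    """
    base_score = predict(baseline)
    profile = {}
    for i in range(len(baseline)):
        swings = []
        for delta in deltas:
            perturbed = baseline.copy()
            perturbed[i] = perturbed[i] * (1 + delta)  # relative perturbation of feature i
            swings.append(predict(perturbed) - base_score)
        profile[i] = max(abs(s) for s in swings)       # worst-case swing for this feature
    return profile

# Toy logistic scorer and baseline input, used only for illustration.
weights = np.array([0.5, -1.2, 0.3])

def predict(x):
    return float(1.0 / (1.0 + np.exp(-(weights @ x))))

baseline = np.array([1.0, 0.8, 2.0])
for feature, swing in sensitivity_profile(predict, baseline).items():
    print(f"feature {feature}: worst-case output swing {swing:.3f}")
```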

For further discussion, listen to the full episode on model validation on The AI Fundamentalists podcast.

Monitoring: MLOps vs. AI governance

Characteristics of MLOps monitoring:

  • Putting monitoring in place to appease risk management
  • Attempting to configure drift monitoring across versions of a model
  • Uncertainty in interpreting or configuring model drift
  • Uncertainty about the business value
  • Uncertainty about the difference between monitoring and logging

Characteristics of AI governance monitoring:

  • Documenting the model to a level that a competent third party can understand the decision-making process and reproduce the model
  • Thoroughly reviewing and independently and objectively validating the model at a level commensurate with its risk
  • Stress testing the model to determine failure points
  • Establishing acceptable model performance and distribution ranges
  • Configuring ongoing monitoring to ensure the model operates within those acceptable ranges

Interdisciplinary best practices yield high-performing models

As discussed in our podcast “The AI Fundamentalists,” there are very few shortcuts to building mission-critical, robust, and resilient systems. Proper deployment of modeling systems requires intensive model validation to ensure safety, performance, and resilience, and to establish safe operating boundaries. Once these boundaries are established, ongoing monitoring should be implemented as guardrails to ensure that the modeling system performs as expected. With these guardrails in place, drift detection becomes meaningful and contextual. Implementing monitoring without first determining what should be monitored diminishes its value, and any drift that is detected becomes difficult to interpret.
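
Here is a minimal sketch of what monitoring as guardrails can look like once validation has established acceptable ranges: each live metric is compared against a range that has been independently reviewed, so an alert carries context instead of being an uninterpretable drift score. The metric names and range values are hypothetical.

```python
# Acceptable ranges established during validation (hypothetical values).
validated_ranges = {
    "auc":           (0.72, 0.80),
    "approval_rate": (0.35, 0.55),
    "psi_income":    (0.00, 0.20),
}

# Metrics observed in production over the current monitoring window.
live_metrics = {"auc": 0.74, "approval_rate": 0.61, "psi_income": 0.12}

def guardrail_report(ranges, metrics):
    """Compare live metrics against validated ranges and report breaches with context."""
    lines = []
    for name, value in metrics.items():
        low, high = ranges[name]
        status = "within validated range" if low <= value <= high else "BREACH: escalate for review"
        lines.append(f"{name}: {value:.2f} (validated range {low:.2f}-{high:.2f}) -> {status}")
    return lines

for line in guardrail_report(validated_ranges, live_metrics):
    print(line)
```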

Accountable, fair, and performant AI systems are possible if we learn from history and incorporate interdisciplinary best practices. We encourage data science leaders to take pride in their teams' commitment to go above and beyond, ensuring that the systems they produce are safe, performant, resilient, and aligned with business objectives.

For more information about Monitaur's holistic AI governance and our Flight Simulator model validation suite, please contact us.

For more in-depth comparisons between MLOps and model governance, see “MLOps system design has a governance problem.”

References

[1]: Shea, Garrett. “NASA Systems Engineering Handbook Revision 2.” NASA, 20 June 2017, http://www.nasa.gov/connect/ebooks/nasa-systems-engineering-handbook.

[2]: Board of Governors of the Federal Reserve System. “Supervisory Letter SR 11-7: Guidance on Model Risk Management.” April 4, 2011. https://www.federalreserve.gov/supervisionreg/srletters/sr1107.htm.

[3]: Office of the Comptroller of the Currency. “Model Risk Management: New Comptroller’s Handbook Booklet.” August 18, 2021. https://occ.gov/news-issuances/bulletins/2021/bulletin-2021-39.html.

[4]: Federal Trade Commission business guidance: “Using Artificial Intelligence and Algorithms” (April 2020), https://www.ftc.gov/business-guidance/blog/2020/04/using-artificial-intelligence-and-algorithms, and “Keep Your AI Claims in Check” (February 2023), https://www.ftc.gov/business-guidance/blog/2023/02/keep-your-ai-claims-check.