A common area of confusion in data science is how monitoring and governance are related to one another. With the emergence of MLOps as a separate field, and its adoption of DevOps principles for deploying machine learning models, some key principles of model governance have been lost. There is a tendency to believe a model is being governed if model drift and feature drift detection are enabled.
Although MLOps monitoring is a crucial part of governance, monitoring is not governance by itself, nor is it valuable in isolation. MLOps also often lacks the nuance and expertise required to deploy modeling systems responsibly and to deliver tangible value. Let’s explore those nuances and identify exactly what MLOps monitoring is missing that is essential for governance.
In the democratization of data science, the expertise of statisticians, economists, aerospace engineers, actuaries, and other professionals has at times been dismissed as outdated in the modern data-driven world. Mainstream outlets and industries have distanced themselves from this more traditional expertise, enticed by the promise of AI-enhanced productivity and of empowering data scientists to experiment and quickly deploy machine learning models, bypassing traditional validation methods and theoretical rigor.
With the increasing number of data scientists creating AI solutions driven by machine learning, a dangerous trend has emerged that undermines the pursuit of ethical and accountable AI: a tendency to embrace the "move fast and break things" mentality when developing mission-critical systems that impact end users' lives, without adhering to modeling best practices or subjecting the work to objective review and validation.
It is worth noting that machine learning is about learning from data without a prior understanding of the relationship between inputs and outputs. In data science, practitioners are tasked with extracting insights from data, which can encourage chasing correlations that do not imply causation.
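To see how easily a strong-looking but meaningless correlation can appear, consider the short Python sketch below. It is purely illustrative and uses synthetic data (two independent random walks, an assumption chosen for the example), reporting how often the two series look strongly correlated despite having no causal relationship whatsoever.

```python
import numpy as np

# Spurious-correlation demo: two independent random walks share no causal link,
# yet their sample correlation is frequently large in magnitude.
rng = np.random.default_rng(0)
n_steps, n_trials = 500, 1000

correlations = []
for _ in range(n_trials):
    x = np.cumsum(rng.normal(size=n_steps))  # random walk A
    y = np.cumsum(rng.normal(size=n_steps))  # random walk B, independent of A
    correlations.append(np.corrcoef(x, y)[0, 1])

share_strong = np.mean(np.abs(correlations) > 0.5)
print(f"Share of trials with |correlation| > 0.5: {share_strong:.0%}")
```

A model built by chasing such patterns can look insightful in-sample and fail as soon as the accidental relationship breaks down.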
In keeping with the "move fast" mentality, monitoring has become an easy "fix" for claiming accountability for model outcomes without addressing the root issues. However, this type of monitoring creates a false sense of security by serving as a checkbox for governance without critically evaluating the underlying methodologies and modeling practices that contribute to outcomes.
As discussed in previous posts, effective AI governance requires a holistic, lifecycle-driven approach that applies interdisciplinary systems engineering and proven practices from model risk management [1][2][3]. Proper model risk management, and by extension AI governance, necessitates independent, objective, and effective model validation.
Too often, we seek instant gratification and skip the necessary steps to achieve our goals. Model validation is common practice in other disciplines but is practiced infrequently within data science. It is also worth noting that if you are currently using AI models that impact consumers, your AI is already regulated, as the FTC recently made clear [4].
Model validation involves objective and independent processes and activities that verify whether models are performing as expected and in line with their design objectives and business uses [2][3]. As with other aspects of effective challenge, model validation should be conducted by staff with appropriate incentives, competence, and influence. Key validation activities include:
Evaluating the quality and extent of model documentation.
Reviewing whether the model achieves its intended purpose.
Comparing alternative model theories and approaches.
Reviewing key assumptions and variables.
Reperforming the model, conducting sensitivity analysis, and stress testing (a minimal sensitivity-analysis sketch follows this list).
Ongoing monitoring.
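To make the reperformance and sensitivity-analysis item concrete, here is a minimal one-at-a-time sensitivity sketch in Python. The `score_model` function, the baseline feature values, and the 5% shock size are hypothetical placeholders; an actual validation would exercise the deployed scoring function itself across far richer perturbations and stress scenarios.

```python
import numpy as np

def one_at_a_time_sensitivity(score_fn, baseline, shock=0.05):
    """Shock each input feature by a relative amount and record how much the
    model output moves. Large or unexpected shifts flag fragile inputs."""
    baseline = np.asarray(baseline, dtype=float)
    base_score = score_fn(baseline)
    shifts = {}
    for i in range(baseline.size):
        shocked = baseline.copy()
        shocked[i] *= (1.0 + shock)  # perturb one feature at a time
        shifts[i] = score_fn(shocked) - base_score
    return base_score, shifts

# Hypothetical stand-in for a deployed model's scoring function.
def score_model(features):
    weights = np.array([0.4, -1.2, 0.3])
    return float(features @ weights)

base_score, shifts = one_at_a_time_sensitivity(score_model, baseline=[10.0, 2.0, 5.0])
for i, delta in shifts.items():
    print(f"Feature {i}: score shift of {delta:+.3f} for a 5% input shock")
```

Stress testing extends the same idea to extreme but plausible scenarios, and reperformance independently rebuilds the model's outputs to confirm that the implementation matches the documented design.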
For further discussion about model validation, listen to the episode about model validation in its entirety on The AI Fundamentalists podcast.
As discussed in our podcast "The AI Fundamentalists," there are very few shortcuts to building mission-critical, robust, and resilient systems. Proper deployment of modeling systems requires intensive model validation to ensure safety, performance, and resilience, and to establish safe operating boundaries. Once these boundaries are established, ongoing monitoring should be implemented as guardrails to ensure that the modeling system performs as expected. By establishing these guardrails, drift detection becomes meaningful and contextual. Implementing monitoring without determining what should be monitored diminishes its value, and when drift is detected, it becomes challenging to interpret.
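As a sketch of what "guardrails first, monitoring second" can look like in practice, the Python snippet below computes a Population Stability Index (PSI) for a model score against a reference sample captured during validation and raises an alert only when an agreed threshold is crossed. The synthetic data, the bin count, and the 0.2 threshold are illustrative assumptions, not prescriptions.

```python
import numpy as np

def population_stability_index(reference, current, n_bins=10):
    """Compare the current distribution of a model score (or feature) against
    the reference distribution captured during validation."""
    cuts = np.quantile(reference, np.linspace(0, 1, n_bins + 1))[1:-1]  # interior cut points
    ref_frac = np.bincount(np.searchsorted(cuts, reference), minlength=n_bins) / len(reference)
    cur_frac = np.bincount(np.searchsorted(cuts, current), minlength=n_bins) / len(current)
    ref_frac = np.clip(ref_frac, 1e-6, None)  # avoid log(0) in sparse bins
    cur_frac = np.clip(cur_frac, 1e-6, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

# Hypothetical data: a validation-time reference sample and a recent production window.
rng = np.random.default_rng(7)
reference_scores = rng.normal(loc=0.0, scale=1.0, size=5_000)
production_scores = rng.normal(loc=0.3, scale=1.1, size=5_000)  # a mild distribution shift

ALERT_THRESHOLD = 0.2  # example boundary agreed on during validation
psi = population_stability_index(reference_scores, production_scores)
print(f"PSI = {psi:.3f}; alert = {psi > ALERT_THRESHOLD}")
```

Because the threshold and the reference window come out of the validation exercise, a PSI alert carries context: it signals that the system has left its established operating boundaries rather than merely that some statistic has changed.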
Accountable, fair, and performant AI systems are possible if we learn from history and incorporate interdisciplinary best practices. We encourage data science leaders to take pride in their teams' commitment to go above and beyond, ensuring that the systems they produce are safe, performant, resilient, and aligned with business objectives.
For more information about Monitaur's holistic AI governance and our Flight Simulator model validation suite, please contact us.
For more in-depth comparisons between MLOps and model governance, see MLOps system design has a governance problem.
[1]: Shea, Garrett. “NASA Systems Engineering Handbook Revision 2.” NASA, 20 June 2017, http://www.nasa.gov/connect/ebooks/nasa-systems-engineering-handbook.
[2]: “The Fed - Supervisory Letter SR 11-7 on Guidance on Model Risk Management -- April 4, 2011.” Accessed August 9, 2023. https://www.federalreserve.gov/supervisionreg/srletters/sr1107.htm.
[3]: OCC.gov. “Model Risk Management: New Comptroller’s Handbook Booklet,” August 18, 2021. https://occ.gov/news-issuances/bulletins/2021/bulletin-2021-39.html.
[4]: Federal Trade Commission business guidance: "Using Artificial Intelligence and Algorithms," April 2020, https://www.ftc.gov/business-guidance/blog/2020/04/using-artificial-intelligence-and-algorithms, and "Keep Your AI Claims in Check," February 2023, https://www.ftc.gov/business-guidance/blog/2023/02/keep-your-ai-claims-check.