Preparing AI for the unexpected: Lessons from recent IT incidents

Can your AI models survive a big disaster? While a recent major IT incident with CrowdStrike wasn't AI-related, its magnitude and the reaction to it reminded us that no system, no matter how proven, is immune to failure.

AI modeling systems are no different. Neglecting the best practices of building models can lead to unrecoverable failures. Discover how the three-tiered framework of robustness, resiliency, and antifragility can guide your approach to creating AI infrastructures that not only perform reliably under stress but also fail gracefully when the unexpected happens.

Show Notes

Technology, incidents, and why basics matter (00:00:03)

While the recent CrowdStrike incident wasn't caused by AI, its impact was a wake-up call for the people and processes that support critical systems (Forbes, July 2024).
As AI is increasingly used at both experimental and production levels, we can expect that AI incidents are a matter of when, not if. What can you do to prepare?

The "7Ps": Are you capable of handling the unexpected? (00:09:05)

The 7Ps is an adage, dating back to WWII, that aligns with our "do things the hard way" approach to AI governance and modeling systems.
Let’s consider the levels of building a performant system: robustness, resiliency, and antifragility.

Model robustness (00:10:03)

Robustness is a very important but often overlooked component of building modeling systems. We suspect that part of the problem is due to:
- The Kaggle-driven upbringing of data scientists
- Assumed generalizability of modeling systems: models are optimized to perform well on their training data but do not generalize well enough to perform well on unseen data.
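To make the generalization gap concrete, here is a minimal sketch rather than a method from the episode. It assumes a toy one-dimensional dataset and a 1-nearest-neighbor classifier that memorizes its training set; the names `make_data`, `predict_1nn`, and `robustness_report` are hypothetical and chosen only for illustration. It compares accuracy on the training data, on held-out data, and on held-out data with extra input noise.

```python
import random

def make_data(rng, n, noise):
    """Two 1-D classes centered at -1 and +1, blurred with Gaussian noise."""
    data = []
    for _ in range(n):
        label = rng.choice([0, 1])
        center = 1.0 if label == 1 else -1.0
        data.append((center + rng.gauss(0.0, noise), label))
    return data

def predict_1nn(train, x):
    """Return the label of the closest training point (pure memorization)."""
    return min(train, key=lambda point: abs(point[0] - x))[1]

def accuracy(train, samples):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict_1nn(train, x) == y for x, y in samples)
    return hits / len(samples)

def robustness_report(seed=0):
    """Compare accuracy on training, held-out, and perturbed held-out data."""
    rng = random.Random(seed)  # fixed seed so the report is repeatable
    train = make_data(rng, 200, noise=1.0)
    held_out = make_data(rng, 200, noise=1.0)
    # Stress case (an assumption): same labels, inputs shifted by extra noise.
    perturbed = [(x + rng.gauss(0.0, 1.0), y) for x, y in held_out]
    return {
        "train": accuracy(train, train),
        "held_out": accuracy(train, held_out),
        "perturbed": accuracy(train, perturbed),
    }

if __name__ == "__main__":
    for split, score in robustness_report().items():
        print(f"{split}: {score:.2f}")
```

Because the classifier memorizes its training points, the training score is perfect by construction. The held-out and perturbed scores are the ones that say something about robustness, which is why a leaderboard-style number on familiar data can be misleading.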

Model resilience (00:16:10)

Resiliency is a system's ability to absorb adverse stimuli without destruction and return to its pre-event state.
In practice, testing and planning for robustness and resiliency are easy components to leave out. This is where risks and threats are exposed.
See also Episode 8, Model validation: Robustness and resilience.
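One way to picture resiliency in code is a wrapper that absorbs a failing prediction and then returns to normal operation. The sketch below is an illustration under our own assumptions, not a pattern prescribed in the episode: `ResilientModel`, `primary_model`, `baseline_model`, and the `max_failures` threshold are all hypothetical names.

```python
class ResilientModel:
    """Wrap a primary model; use a fallback on failure, then recover."""

    def __init__(self, primary, fallback, max_failures=3):
        self.primary = primary
        self.fallback = fallback
        self.max_failures = max_failures
        self.failures = 0  # count of consecutive primary failures

    def predict(self, x):
        try:
            result = self.primary(x)
        except Exception:
            # Absorb the adverse input instead of crashing the caller.
            self.failures += 1
            return self.fallback(x)
        self.failures = 0  # healthy again: back to the pre-event state
        return result

    @property
    def degraded(self):
        """True once consecutive failures reach the alert threshold."""
        return self.failures >= self.max_failures


def primary_model(x):
    """Hypothetical primary model; raises TypeError on a missing input."""
    return x * 2


def baseline_model(x):
    """Hypothetical conservative fallback that always returns a safe default."""
    return 0


if __name__ == "__main__":
    model = ResilientModel(primary_model, baseline_model, max_failures=2)
    print(model.predict(3))     # primary path
    print(model.predict(None))  # fallback path
    print(model.degraded)
```

The `degraded` flag is the piece that ties back to planning: absorbing a failure quietly is only half of resiliency, and someone still needs to be told that the system is running on its fallback.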

Models and antifragility (00:25:04)

Unlike resiliency, which is the ability to absorb damaging inputs without breaking, antifragility is the ability of a system to improve from challenging stimuli (e.g., the human body).
A key question we need to ask ourselves: if we are not actively building our AI systems to be antifragile, why are we using AI systems at all?


Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

LinkedIn - Episode summaries, shares of cited articles, and more.
YouTube - Was it something that we said? Good. Share your favorite quotes.
Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
