Breaking down model bias in AI

Making ethical decisions with machine learning algorithms

As one of my favorite professors used to say: “things are complex.” What a simple but profound truth. And nothing could be truer when it comes to AI bias. The complexity of algorithmic bias is one of the reasons it is a major discussion point in business today.

More and more decisions in our lives are being made by algorithms - from whether or not we’re qualified for a job, to what clothes we buy, to the medical treatment we receive. Ensuring that these automated decisions are fair and ethical is becoming increasingly urgent. Most of us probably agree on that, yet how we address the task of assuring* fair and ethical algorithmic decision-making remains a gray area. What does an ethical decision really mean?

Dealing with discrimination

Here is an example from the insurance industry. Insurance companies have always used what is called “fair discrimination” to conduct business. They use statistical analysis to determine the risk involved in approving a person or company’s policy - and generally put prospects into two categories: low risk and high risk.

What is increasingly being called into question in insurance (and across industries) is unfair discrimination. One of the struggles with addressing this issue is that companies are limited in the information they are allowed to collect from consumers. Insurance companies, for example, can’t legally require consumers to provide information on their gender and race. This is meant to protect consumers from bias and discrimination.

The problem is that when information on gender and race isn’t available, an automated algorithm making business decisions (whether someone is approved for an insurance policy, for example) can’t take this information into account either. Yet the data used by models is often already biased, because it was generated by humans with their own unconscious and conscious biases. Moreover, we’re learning that bias finds its way into data anyway, even when protected attributes are excluded. So the question is: do we need data on race and gender to properly train algorithms and machine learning models (and mitigate bias)?

What happens if an insurance carrier’s risk exposure shows some groups of people to be less qualified for coverage? Should carriers give policies to a randomly selected group of otherwise unqualified individuals to meet a statistical requirement of racial parity? Or should companies take into account differences such as credit score and employment history - which we know can also encode information about gender and race?


Things are complex. These are not easy questions to answer.

Measuring fairness in AI

Within the academic community around algorithmic fairness, there are over 15[1] different fairness metrics, which raises another question: which should we be using? Below are a couple of examples:

  • Disparate Impact is currently the prevailing legal standard for determining bias. Developed in 1971 by the State of California's Fair Employment Practice Commission, the method is flawed in that it does not take into account an individual’s qualifications for a job or policy.
  • Equalized Odds[2] is the current “gold standard” in academia. However, it is impractical to use in many industries, and there is no agreed-upon threshold for compliance or non-compliance.

Both of these frameworks optimize for a different definition of fairness. And, regardless, neither can be effectively calculated without gender and ethnicity data.
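To make these definitions concrete, here is a minimal sketch in Python (NumPy only) of how the two metrics could be computed from a model’s predictions. The toy labels, the group coding, and the 0.8 “four-fifths” cutoff referenced in the comments are illustrative assumptions, not a recommended implementation.

```python
import numpy as np

def disparate_impact_ratio(y_pred, group):
    """Ratio of positive-outcome rates: unprivileged (0) vs. privileged (1) group.
    The common 'four-fifths rule' flags ratios below 0.8."""
    return y_pred[group == 0].mean() / y_pred[group == 1].mean()

def equalized_odds_gaps(y_true, y_pred, group):
    """Absolute gaps in true-positive and false-positive rates across groups.
    Equalized odds asks for both gaps to be (near) zero."""
    def rates(mask):
        tpr = ((y_pred == 1) & (y_true == 1) & mask).sum() / ((y_true == 1) & mask).sum()
        fpr = ((y_pred == 1) & (y_true == 0) & mask).sum() / ((y_true == 0) & mask).sum()
        return tpr, fpr
    tpr0, fpr0 = rates(group == 0)
    tpr1, fpr1 = rates(group == 1)
    return abs(tpr0 - tpr1), abs(fpr0 - fpr1)

# Toy data: 1 = approved; group is the protected attribute (illustrative only)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print("Disparate impact ratio:", disparate_impact_ratio(y_pred, group))
print("Equalized odds gaps (TPR, FPR):", equalized_odds_gaps(y_true, y_pred, group))
```

Note that both functions take the protected attribute as an input; without gender or ethnicity labels there is simply nothing to pass in, which is exactly the calculation problem described above.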

Many companies today have responded to the question of bias and fairness in their AI with Fairness Through Unawareness. In other words, their answer has often been not to record an individual's ethnicity or gender information, so that plausible deniability exists around fairness. The problem with this approach is that just because you can't see something doesn't mean it doesn't exist.

On a technical level, data has many intercorrelations and dependencies. Removing gender or ethnicity information does not necessarily remove the correlations and interdependencies between the remaining data columns. And if this data is never gathered in the first place, that doesn’t mean your data and subsequent model(s) are unbiased; it only means you don’t know.
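A short sketch of why “unawareness” falls short: below, a hypothetical proxy column (think of a coarse geographic or occupation code) is strongly correlated with a protected attribute, so a simple model can largely reconstruct that attribute even though it was never included as a feature. The column names, correlation strength, and data-generating process are all invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical protected attribute -- deliberately NOT given to the model
protected = rng.integers(0, 2, size=n)

# A proxy feature that matches the protected attribute 85% of the time
proxy = np.where(rng.random(n) < 0.85, protected, 1 - protected)

# A legitimate feature, independent of the protected attribute
income = rng.normal(loc=50, scale=10, size=n)

# "Fairness Through Unawareness": the protected column is excluded
X = np.column_stack([proxy, income])

# How much of the protected attribute can the remaining columns recover?
clf = LogisticRegression().fit(X, protected)
print("Recovery accuracy from proxy features:", clf.score(X, protected))
# Roughly 0.85 here: the information was removed in name only.
```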

Several techniques have been developed to infer protected class information, such as gender or ethnicity, so that models can be evaluated for bias without collecting sensitive information directly.

Arguably the most popular and widely used among these proxy inference techniques is Bayesian Improved Surname Geocoding (BISG). BISG uses US Census data to infer ethnicity based on an individual's name and zip code. However, BISG is problematic because we are inferring ethnicity from data that is not entirely accurate. The US Census data also now uses differential privacy to obfuscate individuals, which takes us further away from accuracy. 
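At its core, BISG is a simple Bayes update that combines a surname-based race/ethnicity distribution with a geography-based one, scaled by the overall population prior. The sketch below shows only the mechanics; the group labels, lookup tables, and probabilities are entirely made up, whereas real implementations draw them from Census surname lists and block-group demographics.

```python
import numpy as np

groups = ["group_a", "group_b", "group_c"]  # placeholder categories

# p(group | surname), from a surname lookup table (hypothetical values)
p_given_surname = {"surname_1": np.array([0.75, 0.15, 0.10]),
                   "surname_2": np.array([0.10, 0.70, 0.20])}

# p(group | zip code), from geographic demographics (hypothetical values)
p_given_geo = {"00001": np.array([0.60, 0.30, 0.10]),
               "00002": np.array([0.20, 0.50, 0.30])}

# p(group), overall population prior (hypothetical values)
p_prior = np.array([0.30, 0.50, 0.20])

def bisg_posterior(surname, zip_code):
    """Bayes update: posterior(g) is proportional to p(g|surname) * p(g|zip) / p(g)."""
    unnormalized = p_given_surname[surname] * p_given_geo[zip_code] / p_prior
    return dict(zip(groups, unnormalized / unnormalized.sum()))

print(bisg_posterior("surname_1", "00001"))
```

Every number in that chain is an estimate built on other estimates, which is why the layers of abstraction described below matter.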

So with BISG, ethnicity is inferred through several layers of abstraction. And, obviously, a person's last name and zip code are not reliable indicators of their ethnicity. America is a diverse melting pot of individuals, heritages, and nationalities.

Inferring a certain ethnicity from a name and zip code is 'data-driven stereotyping'. Proxy-based inference tools such as BISG[3] remain controversial[4], with important studies showing that they lead to biased disparity assessments.

Addressing AI bias

To be clear, I’m not personally trying to prescribe what the universal definition of fairness should be. But I am advocating for:

  • A better understanding of the complexities of algorithmic bias in AI among both technical and non-technical stakeholders.
  • The collection of data on gender and ethnicity in industries that require assurance against bias, for example, the mortgage industry^.
  • Holistic, lifecycle approaches to model bias mitigation that go beyond surface-level monitoring and “checking a box.”

This post is the first in a series on algorithmic bias intended to illuminate the complexities, open questions, metrics, mitigation techniques, and holistic frameworks available to address bias in AI.

Stay tuned for more information on: 

  • Some of the most popular—and conflicting—fairness metrics used today 
  • Two newly developed techniques for mitigating AI bias
  • A holistic, lifecycle approach to AI governance (managing bias is only one step)

To learn more, watch a replay of our online event: Unraveling AI bias - a technical exploration.

About the author

Dr. Andrew Clark is Monitaur’s co-founder and Chief Technology Officer. A trusted domain expert on the topic of ML auditing and assurance, Andrew built and deployed ML auditing solutions at Capital One. He has contributed to ML auditing education and standards at organizations including ISACA and ICO in the UK. He currently serves as a key contributor to ISO AI Standards and the NIST AI Risk Management framework. Prior to Monitaur, he also served as an economist and modeling advisor for several very prominent crypto-economic projects while at Block Science.

Andrew received a B.S. in Business Administration with a concentration in Accounting, Summa Cum Laude, from the University of Tennessee at Chattanooga, an M.S. in Data Science from Southern Methodist University, and a Ph.D. in Economics from the University of Reading. He also holds the Certified Analytics Professional and American Statistical Association Graduate Statistician certifications. Andrew is a professionally trained concert trumpeter and Team USA triathlete.


Notes and citations

Definition of an algorithm as discussed in this post: heuristic or mathematical functions used for decision making

*Assure, or Assurance is a term from auditing that describes an independent, third-party providing an objective review that the system or process under question is operating as intended.

^US Home Mortgage Disclosure Act (HMDA) allows lenders to collect gender and ethnicity information for mortgage applicants. This provides the data required to properly evaluate for bias.

[1] Castelnovo, Alessandro, Riccardo Crupi, Greta Greco, and Daniele Regoli. “The Zoo of Fairness Metrics in Machine Learning.” ArXiv:2106.00467 [Cs, Stat], December 13, 2021. http://arxiv.org/abs/2106.00467.

[2] Hardt, Moritz, Eric Price, and Nathan Srebro. “Equality of Opportunity in Supervised Learning.” ArXiv:1610.02413 [Cs], 2016. http://arxiv.org/abs/1610.02413.

[3] “Using Publicly Available Information to Proxy for Unidentified Race and Ethnicity.” Consumer Financial Protection Bureau, https://www.consumerfinance.gov/data-research/research-reports/using-publicly-available-information-to-proxy-for-unidentified-race-and-ethnicity/. Accessed 26 July 2022.

[4] Zhang, Y. “Assessing Fair Lending Risks Using Race/Ethnicity Proxies.” Management Science 64, no. 1 (2016): 178–197.