Breaking the Cycle of Data Bias


November 08, 2023 | Originally published on Empire Startups Fintech Newsletter

“Past performance is not indicative of future returns.” 

After more than a decade of these words appearing at the bottom of every email in my previous life as a banker, the quote is etched into my brain. And while the statement may seem simple enough, it is a lesson often strangely missed in the world of AI.

You may have heard that AI, and machine learning solutions in particular, are only as good as the data you feed them. This is often referred to as the “garbage in, garbage out” problem. With the expected proliferation of these solutions, what can one do to ensure that “bad” data doesn’t ingrain “bad” decisions into your organization?

First, it is important to understand that there are many ways in which data can be bad, and many places in a process where biases are introduced – by both humans and machines. Some features of bad data are more obvious, such as inaccurate or missing data, but others are harder to identify. For this piece, we will focus on the biases that make data “bad.”

1. Looking backward ≠ Looking ahead 

All data has a blind spot – or bias – of representing only what has happened in the past. In times of rapidly changing market conditions, such as those we are in today, how do you optimize decisions for the future?

While data can provide valuable insights from past patterns, it cannot inherently adapt to unforeseen circumstances or entirely new environments. Government stimulus during COVID artificially made certain customer segments appear less risky than they ultimately turned out to be. Experts understood this context before the data could reflect it.

2. Baked-In Bias

In a dataset, there’s often a desired outcome that you are predicting and optimizing decisions for: Your “source of truth.” However, the data usually lacks the context that may have yielded these outcomes. 

The Stratyfy team recently completed a study of 2021 HMDA mortgage data in which we found that 1,117 Black applicants were unjustly denied mortgages, equating to $387 million of capital withheld from these applicants in JUST ONE YEAR. These loan rejections could not be attributed to any of the applicants’ actual financial data that would be predictive of creditworthiness – and therefore could only be attributed to human bias.

These biased decisions create a waterfall effect on those individuals and their communities, forcing them to resort to products with higher interest rates and/or hidden fees. When such a borrower later struggles to repay one of these predatory products, failing to consider the nature of the loan will incorrectly attribute the default to the characteristics of the borrower, perpetuating that pattern again, and again, and again.

3. Reading “Between” The Lines

We have more data at our fingertips today than ever before. But what does it lack? Context. Our data sets almost always lack at least one crucial variable that explains the outcome we’re interested in – the “why.”

In the example of lending, credit data doesn’t capture the full story of a borrower. Credit models assume a borrower’s repayment behavior will mimic that of others with similar financial variables. This may be true in the aggregate, but the adverse effect of missing context for past delinquencies can disproportionately burden communities with fewer resources.

A recent study from the Urban Institute found that Black and Native American communities have the lowest median credit scores in the US. Young adults in majority-Black and majority-Hispanic communities are more likely to begin adulthood with lower average credit scores than their peers in majority-white communities. This is due to the structure of our financial system, which often provides less access to financial services to communities of color, creating roadblocks that further limit them from accessing and building credit and wealth.

The Future of Machine Learning Is Now

If we do not address and correct for the underlying causes of biases in the data, AI models can perpetuate and multiply these biases in decisions going forward. While these challenges may seem insurmountable, the truth is that advanced modeling and decisioning technology can help address each of these areas of bias, as long as you know what capabilities to look for. 

1. Transparency is non-negotiable

We often hear that it’s a problem when we don’t understand how AI and machine learning systems make decisions (the “black box” issue). To address this, two main approaches have emerged: Interpretability and Explainability.

Interpretable Machine Learning: This means using methods that allow us to see how the AI system is making decisions. It’s like having a clear window into the system, so we can understand what’s going on inside. 

Explainable Machine Learning: This approach involves adding extra models on top of the AI system to try and explain its decision-making process. It’s like creating a guide that helps us understand what’s happening inside the “black box.”
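
To make the distinction concrete, here is a minimal sketch of the two approaches. It assumes scikit-learn and uses made-up data with hypothetical feature names (income, debt ratio, credit history length); it is an illustration, not a production credit model.

    # A minimal sketch of interpretable vs. explainable ML, using scikit-learn.
    # Data and feature names are hypothetical, for illustration only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))  # stand-ins for income, debt ratio, history length
    y = (X[:, 0] - X[:, 1] + rng.normal(size=1000) > 0).astype(int)

    # Interpretable: the model itself exposes its decision logic.
    interpretable = LogisticRegression().fit(X, y)
    for name, coef in zip(["income", "debt_ratio", "history_len"], interpretable.coef_[0]):
        print(f"{name}: weight = {coef:+.2f}")  # every weight is directly inspectable

    # Explainable: a black box approximated after the fact by a surrogate model.
    black_box = GradientBoostingClassifier().fit(X, y)
    surrogate = DecisionTreeRegressor(max_depth=2).fit(X, black_box.predict_proba(X)[:, 1])
    print(surrogate.tree_.node_count, "nodes describing, not defining, the decision")

The surrogate describes the black box’s behavior after the fact; it is not the logic that actually made the decision, which is why post-hoc explanations can miss biases an interpretable model would expose.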

For issues that directly affect people’s lives, like who gets approved for a loan, or who gets the opportunity to interview for a job, we must prioritize transparency using interpretable machine learning approaches. We need to be able to look inside the system, understand it, and make necessary changes to dismantle biases and ensure fairness.

2. Seeing the forest through the trees 

When optimizing a machine learning algorithm, modelers aim to closely match patterns in the data. They use various metrics to measure this match and then adjust the model’s parameters, using training and testing data, to improve it. It is now possible, though, to set dual objectives, where the machine tries to meet more than one goal, or to pursue a goal while respecting one or more constraints. It’s critical to make bias a key indicator of a model’s performance. Setting these multiple objectives and constraints at the onset of model development will not only ensure that your organization is on the same page concerning priorities but also empower you to see the forest through the trees.
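
To make this concrete, here is a minimal sketch of dual-objective training, assuming a simple logistic model fit by gradient descent with a demographic-parity penalty. The data, group labels, and FAIRNESS_WEIGHT knob are all hypothetical.

    # A minimal sketch of dual-objective training: log-loss plus a
    # demographic-parity penalty. All data and weights are illustrative.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 2000
    X = rng.normal(size=(n, 2))
    group = rng.integers(0, 2, size=n)  # protected attribute (0 or 1)
    y = (X[:, 0] + 0.3 * group + rng.normal(size=n) > 0).astype(int)

    FAIRNESS_WEIGHT = 2.0  # hypothetical: how heavily bias counts in the objective
    w = np.zeros(2)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for _ in range(500):
        p = sigmoid(X @ w)
        # Objective 1: standard log-loss gradient (match patterns in the data).
        grad = X.T @ (p - y) / n
        # Objective 2: shrink the gap in mean predicted approval rates between groups.
        gap = p[group == 1].mean() - p[group == 0].mean()
        dgap = (X[group == 1] * (p * (1 - p))[group == 1, None]).mean(axis=0) \
             - (X[group == 0] * (p * (1 - p))[group == 0, None]).mean(axis=0)
        grad += FAIRNESS_WEIGHT * np.sign(gap) * dgap
        w -= 0.1 * grad

    p = sigmoid(X @ w)
    # Report bias as a first-class performance metric alongside accuracy.
    print("accuracy:", ((p > 0.5) == y).mean())
    print("approval-rate gap:", p[group == 1].mean() - p[group == 0].mean())

Raising FAIRNESS_WEIGHT trades some fit to historical patterns for a smaller approval-rate gap – exactly the kind of priority call an organization should make explicitly at the onset of model development.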

3. Human in the Loop

Data alone cannot give you the full picture of what happened in the past or what will happen in the future. Certain machine learning approaches allow you to learn from data, inspect what was learned, and then change it. This type of approach can incorporate subject matter expertise to help address the aforementioned biases.
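
As a rough sketch of what “inspect what was learned, then change it” can look like, consider a transparent, rule-based scorer whose learned rules an expert can review and adjust directly. The rules, features, and weights below are hypothetical.

    # A minimal sketch of human-in-the-loop model editing, assuming a
    # transparent rule-based scorer. Rules and weights are hypothetical.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Rule:
        name: str
        condition: Callable[[dict], bool]
        weight: float  # learned from data, editable by experts

    # Rules "learned" from historical data (illustrative values).
    rules = [
        Rule("thin_file", lambda a: a["history_years"] < 2, -0.8),
        Rule("low_dti", lambda a: a["debt_to_income"] < 0.3, +0.5),
    ]

    def score(applicant: dict) -> float:
        return sum(r.weight for r in rules if r.condition(applicant))

    applicant = {"history_years": 1, "debt_to_income": 0.2}
    print("before expert review:", score(applicant))

    # A subject-matter expert inspects the learned rules and softens the
    # thin-file penalty, encoding context the data alone could not supply.
    rules[0].weight = -0.2
    print("after expert review:", score(applicant))

The edit is visible and auditable, grounded in expertise rather than buried in opaque parameters.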

What’s Next

Data will always be biased as long as we look at the past to inform the future.  Today, we can uncover, understand, and undo bias with the help of the right technology. This is how we will address these critical issues now and drive a more equitable future.