What happens to loan applicants that our credit models reject? Is there a chance those rejected applicants might be creditworthy?
Take, for instance, people who recently immigrated to the United States. Because credit scores place a high value on the length of US credit history, this population’s documented credit history is often so limited that they don’t have credit scores at all. Instead, they join the estimated 28 million people in the US who are credit invisible.
Why does this matter? For one, it puts them in an unfavorable position from the perspective of the predictive models commonly used to make loan decisions. Even if these applicants have well-paying jobs, strong financial backgrounds, and a reliable record of repaying their debts, the models rely heavily on credit scores as a predictive variable. That’s a massive problem, especially because, in addition to the 28 million credit invisibles in the US, another 21 million people are unscorable by conventional credit scores.
Looking more closely at the data science behind the problem, one of the main challenges in making accurate predictions about this population is that the model used to make the loan decision has never seen credit-invisible or thin-file applicants. Because the data used to build and train the credit decisioning model typically includes only accepted applicants, lenders can fall into a sampling bias trap that leads to inaccurate credit risk decisions.
If we want to advance to more accurate predictions, and more inclusive finance, we cannot simply infer the repayment probabilities of rejected applicants using a model developed on accepted applicants only.
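To make the trap concrete, here is a minimal sketch of how a conventional training set gets assembled. The file name and column names (`approved`, `defaulted`) are hypothetical; the point is simply that repayment outcomes only exist for loans that were actually made.

```python
import pandas as pd

# Hypothetical historical loan data: one row per applicant, with the lender's
# past decision and, for approved loans only, the observed repayment outcome.
apps = pd.read_csv("applications.csv")  # assumed columns: features..., approved, defaulted

accepted = apps[apps["approved"] == 1]   # labeled rows: outcome is known
rejected = apps[apps["approved"] == 0]   # unlabeled rows: no loan, so no outcome

X_train = accepted.drop(columns=["approved", "defaulted"])
y_train = accepted["defaulted"]

# Any model fit on (X_train, y_train) only ever learns from applicants that a
# past policy chose to approve, i.e. a non-random sample of the population.
```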
About 18% of the U.S. population is either underbanked or unbanked, according to the 2020 Survey of Household Economics and Decisionmaking (SHED), conducted by the Federal Reserve Board. This means that predictive models never see applicants from a considerable part of the population, which creates bias in our decisioning. And as we know from our past blog posts, model bias can have negative unintended effects.
So, how can we eliminate sampling bias if we only look at the pool of approved applicants when developing our credit risk models? Reject inference can help.
Reject Inference: What It Is
Reject inference methodologies enable us to use the pool of rejected applicants to extract more information and insights about the population of applicants as a whole. In turn, the resulting insights might help us build better-performing credit risk models that are more representative of all applicants.
This is particularly powerful for enriching our data when we do not have enough labeled samples to train our models. Not only does it increase the size of our training data, but it also helps us derive additional insights and use them to make our predictive models more accurate.
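Stratyfy’s own algorithms are proprietary, but a simple, widely used flavor of reject inference is augmentation by pseudo-labeling: score the rejected applicants with a model trained on accepts, keep the ones the model is confident about, and fold them back into the training set. The sketch below assumes NumPy feature arrays and uses scikit-learn’s logistic regression purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def augment_with_rejects(X_acc, y_acc, X_rej, threshold=0.9):
    """Toy reject-inference step via pseudo-labeling (not Stratyfy's PRE method):
    label the rejected applicants the base model is confident about and return
    the enlarged training set."""
    base = LogisticRegression(max_iter=1000).fit(X_acc, y_acc)

    # Default probabilities for applicants the accepted-only model never saw.
    p_default = base.predict_proba(X_rej)[:, 1]

    # Keep only rejects with a confident prediction in either direction.
    confident = (p_default >= threshold) | (p_default <= 1 - threshold)
    pseudo_labels = (p_default[confident] >= 0.5).astype(int)

    X_aug = np.vstack([X_acc, X_rej[confident]])
    y_aug = np.concatenate([y_acc, pseudo_labels])
    return X_aug, y_aug
```

Retraining on the augmented set gives a model that has at least seen part of the rejected population, which is the core idea behind the richer techniques discussed below.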
Stratyfy’s Approach to Reject Inference
At Stratyfy, we have developed a range of algorithms that harness the power of reject inference to explore the unknown parts of credit data. One particularly efficient approach is implemented using Stratyfy’s proprietary Probabilistic Rule Engine, or PRE. PRE is unique because it provides a transparent and highly compact description of the data via probabilistic rules. As such, PRE models usually have only a few internal degrees of freedom and therefore tend to generalize well to previously unseen parts of the data.
In addition, PRE models are less sensitive to noise than many other algorithms, and they have relatively low sample complexity. These properties, combined with PRE’s transparent nature, are highly advantageous when dealing with reject inference and allow for bias mitigation without compromising the models’ predictive power.
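To give a flavor of what a “description of the data via probabilistic rules” can look like, here is a deliberately toy illustration. The rule syntax, thresholds, and probabilities below are all made up and are not Stratyfy’s actual PRE format; the point is simply that a handful of readable rules can stand in for many opaque model parameters.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ProbabilisticRule:
    """A human-readable condition paired with the default probability it implies.
    Purely illustrative; not Stratyfy's actual rule representation."""
    description: str
    condition: Callable[[dict], bool]
    p_default: float

# Two toy rules evaluated against a hypothetical applicant record.
rules = [
    ProbabilisticRule("no credit score and debt-to-income above 40%",
                      lambda a: a["credit_score"] is None and a["dti"] > 0.40, 0.35),
    ProbabilisticRule("12+ months of on-time rent payments",
                      lambda a: a["on_time_rent_months"] >= 12, 0.05),
]

applicant = {"credit_score": None, "dti": 0.30, "on_time_rent_months": 18}
for rule in rules:
    if rule.condition(applicant):
        print(f"rule fired: {rule.description} -> p(default) = {rule.p_default}")
```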
Reject Inference, Illustrated
To demonstrate the capabilities of PRE in reject inference, let’s compare it to another popular machine learning technique, XGBoost.

In this example, our objective was to increase the predictive power of our models, measured in terms of AUC, using reject inference. We applied reject inference by iteratively augmenting the data set and, as expected, both PRE and XGBoost improved after incorporating the rejected applicants.
In fact, we were able to increase the performance of the original PRE model by about 20% and of the XGBoost model by about 10%.
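We haven’t published the exact augmentation procedure here, but the sketch below shows one plausible shape of such an iterative loop using XGBoost: at each round the model pulls in the rejected applicants it currently considers lowest-risk, retrains, and tracks validation AUC. The function name, parameters, and the choice to label added rejects as non-defaults are illustrative assumptions, not the procedure behind the numbers above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

def iterative_reject_inference(X_acc, y_acc, X_rej, X_val, y_val,
                               n_rounds=5, batch_frac=0.1):
    """Sketch of iterative augmentation: each round, add the rejected applicants
    with the lowest predicted default probability as 'good' loans, retrain, and
    track validation AUC. Illustrative only; not Stratyfy's PRE-based method."""
    X_train, y_train = X_acc.copy(), y_acc.copy()
    remaining = X_rej.copy()

    for _ in range(n_rounds):
        model = XGBClassifier(n_estimators=200, max_depth=3,
                              eval_metric="auc").fit(X_train, y_train)
        auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
        print("validation AUC:", auc)

        if len(remaining) == 0:
            break
        # Rank the still-unlabeled rejects by predicted default probability and
        # pull in the lowest-risk slice, labeled as non-default (0).
        p_default = model.predict_proba(remaining)[:, 1]
        k = max(1, int(batch_frac * len(remaining)))
        lowest_risk = np.argsort(p_default)[:k]

        X_train = np.vstack([X_train, remaining[lowest_risk]])
        y_train = np.concatenate([y_train, np.zeros(k, dtype=int)])
        remaining = np.delete(remaining, lowest_risk, axis=0)

    return model
```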
While both models improved significantly after incorporating reject inference, what’s most important is that PRE was able to achieve the same results as XGBoost with much less complexity and far more transparency. As regulations continue to evolve, this level of interpretability sets PRE apart. Stay tuned for more on the data science behind it coming soon (sign up here to get notified).
Looking Ahead: Why Reject Inference Matters
Ultimately, incorporating reject inference into our original credit risk models drastically improves model performance on the accepted population. With reject inference, we not only provide a more accurate model, but we also reduce sample bias, which means we help our customers assess applicants’ creditworthiness more fairly. Additionally, because of the nature of our reject inference approach, the resulting model leads to higher risk-adjusted returns, since it gradually incorporates the rejected applicants with the lowest default probabilities at each step.
Eliminating sample bias is crucially important. By reducing sample selection bias, we increase the accuracy and discriminatory power of our model. In turn, the model is more inclusive, and we learn about all of our potential customers rather than only accepted applicants. This is hugely significant for making lending more equitable, especially for underbanked and unbanked populations, and this work is core to our mission at Stratyfy, where we believe that transparent AI and machine learning can advance a more inclusive financial future for all.