Navigating an Ever-Changing World with AI (& PRE)

For more than two thousand years, the Greek philosopher Heraclitus’ “panta rhei” (πάντα ῥεῖ; “everything flows”) has served as an aphorism for a basic truth: the world around us is constantly changing, and so we constantly need to adapt. Our civilization appears to be far from any equilibrium, and judging by the recent cascade of historical and social events, change seems to be happening faster than ever before.

Sure, history may repeat itself, but deep down, we know that history cannot fully predict what the future will bring.

Yet, most of our thinking about tomorrow is strongly rooted in past experiences. In my data science research work with Stratyfy, I encounter this idea all the time: Training our models on past data and then using these models for future decisions is a widely accepted, standard approach.

But how can we go beyond this idea? And given the ever-changing environments around us, how do we know when it is time to change our models?

Statistics provides a set of tools for monitoring changes in the distributions of a model’s explanatory variables. Take the population stability index (PSI), for example. Simply put, PSI compares the current distribution of an attribute with its historical distribution to determine how the population (in terms of that attribute) has changed over time. If the difference is large, that is a signal that it’s time to build a new model.
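To make the idea concrete, here is a minimal sketch of a PSI calculation in Python. It assumes the usual textbook form of the index, the sum over bins of (current share − historical share) × ln(current share / historical share), with bins taken from the quantiles of the historical sample. The function name, the number of bins, and the small `eps` guard are illustrative choices of mine, not a description of how Stratyfy computes it.

```python
import numpy as np


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a historical (expected) and a current (actual) sample.

    Illustrative sketch: quantile bins from the historical sample,
    empty bins guarded with a small epsilon.
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior bin edges come from the historical sample's quantiles.
    # The outermost bins are open-ended, so out-of-range values still count.
    interior_edges = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1))[1:-1]

    def bin_shares(values):
        # Assign each value to a bin index in 0..bins-1, then count per bin.
        idx = np.searchsorted(interior_edges, values, side="right")
        counts = np.bincount(idx, minlength=bins)
        return counts / values.size

    # Clip to avoid log(0) and division by zero for empty bins.
    expected_share = np.clip(bin_shares(expected), eps, None)
    actual_share = np.clip(bin_shares(actual), eps, None)

    return float(
        np.sum((actual_share - expected_share) * np.log(actual_share / expected_share))
    )
```

A frequently quoted rule of thumb treats a PSI below roughly 0.1 as stable and above roughly 0.25 as a substantial shift, but these cut-offs are conventions rather than statistical guarantees, and the right threshold depends on the application.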

But surprisingly, even this idea can be misleading. Most often, this analysis looks at each variable’s distribution individually. However, there can be cases in which the joint distribution of the variables (the probability of two variables or events occurring together) is entirely different while the marginal distributions remain the same.

Why does this matter? At Stratyfy, we think about this in the context of credit risk. Imagine we have data from 60 potential borrowers in the first table below. On the y-axis, borrowers are grouped by income into three classes (from low to high), with 20 people in each class. On the x-axis, they are grouped by credit score (also from low to high), and again 20 people fall into each class.

Now, the second table has exactly the same counts in the marginal distributions. The joint distribution, however, is quite different.

In such a situation, monitoring the PSI of the individual variables will not capture the change of the joint distribution. With PSI providing no warning signs, the algorithm will continue to work under the illusion that nothing has changed, which might expose a business to unanticipated risks or bias. In the above example, any model is expected to behave quite differently depending on how many people are in the central region of the joint distribution.
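The effect is easy to reproduce with a small sketch. The two 3×3 tables below are hypothetical counts I made up for illustration (they are not the tables from this article): both hold 60 borrowers with 20 in every income class and 20 in every credit-score class, but the second concentrates far more people in the central cell. The helper names and the use of total variation distance as a joint-distribution check are likewise my own illustrative choices.

```python
import numpy as np

# Hypothetical counts (NOT the article's tables).
# Rows: income class (low, mid, high); columns: credit-score class (low, mid, high).
table_a = np.array([[7, 6, 7],
                    [6, 8, 6],
                    [7, 6, 7]])
table_b = np.array([[10, 0, 10],
                    [0, 20, 0],
                    [10, 0, 10]])


def psi_from_counts(expected_counts, actual_counts, eps=1e-6):
    """PSI computed directly from two vectors of bin counts."""
    e = np.clip(expected_counts / expected_counts.sum(), eps, None)
    a = np.clip(actual_counts / actual_counts.sum(), eps, None)
    return float(np.sum((a - e) * np.log(a / e)))


def total_variation(expected_counts, actual_counts):
    """Half the summed absolute difference between two cell-share tables."""
    e = expected_counts / expected_counts.sum()
    a = actual_counts / actual_counts.sum()
    return float(0.5 * np.abs(a - e).sum())


# Marginal view: one PSI per variable, as in standard monitoring.
income_psi = psi_from_counts(table_a.sum(axis=1), table_b.sum(axis=1))
score_psi = psi_from_counts(table_a.sum(axis=0), table_b.sum(axis=0))

# Joint view: treat each (income, score) cell as its own bin.
joint_psi = psi_from_counts(table_a.ravel(), table_b.ravel())
joint_tv = total_variation(table_a, table_b)

print(f"income PSI: {income_psi:.3f}, score PSI: {score_psi:.3f}")
print(f"joint PSI: {joint_psi:.3f}, joint total variation: {joint_tv:.3f}")
```

Both marginal PSI values come out as zero because the row and column totals are identical, while the joint total variation distance is 0.4, meaning 40% of the borrowers would have to move to a different cell to turn one table into the other. Monitoring cells of the joint distribution catches this, though the number of cells grows quickly with the number of variables, so in practice it only scales to a handful of variable pairs.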

Even when we do correctly detect the need for change, often there is very little new data available. So how do we then move beyond the idea of only looking at past data in order to develop a feel for the future?

Perhaps we can take a page from how humans make decisions. Most of the time, our past experiences represent only a few data points, not a great many, so our brains assign weights to these factors based on what we do know. Stratyfy’s proprietary probabilistic rule engine (PRE) mimics this human decision-making algorithm. 

Our technology allows for the combination of human and data insights to make more accurate predictions. After all, domain experts can often foresee change far before it actually happens, or at least, they can imagine many different possible scenarios for the future.

In a recent engagement around COVID-19 with the Rockefeller Neuroscience Institute, for example, we obtained results that would have been entirely out of reach if PRE hadn’t allowed us to weave the insights from doctors into the fabric of our algorithms. What’s more, PRE has proven to be highly efficient when the size of the training data is small.

PRE’s flexibility to harness expert knowledge in combination with machine learning not only enables our customers to anticipate and react to changing environments in real time; it can also open up entirely new business opportunities to serve populations for which traditional models based on past data fail, simply because past data is unavailable.

As Heraclitus said, no man ever steps in the same river twice. But that doesn’t mean we can’t learn from the path it has carved and the people who have traveled it in order to predict where it might take us next.