October 25, 2012

Nate Silver and Data-Recency Bias

Patrick Marren
Partner

One more thought provoked by Nate Silver's thought-provoking The Signal and the Noise.

It's about the seductiveness of data and cool, overly precise models.

(In comparison, say, to scenario-based planning? Sorry, the scenario consultants in us always rear their ugly, pragmatic, self-promoting heads. We gotta work on that.)

Silver brought back to the surface an idea I (and others, such as Nassim Nicholas Taleb) have had before, but especially since the financial crash of 2008: accelerating advances in technology and data-gathering create a new sort of bias that will be with us from now until the next Dark Ages.

Many of the flawed models of reality that helped cause the crash were dangerous because they relied upon only very recent data.

Why did they ignore less recent, but possibly much more relevant, data? Because the quality of that data - its depth, its richness, its formatting - was "inferior." (It's related to the old "Drunk Under Lamppost Looking for Keys" bias. "Did you drop your keys near the lamppost?" "No, but the light's better here.")

Every day someone somewhere begins to collect data that have never been collected before. In a sense, every time this happens, history begins anew and everything that has gone before never happened.

So when financial firms began to construct models of the mortgage market in the 2000s, they began by mining for data. What did they find? They found thunderously rich veins of data for the immediately preceding period, with sketchier and sketchier data the farther back in time they went.

So what did they do? They fell in love with the "good" data and dispensed with the "bad" data. In some cases, their models were reportedly based entirely on the previous five years of "good" data, with nothing at all from previous decades.

What was the result? Well, to paraphrase Taleb's book The Black Swan, it was as though they had taken a month's weather data in May and concluded from it that it never snows in Chicago and there are never any hurricanes in the Gulf of Mexico.

The huge databases the financial firms were using did not contain any data from times of actual economic turbulence: they did not include the late 1980s' market crashes, nor the late 1970s' instability and stagflation, much less the Great Depression.

Why not? Because the data from back then did not contain all the cool parameters that the modelers wanted to use. 
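To see how much damage a short window can do, here is a minimal sketch in Python. Every number in it is invented purely for illustration (these are not real mortgage or market figures), and the names older_years, recent_years, and sigmas are hypothetical. It simply asks how many standard deviations a bad year looks like when the model has seen only five calm years, versus when it has also seen a longer and rougher history.

```python
import statistics

# Entirely made-up yearly changes (percent), for illustration only.
# The older stretch contains two bad years; the last five years are calm.
older_years = [4.0, 6.0, -12.0, 5.0, 3.0, 7.0, -18.0, 2.0, 6.0, 5.0]
recent_years = [5.0, 6.0, 4.0, 7.0, 5.0]
full_history = older_years + recent_years


def sigmas(move, sample):
    """How many sample standard deviations `move` sits from the sample mean."""
    return abs(move - statistics.mean(sample)) / statistics.stdev(sample)


shock = -15.0  # a hypothetical bad year

recent_only = sigmas(shock, recent_years)    # model fed only the "good" data
with_history = sigmas(shock, full_history)   # model that kept the "bad" data too

print(f"recent-only model:  {recent_only:.1f} sigma")
print(f"full-history model: {with_history:.1f} sigma")
```

With these toy numbers, the same bad year looks like a roughly eighteen-sigma impossibility to the five-year model and like an unpleasant but unremarkable two-and-a-bit-sigma event to the model that remembers the rough years. The world did not change; only the window did.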

Shockingly, in 2008 events occurred that, according to one of Goldman Sachs' models of the U.S. mortgage market, were twenty-five standard deviations from the norm. Under a normal distribution, that is equivalent to odds of one in more than the number of atoms in the universe.

(When something like this occurs, let's just say it is very much more likely that the model is severely wrong than that one-to-all-the-atoms-in-the-known-universe events are taking place.) 
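For the curious, the arithmetic behind that comparison can be roughly checked in a few lines of Python. This is a sketch under two loudly stated assumptions: that the model treats moves as normally distributed, and that the number of atoms in the observable universe is on the order of 10^80, a commonly quoted ballpark rather than a precise figure. The function name normal_tail is ours, not anything from Goldman's models.

```python
import math


def normal_tail(k):
    """One-sided probability that a standard normal variable exceeds k."""
    return 0.5 * math.erfc(k / math.sqrt(2))


p = normal_tail(25)
atoms_in_universe = 1e80  # rough order-of-magnitude assumption

print(f"P(move beyond 25 sigma) ~ {p:.2e}")
print(f"that is odds of one in about {1 / p:.2e}")
print("longer odds than one per atom in the universe:", 1 / p > atoms_in_universe)
```

The tail probability comes out on the order of 10^-138, so the odds really are far longer than one in the number of atoms. Which is exactly why the sensible conclusion is that the normality assumption, or the data behind it, was wrong.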

In other words, to the shock and dismay of Goldman Sachs' mayfly programmers, weaned in temperate balmy climes, in 2008 it snowed in Chicago, and Hurricane Ike whaled on Houston.

The bottom line is this: Data collection is improving and will continue to improve, as I have said, until the next Dark Age.

The number of cool parameters the data collectors invent and start collecting will increase like a hockey-stick graph.

So the "cool" data will ALWAYS be the most recent data. And the nerds will ALWAYS have a temptation to eschew old, "uncool" data from previous eras, because they make for less beautiful, clunkier models. 

Which means that this new form of systematic bias can only get worse as time goes by and our ability to record the vicissitudes of human economic existence gets exponentially better. 

What's the solution?

Surprise...we think one solution is scenario planning. (Bet you did not see that one coming...slightly less than 25 standard deviations' worth of unpredictability in a consultant thinking his consulting approach is the right answer.)

That means using your right brain to get away from the purely data-driven, quantitative, historical, extrapolative approach and to imagine as wide a range of plausible, macro-level, qualitative outcomes as you can; and then using your left brain to work your way back to how they MIGHT actually happen.

Or you can just keep hoping that summer is never going to end. (Hey, it really is going to be 78 degrees in Chicago today - maybe it never WILL snow again.)

As Shakespeare might have put it:

"Shall I compare thee to a summer's day?/Not unless I want to lose mine shirt."

 

Thoughts?