Conrad Chuang

Big Data + Bad Reference Data = Bad Forecasts

Updated: Jun 13, 2018

A couple of days ago I caught an article in the Wall Street Journal entitled: "Corporate Economists Are Hot Again." The article described how large companies are expanding their internal economic analysis units to

"… digest huge amounts of data … to determine opportunities and risks for companies' business units, not just in the U.S. but around the world."

At first glance I thought this was going to be another article on the importance of data scientists, but then this caught my eye.

Among the biggest challenges?

The economists need to

"… ensure that disparate units within companies are using the same data sets and information inputs in their forecasting."

What happens when models use different data sets and information inputs?

Bob Tita of the Wall Street Journal turned to Richard DeKaser, the Chief Economist at Wells Fargo, for an example:

"Previously, one unit might base unemployment figures on payroll data, while another would use household surveys. Doing so undermined the accuracy of tests to measure risks for losses and contributed to mistakes in business planning."

What caught my eye was that this is one of the few articles I've seen that points out that, at least from a forecasting and business planning perspective, managing big data's volume (or variety, or velocity) isn't the real issue.

The issue for these corporate economists (and all the other "data scientists") is inconsistency. Their organizations often lack a consistent set of definitions, enterprise dimensions and attributes to feed their forecasting models. And as Frank Schott (another economist) points out, it's not getting better.

"The data are just as recalcitrant as ever … and the multiplicity of it invites further confusion."

Many of the challenges described in this article are strongly in line with what we've been telling our clients for quite a while. The value of big data lies in the insights an organization receives from the models developed for business planning/forecasting, sensitivity analysis, market segmentation and the like. But for any of that analysis to be worthwhile and trustworthy, you need a systematic approach (applying software such as EBX5, for example) to ensure that the reference data underlying those data sets is consistent, not just between domains (both units are using the same unemployment measures), but also across time and versions of the reference data itself.
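To make the consistency point concrete, here is a minimal sketch of the kind of check a systematic approach would automate. It is not EBX5 code and does not reflect any product's API. The unit names, concept names, version labels and the `find_inconsistencies` helper are all hypothetical, chosen only to mirror the unemployment example from the article.

```python
from collections import defaultdict

# Hypothetical inventory of model inputs:
# (business unit, concept, source definition, reference-data version).
# All values are illustrative assumptions, not real data.
MODEL_INPUTS = [
    ("consumer_lending", "unemployment_rate", "payroll_data", "v1"),
    ("mortgage", "unemployment_rate", "household_survey", "v1"),
    ("consumer_lending", "region", "sales_territories", "v2"),
    ("mortgage", "region", "sales_territories", "v2"),
]


def find_inconsistencies(inputs):
    """Return the concepts that are defined differently across units.

    Each concept is mapped to the set of (definition, version) pairs in use.
    A concept with more than one pair is being fed inconsistently.
    """
    seen = defaultdict(set)
    for _unit, concept, definition, version in inputs:
        seen[concept].add((definition, version))
    return {c: sorted(pairs) for c, pairs in seen.items() if len(pairs) > 1}


conflicts = find_inconsistencies(MODEL_INPUTS)
for concept, pairs in conflicts.items():
    print(concept, "->", pairs)
```

In this toy inventory only `unemployment_rate` is flagged, because one unit draws it from payroll data and the other from household surveys. The same comparison would catch two units using the same definition at different versions of the reference data.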