The BIN Model of Forecasting Errors and Its Implications for Improving Predictive Accuracy

At Britten Coyne Partners, Index Investor, and the Strategic Risk Institute, we all share a common mission: To help clients avoid failure by better anticipating, more accurately assessing, and adapting in time to emerging strategic threats.

Improving clients’ (and our own) forecasting processes to increase predictive accuracy is critical to our mission, and our research in this area is ongoing. In this note, we’ll review a very interesting new paper that has the potential to significantly improve forecast accuracy.

In “Bias, Information, Noise: The BIN Model of Forecasting”, Satopaa, Salikhov, Tetlock, and Mellers introduce a new approach to rigorously assessing the impact of three root causes of forecasting errors. In so doing, they create the opportunity for individuals and organizations to take more carefully targeted actions to improve the predictive accuracy of their forecasts.

In the paper, Satopaa et al decompose forecast errors into three parts, based on the impact of bias, partial information, and noise. They assume that, “forecasters sample and interpret signals with varying skill and thoroughness. They may sample relevant signals (increasing partial information) or irrelevant signals (creating noise). Furthermore, they may center the signals incorrectly (creating bias).”
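To make the decomposition concrete, the toy Monte Carlo sketch below (our own illustration, not the authors’ formal model) simulates a forecaster whose probability estimates are degraded, in turn, by seeing only part of the relevant signal, by random noise, and by a systematic bias. Comparing the resulting Brier scores shows how each source of error can be isolated.

```python
# Toy illustration of the bias / partial-information / noise decomposition.
# This is our own sketch of the idea, not the authors' formal BIN model:
# a forecaster may see only part of the true signal (partial information),
# add random error (noise), and mis-center estimates (bias).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

signal = rng.normal(size=n)          # the full information relevant to each event
p_true = sigmoid(signal)             # true event probabilities
outcome = rng.random(n) < p_true     # realized binary outcomes

def brier(p):
    """Mean squared error of probability forecasts (lower is better)."""
    return np.mean((p - outcome) ** 2)

def forecast(info=1.0, noise_sd=0.0, bias=0.0):
    """A forecaster who sees a fraction of the signal, plus noise, plus a bias."""
    return sigmoid(info * signal + rng.normal(scale=noise_sd, size=n) + bias)

print("full information, no noise, no bias:", brier(forecast()))
print("partial information:                ", brier(forecast(info=0.5)))
print("added noise:                        ", brier(forecast(noise_sd=1.0)))
print("added bias:                         ", brier(forecast(bias=1.0)))
```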

Let’s start by taking a closer look at each root cause.

WHAT IS BIAS?

A bias is a systematic error that reduces forecast accuracy in a predictable way. Researchers have extensively studied biases and many of them have become well known. Here are some examples:

Over-Optimism: Tali Sharot’s research has shown how humans have a natural bias towards optimism. We are much more prone to updating our beliefs when a new piece of information is positive (i.e., better than expected in light of our goals) rather than negative (“How Unrealistic Optimism is Maintained in the Face of Reality”).

Confirmation: We tend to seek, pay more attention to, and place more weight on information that supports our current beliefs than information that is inconsistent with or contradicts them (this is also known as “my-side” bias).

Social Ranking: Another bias that is deeply rooted in our evolutionary past is the predictable impact of competition for status within a group. Researchers have found that when the result of a decision will be private (not observed by others), we tend to be risk averse. But when the result will be observed, we tend to be risk seeking (e.g., “Interdependent Utilities: How Social Ranking Affects Choice Behavior” by Bault et al).

Social Conformity: Another evolutionary instinct comes into play when uncertainty is high. Under this condition, we are much more likely to rely on social learning and copying the behavior of other group members, and to put less emphasis on any private information we have that is inconsistent with or contradicts the group’s dominant view. The evolutionary basis for this heightened conformity is clear – you don’t want to be cast out of your group when uncertainty is high.

Overconfidence/Uncertainty Neglect: The over-optimism, confirmation, social ranking, and social conformity biases all contribute to forecasters’ systematic neglect of uncertainty when we make and communicate forecasts. We described this bias (and how to overcome it) in much more detail in our February blog post, “How to Effectively Communicate Forecast Probability and Analytic Confidence”.

Surprise Neglect: While less well known than many other biases, this one is, in our experience, one of the most important. Surprise is a critically important feeling that is triggered when our conscious or unconscious attention is attracted to something that violates our expectations of how the world should behave, given our mental model of the phenomenon in question (e.g., stock market valuations increasing when macroeconomic conditions appear to be getting worse). From an evolutionary perspective, surprise helps humans to survive by forcing them to revise their mental models in order to more accurately perceive the world – especially its unexpected dangers and opportunities. Unfortunately, the feeling of surprise is often fleeting.

As Daniel Kahneman noted in his book, “Thinking, Fast and Slow”, when confronted with surprise, our automatic, subconscious reasoning system (“System 1”) will quickly attempt to adjust our beliefs to eliminate the feeling. It is only when the required adjustment is too big that our conscious reasoning system (“System 2”) is triggered to examine what caused us to feel surprised – but even then, System 1 will keep trying to make the feeling of surprise disappear. Surprise neglect is one of the most underappreciated reasons that inaccurate mental models tend to persist.

WHAT IS PARTIAL INFORMATION?

In the context of the BIN Model, “information” refers to the extent to which we have complete (and accurate) information about the process generating the future results we seek to forecast.

For example, when a fair coin is flipped four times, we have complete information about the results generating process, which enables us to predict the probability of each possible outcome with complete confidence. This is the realm of decision making in the face of risk.
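For instance, the probability of each possible number of heads in four flips follows directly from the binomial distribution, as in this short sketch:

```python
# Complete information: probabilities of k heads in 4 flips of a fair coin
# follow directly from the binomial distribution.
from math import comb

flips, p_heads = 4, 0.5
for k in range(flips + 1):
    prob = comb(flips, k) * p_heads**k * (1 - p_heads)**(flips - k)
    print(f"P({k} heads) = {prob:.4f}")   # 0.0625, 0.2500, 0.3750, 0.2500, 0.0625
```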

When the complexity of the results generating process increases, we move from the realm of risk into the realm of uncertainty, in which we often do not fully understand the full range of possible outcomes, their probabilities, or their consequences.

Under these circumstances, forecasters have varying degrees of information about the process generating future results, and/or models of varying degrees of accuracy for interpreting the meaning of the information they have. Both contribute to forecast inaccuracy.

WHAT IS NOISE?

“Noise” refers to the unsystematic, unpredictable, random errors that contribute to forecast inaccuracy. Kahneman defines it as “the chance variability of judgments.”

Sources of noise that are external to forecasters include randomness in the results generating process itself (and, in the case of complex adaptive systems, the deliberate actions of the intelligent agents who comprise that process). Internal sources of noise include forecasters’ reliance on low- or no-value information about the results generating process, and/or on mental models of that process that are inaccurate, irrelevant, or that vary (often unconsciously) over time.

After applying their analytical “BIN” framework to the results of the Good Judgment Project (a four-year geopolitical forecasting tournament described in the book “Superforecasting” in which one of us participated), Satopaa and his co-authors conclude that, “forecasters fall short of perfect forecasting [accuracy] due more to noise than bias or lack of information. Eliminating noise would reduce forecast errors … by roughly 50%; eliminating bias would yield a roughly 25% cut; [and] increasing information would account for the remaining 25%. In sum, from a variety of analytical angles, reducing noise is roughly twice as effective as reducing bias or increasing information.”

Moreover, the authors found that a variety of interventions used by the Good Judgment Project that were intended to reduce forecaster bias (such as training and teaming) actually had their biggest impact on reducing noise. They note that, “reducing bias may be harder than reducing noise due to the tenacious nature of certain cognitive biases” (a point also made by Kahneman in his Harvard Business Review article, “Noise: How to Overcome the High, Hidden Cost of Inconsistent Decision Making”).

The BIN model highlights three levers for improving forecast accuracy: Reducing Bias, Improving Information, and Reducing Noise. Let’s look at some effective techniques in each of these areas.

HOW TO REDUCE BIAS

The first point to make about bias reduction is that a considerable body of research has concluded that this is very difficult to do, for the simple reason that deep in our evolutionary past, what we negatively refer to as “biases” served a positive evolutionary purpose (e.g., overconfidence helped to attract mates).

That said, both our experience with Britten Coyne Partners’ clients and academic research have found that two techniques are often effective.

Reference/Base Rates and Shrinkage: Too often we behave as if the only information that matters is what we know about the question or results we are trying to forecast. We fail to take into account how things have turned out in similar cases in the past (this is known as the reference or base rate). So-called “shrinkage” methods start by identifying a relevant base rate for the forecast, then move on to developing a forecast based on the specific situation under consideration. The more similar the specific situation is to the ones used to calculate the base rate, the more the specific probability is “shrunk” towards the base rate probability.
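Here is a minimal sketch of the idea; the linear similarity weighting and the numbers are purely illustrative, not a prescribed formula:

```python
# Illustrative shrinkage toward a base rate. The similarity weighting is a
# simple linear blend chosen for clarity, not a prescribed formula.
def shrink_toward_base_rate(case_estimate: float,
                            base_rate: float,
                            similarity: float) -> float:
    """similarity in [0, 1]: how closely the case resembles the reference class.
    The more similar the case, the more the estimate is pulled to the base rate."""
    return similarity * base_rate + (1.0 - similarity) * case_estimate

# Example: a team estimates an 80% chance of success, but the base rate for
# comparable cases is 20% and this case is fairly typical of that class.
print(shrink_toward_base_rate(case_estimate=0.80, base_rate=0.20, similarity=0.7))
# -> 0.38
```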

Pre-Mortem Analysis: Popularized by Gary Klein, in a pre-mortem a team is told to assume that it is some point in the future, and a forecast (or plan) has failed. They are told to anonymously write down the causes of the failure, including critical signals that were missed, and what could have been done differently to increase the probability of success. Pre-mortems reduce over-optimism and overconfidence, and produce two critical outputs: improvement in forecasts and plans, and the identification of critical uncertainties about which more information needs to be collected as the future unfolds (which improves signal quality). The power of pre-mortems is due to the fact that humans have a much easier time (and are much more detailed) in explaining the past than they are when asked to forecast the future – hence the importance of situating a group in the future, and asking them to explain a past that has yet to occur.

HOW TO INCREASE RELEVANT INFORMATION (SIGNAL)

Hyperconnectivity has unleashed upon us a daily flood of information with which many people are unable to cope when they have to make critical decisions (and forecasts) in the face of uncertainty. Two approaches can help.

Information Value Analysis: Bayes’ theorem provides a method for separating high value signals from the flood of noise that accompanies them. Let’s say that you have determined that three outcomes (A, B, and C) are possible. The value of a new piece of information (or of related pieces of evidence) can be determined by asking how likely you would be to observe it if Outcome A, B, or C were to occur. If the information is much more likely to be observed under just one outcome, it has high value. If it is equally likely under all three outcomes, it has no value. More difficult, but just as informative, is applying the same logic to the absence of a piece of information. This analysis (and the outcome probability estimates) should be repeated at regular intervals to assess newly arriving evidence.
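Here is a minimal sketch of this logic using Bayes’ theorem; the priors and likelihoods are made up for illustration:

```python
# Illustrative Bayesian update: how much a piece of evidence shifts our beliefs
# about three possible outcomes. The priors and likelihoods are made up.
priors = {"A": 0.5, "B": 0.3, "C": 0.2}

# P(observing the evidence | outcome). Diagnostic evidence: much more likely under A.
likelihoods = {"A": 0.8, "B": 0.2, "C": 0.2}

unnormalized = {k: priors[k] * likelihoods[k] for k in priors}
total = sum(unnormalized.values())
posterior = {k: v / total for k, v in unnormalized.items()}
print(posterior)   # A rises from 0.50 to 0.80; B and C fall

# If the evidence were equally likely under all three outcomes, the posterior
# would equal the prior and the evidence would carry no information value.
```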

Assumptions Analysis: Probabilistic forecasts rest on a combination of (1) facts, (2) assumptions about critical uncertainties, (3) the evidence (of varying reliability and information value) supporting those assumptions, and (4) the logic used to reach the forecaster’s conclusion. In our forecasting work with clients over the years, we have found that discussing the assumptions made about critical uncertainties and, less frequently, the forecast logic itself generates very productive discussions and improves predictive accuracy.

In particular, Marvin Cohen’s approach has proved quite practical. His research found that the greater the number of assumptions about “known unknowns” (i.e., recognized uncertainties) that underlie a forecast, and the weaker the evidence that supports them, the lower confidence one should have in the forecast’s accuracy.

Also, the more assumptions about “known unknowns” that are used in a forecast, the more likely it is that potentially critical “unknown unknowns” remain to be discovered, which again should lower your confidence in the forecast (e.g., see, “Metarecognition in Time-Stressed Decision Making: Recognizing, Critiquing, and Correcting” by Cohen, Freeman, and Wolf).
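The sketch below is a purely illustrative heuristic (our own, not Cohen’s formal method) that captures the direction of this logic: confidence falls as the number of assumptions grows and as the evidence supporting them weakens.

```python
# Purely illustrative heuristic (our own, not Cohen's formal method): confidence
# in a forecast falls as the number of assumptions about known unknowns grows
# and as the evidence supporting those assumptions weakens.
def forecast_confidence(assumptions):
    """assumptions: list of evidence-strength scores in [0, 1], one per assumption."""
    if not assumptions:
        return "high"
    avg_support = sum(assumptions) / len(assumptions)
    penalty = len(assumptions) * (1.0 - avg_support)
    if penalty < 1.0:
        return "high"
    elif penalty < 2.5:
        return "moderate"
    return "low"

# Five assumptions, each only weakly supported by evidence -> low confidence
print(forecast_confidence([0.3, 0.4, 0.2, 0.5, 0.3]))   # -> "low"
```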

HOW TO REDUCE NOISE

Combine and Extremize Forecasts: Research has found that three steps can improve forecast accuracy. The first is seeking forecasts based on different forecasting methodologies, or prepared by forecasters with significantly different backgrounds (as a proxy for different mental models and information). The second is combining those forecasts (using a simple average if few are included, or the median if many are). The final step, which significantly improved the performance of the Good Judgment Project team in the IARPA forecasting tournament, is to “extremize” the average (mean) or median forecast by moving it closer to 0% or 100%.

Averaging forecasts assumes that the differences between them are all due to noise. As the number of forecasts being combined increases, however, using the median produces the greatest increase in accuracy because it does not “average away” the information differences between forecasters. That still leaves whatever bias is present in the median forecast.

Forecasts for binary events (e.g., the probability an event will or will not happen within a given time frame) are most useful to decision makers when they are closer to 0% or 100% rather than the uninformative “coin toss” estimate of a 50% probability. As described by Baron et al in “Two Reasons to Make Aggregated Probability Forecasts More Extreme”, individual forecasters will often shrink their probability estimates towards 50% to take into account their subjective belief about the extent of potentially useful information that they are missing.

For this reason, forecast accuracy can usually be increased when you employ a structured “extremizing” technique to move the mean or median probability estimate closer to 0% or 100%. Note that the extremizing factor should be lower when average forecaster expertise is higher. This is based on the assumption that a group of expert forecasters will incorporate more of the full amount of potentially useful information than will novice forecasters. (See: “Two Reasons to Make Aggregated Probability Forecasts More Extreme”, by Baron et al, and “Decomposing the Effects of Crowd-Wisdom Aggregators”, by Satopaa et al).
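The sketch below illustrates the combine-and-extremize steps described above; the transformation shown is one commonly used extremizing form, and the factor a is a tuning parameter that should be set lower for more expert groups of forecasters:

```python
# Combine independent probability forecasts of the same binary event,
# then push the aggregate toward 0 or 1 ("extremizing").
import statistics

def combine(probabilities, use_median=True):
    """Use the median for larger groups, a simple mean for small ones."""
    return statistics.median(probabilities) if use_median else statistics.fmean(probabilities)

def extremize(p, a=2.0):
    """One common extremizing transformation. a > 1 pushes p toward 0 or 1;
    use a smaller a when the forecasters are more expert."""
    return p**a / (p**a + (1.0 - p)**a)

forecasts = [0.55, 0.60, 0.70, 0.65, 0.58]
aggregate = combine(forecasts)             # median = 0.60
print(aggregate, extremize(aggregate))     # 0.60 -> ~0.69
```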

Use a Forecasting Algorithm: Use of an algorithm (whose structure can be inferred from top human forecasters’ performance) ensures that a forecast’s information inputs and their weighting are consistent over time. In some cases, this approach can be automated; in others, it involves having a group of forecasters (e.g., interviewers of new hire candidates) ask the same set of questions and use the same rating scale, which facilitates the consistent combination of their inputs. Kahneman has also found that testing these algorithmic conclusions against forecasters’ intuition, and then examining the underlying reasons for any disagreements between the results of the two methods can sometimes improve results.
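Here is a minimal sketch of the structured-rating version of this idea; the dimensions, weights, and 1–5 scale are purely illustrative. The point is that every candidate is scored on the same inputs, combined in the same way, every time.

```python
# Illustrative fixed-weight scoring model for structured interview ratings.
# Dimensions, weights, and the 1-5 scale are made up; what matters is that the
# inputs and their weighting stay consistent across forecasters and over time.
WEIGHTS = {"technical_skill": 0.4, "communication": 0.3, "reliability": 0.3}

def score(ratings: dict) -> float:
    """ratings: dimension -> value on a 1-5 scale, same questions for everyone."""
    return sum(WEIGHTS[dim] * ratings[dim] for dim in WEIGHTS)

candidate = {"technical_skill": 4, "communication": 3, "reliability": 5}
print(score(candidate))   # -> 4.0
```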

However, it is also critical to note that this algorithmic approach implicitly assumes that the underlying process generating the results being forecast is stable over time. In the case of complex adaptive systems (which are constantly evolving), this is not true.

Unfortunately, many of the critical forecasting challenges we face involve results produced by complex adaptive systems. For the foreseeable future, expert human forecasters will still be needed to meet them. By decomposing the root causes of forecasting error, the BIN model will help them to do that.
