Thursday, November 17, 2016

Probability and Analytics: Reactions to 2016 Election Forecasts

Reactions to the 2016 election forecasts suggest we don’t do a good job communicating probability and risk.

In a September 2016 post, I suggested readers check out the discussions of analytic models at FiveThirtyEight. One of the links led to their forecast model for the 2016 presidential election.1

In the past week, I have received quite a bit of email suggesting I should take down the post, given that the model “failed.” For example, one emailer wrote:
How can you continue to promote Nate Silver? The election result proved the analytics wrong.
These reactions expose a real issue with analytics: most people do not understand how to interpret probability.

An analytic failure?

On November 7, the final prediction of the FiveThirtyEight “Polls Only Model” gave Hillary Clinton a 71% chance of winning. As things turned out, she lost.

Those emailing me were not alone in believing the model failed. The day after the election, there were many stories suggesting FiveThirtyEight and the other aggregators were wrong.2

But were they?


[Video: Nate Silver discusses the FiveThirtyEight model]

Understanding probability

The FiveThirtyEight model gave Clinton a 71% chance of winning the election. That’s about a 7 in 10 chance. To understand how to interpret this probability, try the following thought experiment:

Suppose you are at Dulles airport and are about to board a plane. While you are waiting, you are notified that there is a 7 in 10 chance your flight will land safely. Would you get on the plane?

I know I wouldn’t.

When the probability of something happening is 70%, the probability of it not happening is 30%. In the case of the airline flight, that’s not an acceptable risk!

Now suppose the flight lands safely.  Was the prediction right?

Maybe, but maybe not.  The plane landed safely, but were the odds with the passengers?  Was there actually a greater danger that was narrowly avoided? Was there no danger at all?

When a single event is assigned a probability, it’s hard to assess whether the assigned probability was “correct.”

Now suppose that every flight departing Dulles, rather than just one, was given a 7 in 10 chance of landing safely. The next day, we check the results and find that all flights landed safely. Was the prediction correct?

In this case, we can say that the model was clearly wrong. About 1,800 flights depart Dulles airport each day. A model giving each of them only a 70% chance of landing safely implies that about 30%, or roughly 540 flights, would not. A day on which every flight landed safely misses that mark by a wide margin.

Probabilistic predictions are easier to evaluate when they apply to a large number of events.
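
To make that concrete, here is a minimal Python sketch using the numbers from the example above (about 1,800 daily departures and a 70% per-flight forecast; the 99% comparison forecast is just illustrative). It shows that a single safe landing is consistent with almost any forecast, while 1,800 consecutive safe landings would be essentially impossible if each flight really had only a 70% chance:

```python
import math

# Numbers from the example above: ~1,800 daily departures from Dulles,
# each hypothetically given a 70% chance of landing safely.
n_flights = 1800
p_safe = 0.7

# A single event tells us almost nothing: one safe landing is consistent
# with a 70% forecast, a 99% forecast, or almost any other forecast.
print("P(one flight lands safely | 70% forecast):", p_safe)
print("P(one flight lands safely | 99% forecast):", 0.99)

# Many events are a different story. If each flight really had only a 70%
# chance of landing safely, the probability that all 1,800 land safely
# works out to about 10^-279 -- effectively impossible.
log10_all_safe = n_flights * math.log10(p_safe)
print(f"P(all {n_flights} flights land safely | 70% forecast): about 10^{log10_all_safe:.0f}")

# Under that forecast, we would expect roughly 540 flights (30%) not to
# land safely on a typical day.
print("Expected unsafe landings:", round(n_flights * (1 - p_safe)))
```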

Explaining probability

In the days and weeks leading up to the election, the FiveThirtyEight staff spent a good deal of time putting the uncertainty of their forecast in context. As the election drew closer, these reminders became daily warnings:
  • November 6:  A post outlined just how close the race was, and how a standard polling miss of 3% could swing the election.
  • November 7:  An update called a Clinton win “probable but far from certain.”
  • November 8: The final model discussion outlined all the reasons a Clinton win was not a certainty, and explored scenarios that would lead to a loss.
Despite all this, many people were unable to interpret the probabilistic model and its associated uncertainty.

Avoiding unrealistic expectations

If a research scientist at Yale and the MIT Technology Review can misunderstand a probabilistic forecast, how well are the people in your business doing?
  • Are people in your business making decisions based on probabilistic models? 
  • Are they factoring an appropriate risk level into their actions?
  • Are you doing enough to help them understand the strength of model predictions?
It’s important that decision makers understand the predictive strength of the models they use, and it’s everyone’s responsibility to make sure they do.

We have a long, long way to go.



Notes:

1. See the post “Read (or Listen to) Discussions of Analytic Models.” The model discussion I linked to is “A User’s Guide To FiveThirtyEight’s 2016 General Election Forecast.”

2. “Aggregators” is a term used by the mainstream press to describe data scientists who build models based on polling data. Here are a few of the stories that suggested these models were wrong: The Wrap, Vanity Fair, The New Yorker, Quanta Magazine, and the MIT Technology Review.