Wednesday, September 28, 2016

Read (or Listen to) Discussions of Analytic Models

Organizations often feel their analytics are proprietary, and therefore decline to discuss how their models work. One shining exception is Nate Silver’s FiveThirtyEight. The site makes a point of exposing how its models are built. Its staff also discuss the models as part of their elections podcast.

Data Storytelling

Recommended Reading
As students in my courses know, FiveThirtyEight is a data-driven journalism blog founded by Nate Silver. FiveThirtyEight covers sports, politics, science, and popular culture.

If you are interested in visualization, analytics, or telling stories with data, you will enjoy the site.

Stories on FiveThirtyEight are always shaped by data. And if they develop a model of any kind, that model is openly explained. You may have to comb through footnotes, but it’s always there.

One of the most detailed discussions on the site right now describes their 2016 election forecast model. (With apologies to readers outside the US, this is a very US-centric topic.)


FiveThirtyEight also offers several podcasts, where you can listen to data-driven discussions among its analysts.

Until recently, these conversations rarely delved into the technical realm. On the elections podcast, if Nate Silver or Harry Enten mentioned “long tails,” “blended averages,” or “p-values,” the other hosts jokingly steered the conversation back to analysis.

That practice ended a few weeks ago with the establishment of “Model Talk” episodes. Every second Friday, the model itself is discussed in greater detail. For example, in the 8/26 episode, Silver describes the predictive value of state polls over national polls, and why it is important to build a model where state-by-state probabilities interact.
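
To get a feel for why interacting state probabilities matter, here is a minimal sketch in Python. It is my own toy illustration, not FiveThirtyEight’s model: the three states, the 1-point margins, and the error sizes are all made-up assumptions. Each simulated polling error is split into a shared (national) part and a state-specific part. Both scenarios give every state the same total uncertainty; only the amount that is shared differs.

```python
import random


def prob_lose_all(margins, shared_sd, state_sd, trials=20000, seed=1):
    """Estimate the chance a candidate loses every state.

    margins   -- hypothetical polling leads, in points, one per state
    shared_sd -- std. dev. of the error common to all states (assumed)
    state_sd  -- std. dev. of each state's own error (assumed)
    """
    rng = random.Random(seed)
    sweeps = 0
    for _ in range(trials):
        shared = rng.gauss(0.0, shared_sd)  # one draw shared by every state
        if all(m + shared + rng.gauss(0.0, state_sd) < 0 for m in margins):
            sweeps += 1
    return sweeps / trials


# Three hypothetical states, each with a 1-point lead.
margins = [1.0, 1.0, 1.0]

# Same total error (std. dev. of 3 points) in both cases:
# 3.0**2 is approximately 2.4**2 + 1.8**2.
independent = prob_lose_all(margins, shared_sd=0.0, state_sd=3.0)
correlated = prob_lose_all(margins, shared_sd=2.4, state_sd=1.8)

print(f"Lose all three, independent errors: {independent:.3f}")
print(f"Lose all three, correlated errors:  {correlated:.3f}")
```

When the errors are shared, a miss in one state tends to come with misses in the others, so losing all three states is noticeably more likely than the independent version suggests. A model that treats states as unrelated coin flips would understate that risk.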

Here are links to the “Model Talk” discussions to date:

Recommended Reading

I also highly recommend Silver’s book, The Signal and the Noise: Why So Many Predictions Fail—but Some Don’t. If you are interested in analytics, it is a fascinating read.

Sunday, April 24, 2016

Chris Adamson on Modeling Challenges

In a recent interview, the folks at WhereScape asked me some questions about data modeling challenges.

In Business Intelligence, modeling is a social activity. You cannot design a good model alone. You have to go out and talk to people.

As a modeler, your job is to facilitate consensus among all interested parties. Your models need to reflect business needs first and foremost. They must also balance a variety of other concerns — including program objectives, the behavior of your reporting and visualization tools, your data integration tools, and your DBMS.

It’s also important to understand what information resources are available. You need to verify that it is possible to fill the model with actual enterprise data. This means you need to profile and understand potential data sources. If you don’t consider sources of data, your designs are nothing more than wishful thinking.

When considering non-relational data sources, resist the urge to impose structure before you explore them. You’ve got to understand the data before you spend time building a model around it.

Check out the video above, where I discuss these and other topics. For a full-sized version, visit the WhereScape page.

Wednesday, December 23, 2015

What Hollywood Can Teach Analytics Professionals: How to Tell Stories

You might not realize it, but you probably have something in common with the creators of the TV show South Park. 

Analytics yield insights that can have powerful business impact. These insights come from statistics and data mining—processes that are inaccessible to most people. If you want your business to learn and remember, you have to tell a story.

All too often, the communication of an analytic finding reads like a police report: procedural, laden with jargon, and stripped of meaningful business context.

That’s not interesting. People won’t learn from it, and they certainly won’t change their behavior.

How then to get your point across? You need to learn how to tell stories. Data stories.

Trey Parker and Matt Stone know a thing or two about telling a story. They are the creators of South Park, a wildly successful television show which has been on the air for 19 years. Like you, their success depends on telling interesting stories.

In the video clip above, Parker and Stone are speaking to a group of students at NYU on storytelling strategies. Trey tells the students:

We can take these beats, which are basically the beats of your outline, and if the words “and then” belong between those beats, you’re f***ed. Basically. You’ve got something pretty boring.

What should happen between every beat that you’ve written down is either the word “therefore” or “but.”

Data storytellers make this mistake all the time. “We did this… then we tried that… the algorithm showed this… the correlation coefficient is that… our conclusion is…”

This kind of forensic storytelling is boring. It won’t be remembered, and the value of the insight will be lost. Save the procedural detail for an appendix somewhere. People learn from good stories, not lab reports.

As Matt says later in the clip, you need causality to have an interesting story:

But. Because. Therefore. That gives you the causation between each beat. And that… that’s a story.

Be sure to watch the entire clip and, if you are so inclined, take some time off for an episode or two of South Park. It just might make you a better data scientist!

The embedded video is from the NY Times ArtsBeat blog post, Hello! Matt Stone and Trey Parker Crash a Class at NYU (September 8, 2011). Hat tip to Tony Zhou and his Video Essay on F for Fake at the marvelous blog Every Frame a Painting.