Chris Adamson’s Blog: Q&A: Star vs. Snowflake

Friday, November 5, 2010

Q&A: Star vs. Snowflake

A question from Portugal gives me an excuse to talk about snowflakes, why you might want to avoid them, and why you might want to use them.

Q: In your perspective, when does a star schema start to be a snowflake? If you have a snowflaked dimension, do you consider the model a star-schema? Or if you have for example an outrigger, is the model not a star-schema anymore?

Pedro in Portugal
http://www.pedrocgd.blogspot.com/

I think the question when does a star become a snowflake is really one of semantics, but I'll give an answer.

Readers who just want to know when to use snowflake designs can skip ahead a couple of paragraphs.

Is it a Star or Snowflake?

As I said, this is really a semantic issue, but here is my answer. When dimension tables are linked to other dimension tables, or to anything that is not a fact table, the design is a snowflake.

The presence of an outrigger indicates a snowflake design, and so does a 3NF dimension containing 45 tables (and yes, I have seen that!).

Is a snowflake a star? I don't have a good answer for this. When I discuss design options with colleagues, we think of it as an either/or choice. But when a design is complete, we refer to each fact table and its dimensions as a "star," even if there are outriggers.

One thing I am sure of is that both types of design are dimensional designs. (So is a cube, by the way.)

Thanks to Pedro for the question. Now I'm going to talk a bit more about snowflakes, for those who are interested.

When in doubt, don't snowflake

Best practices dictate that snowflakes should be avoided. The reasons for this are wholly pragmatic.

They increase ETL complexity
The increase query complexity, harming "usability"
They reduce the understandability of the model

We could get into a discussion on this, but instead I'd like to look at some situations where these guidelines might not apply.

When snowflaking is acceptable

There are some cases the guideline against snowflaking may be relaxed.

Judicious use of outriggers may be acceptable in limited cases where there are relationships between dimensions that must be browsable. But consider a factless fact table.

Outriggers are necessary when there are repeating attributes such as a product with multiple features or a patient with multiple diagnoses.

Outriggers are helpful in situations where there is a recursive hierarchy such as departments that contain other departments, or regions that contain other regions.

Snowflaking may be justified when your software products require it. (e.g. your DBMS or reporting tool.)

In the latter case, you are making a strategic decision rather than a design decision, and I recommend involving several points of view in the decision making process--not just designers.

Read more, and help support this blog:

I devote many pages to snowflake designs in Star Schema The Complete Reference.

Chapter 7 discusses snowflakes and hierarchies, and has an in-depth discussion of some of the issues touched on in this post.
Chapter 9 discusses the use of outriggers and bridge tables to capture repeating dimensions or attributes.
Chapter 10 shows how you can use a bridged design to support a recursive relationship to very powerful effect.
Chapter 16 looks at some of the implications of the star vs. snowflake decision on your BI software

If you use the link above (or the links in the sidebar) to order my book, a portion of the proceeds helps to support this blog. You can also find the table of contents and other info right here.

Image credit: Gravityx9 licensed under Creative Commons 2.0

Classes

Chris is scheduled to present at the following events. Course enrollment is open to the general public.

All these courses are also available on site (see below).

August 18, 2019
San Diego, CA
Data Modeling in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Workshop: Building the Business Case for Advanced Analytics
Registration: TDWI San Diego Strategy Summit
Monday October 21, 2019
San Francisco, CA
TDWI Dimensional Data Modeling Primer: From Requirements to Business Analysis
Registration: TDWI Seminars
Tuesday October 22, 2019
San Francisco, CA
Advanced Dimensional Modeling: Techniques for Practitioners
Registration: TDWI Seminars
Wednesday October 23, 2019
San Francisco, CA
Dimensional Models: What’s New in the Big Data Era
Registration: TDWI Seminars
November 12, 2019
Orlando, FL
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI Orlando
November 12, 2019
Orlando, FL
The Dimensional Model Refactored: New Techniques for the 21st Century
Registration: TDWI Orlando
November 15, 2019
Orlando, FL
Advanced Dimensional Modeling: Complete Tour of Modern Best Practices
Registration: TDWI Orlando

Onsite Education

You can bring Chris to your team for interactive education.

Dimensional Modeling

Chris provides full-day and expanded two-day courses covering the dimensional design concepts from Star Schema: The Complete Reference.
TDWI Courses

Chris teaches select TDWI courses that cover topics like data BI fundamentals, performance management, business analytics, dashboards and scorecards, and more.

All of Chris's education offerings are provided through TDWI.

For information on onsite offerings, contact TDWI Onsite Education. or Oakton Software