Friday, November 5, 2010

Q&A: Star vs. Snowflake

A question from Portugal gives me an excuse to talk about snowflakes, why you might want to avoid them, and why you might want to use them.

Q: In your perspective, when does a star schema start to be a snowflake? If you have a snowflaked dimension, do you consider the model a star-schema? Or if you have for example an outrigger, is the model not a star-schema anymore?

I think the question when does a star become a snowflake is really one of semantics, but I'll give an answer.

Readers who just want to know when to use snowflake designs can skip ahead a couple of paragraphs.

Is it a Star or Snowflake?

As I said, this is really a semantic issue, but here is my answer. When dimension tables are linked to other dimension tables, or to anything that is not a fact table, the design is a snowflake.

The presence of an outrigger indicates a snowflake design, and so does a 3NF dimension containing 45 tables (and yes, I have seen that!).

Is a snowflake a star? I don't have a good answer for this.  When I discuss design options with colleagues, we think of it as an either/or choice. But when a design is complete, we refer to each fact table and its dimensions as a "star," even if there are outriggers.

One thing I am sure of is that both types of design are dimensional designs.  (So is a cube, by the way.)

Thanks to Pedro for the question.  Now I'm going to talk a bit more about snowflakes, for those who are interested.

When in doubt, don't snowflake

Best practices dictate that snowflakes should be avoided.  The reasons for this are wholly pragmatic.
  • They increase ETL complexity
  • The increase query complexity, harming "usability"
  • They reduce the understandability of the model
We could get into a discussion on this, but instead I'd like to look at some situations where these guidelines might not apply.

When snowflaking is acceptable

There are some cases the guideline against snowflaking may be relaxed.
  • Judicious use of outriggers may be acceptable in limited cases where there are relationships between dimensions that must be browsable.  But consider a factless fact table.
  • Outriggers are necessary when there are repeating attributes such as a product with multiple features or a patient with multiple diagnoses.
  • Outriggers are helpful in situations where there is a recursive hierarchy such as departments that contain other departments, or regions that contain other regions.
  • Snowflaking may be justified when your software products require it.  (e.g. your DBMS or reporting tool.)
In the latter case, you are making a strategic decision rather than a design decision, and I recommend involving several points of view in the decision making process--not just designers.

Read more, and help support this blog:

I devote many pages to snowflake designs in Star Schema The Complete Reference.
  • Chapter 7 discusses snowflakes and hierarchies, and has an in-depth discussion of some of the issues touched on in this post.  
  • Chapter 9 discusses the use of outriggers and  bridge tables to capture repeating dimensions or attributes. 
  • Chapter 10 shows how you can use a bridged design to support a recursive relationship to very powerful effect.
  • Chapter 16 looks at some of the implications of the star vs. snowflake decision on your BI software 
If you use the link above (or the links in the sidebar) to order my book, a portion of the proceeds helps to support this blog.   You can also find the table of contents and other info right here.

Image credit: Gravityx9 licensed under Creative Commons 2.0