Chris Adamson’s Blog: Q&A: Degenerate Dimensions, ETL and BI

Friday, October 15, 2010

Q&A: Degenerate Dimensions, ETL and BI

A question from a reader about including dimensions in the fact table:

Q: I have a question about an argument I am having with a colleague. We have a transaction fact table that will have an attribute called "Reason Overpaid". This attribute can contain only one of 10 values. Is it better to create a "Reason Overpaid" dimension and put a FK in the fact table referencing the dimension, or just to store the "Reason Overpaid" description in the fact table?

A: This is one argument I will not be able to settle. Either approach is fine.

Stored in the fact table, this attribute would be known as a degenerate dimension. It is perfectly acceptable there, but you may decide to move it to a separate table for other reasons.

Degenerate Dimension

A degenerate dimension is nothing more than a dimension attribute stored in the fact table. This technique is commonly employed when there is an attribute left over that doesn't really fit into any of the other dimension tables.

Your "Reason Overpaid" attribute can be stored in the fact table as a degenerate dimension. You can still use it in the exact same way as any other dimension attribute -- as a way to filter queries, group results, break subtotals, and so forth.

Keeping it in the fact table avoids unnecessary complexity -- a new table and key attribute to manage and load, a new foreign key lookup when processing facts, and most importantly an extra join to include in queries.
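To make the trade-off concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`transaction_facts`, `reason_overpaid_dim`, `overpaid_amount`) and the sample values are hypothetical. The same summary is produced once from a degenerate dimension column and once from a separate dimension table; the only difference in the query is the extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option 1 (hypothetical schema): "Reason Overpaid" as a degenerate dimension.
CREATE TABLE transaction_facts (
    transaction_id  INTEGER PRIMARY KEY,
    reason_overpaid TEXT NOT NULL,
    overpaid_amount REAL NOT NULL
);
INSERT INTO transaction_facts (reason_overpaid, overpaid_amount) VALUES
    ('Duplicate invoice', 120.00),
    ('Pricing error',      45.50),
    ('Duplicate invoice',  80.00);

-- Option 2 (hypothetical schema): a separate dimension table plus a foreign key.
CREATE TABLE reason_overpaid_dim (
    reason_overpaid_key INTEGER PRIMARY KEY,
    reason_overpaid     TEXT NOT NULL UNIQUE
);
CREATE TABLE transaction_facts_fk (
    transaction_id      INTEGER PRIMARY KEY,
    reason_overpaid_key INTEGER NOT NULL REFERENCES reason_overpaid_dim,
    overpaid_amount     REAL NOT NULL
);
INSERT INTO reason_overpaid_dim VALUES (1, 'Duplicate invoice'), (2, 'Pricing error');
INSERT INTO transaction_facts_fk (reason_overpaid_key, overpaid_amount) VALUES
    (1, 120.00), (2, 45.50), (1, 80.00);
""")

# Option 1: group directly on the column stored in the fact table.
degenerate = conn.execute("""
    SELECT reason_overpaid, SUM(overpaid_amount)
    FROM transaction_facts
    GROUP BY reason_overpaid
    ORDER BY reason_overpaid
""").fetchall()

# Option 2: the same result, but the query must join to the dimension table.
joined = conn.execute("""
    SELECT d.reason_overpaid, SUM(f.overpaid_amount)
    FROM transaction_facts_fk f
    JOIN reason_overpaid_dim d ON d.reason_overpaid_key = f.reason_overpaid_key
    GROUP BY d.reason_overpaid
    ORDER BY d.reason_overpaid
""").fetchall()

print(degenerate)  # [('Duplicate invoice', 200.0), ('Pricing error', 45.5)]
print(joined)      # [('Duplicate invoice', 200.0), ('Pricing error', 45.5)]
```

Both designs answer the same business question; the degenerate version simply has one less table to load and one less join to write.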

That said, a dimension table for the attribute may make sense in some situations.

Junk Dimension

If there is more than one degenerate dimension, consider moving them all to a separate dimension table. This is called a junk dimension. Its attributes are not directly related to one another, and there is no natural key. The table is populated with the Cartesian product of all possible values of its attributes.
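A minimal sketch of populating a junk dimension follows, assuming three hypothetical leftover attributes (the value lists are illustrative, not from the reader's schema). `itertools.product` generates every combination, each row gets a surrogate key, and a dictionary serves as the ETL lookup from a combination of values to its key.

```python
from itertools import product

# Hypothetical low-cardinality attributes left over from the fact table.
reason_overpaid = ["Duplicate invoice", "Pricing error", "Tax miscalculated"]
payment_method = ["Check", "Wire"]
rush_order = ["Y", "N"]

def build_junk_dimension(*attribute_values):
    """Return one row per combination of values, prefixed by a surrogate key."""
    return [
        (key, *combo)
        for key, combo in enumerate(product(*attribute_values), start=1)
    ]

junk_dim = build_junk_dimension(reason_overpaid, payment_method, rush_order)

# ETL lookup: map each combination of attribute values to its surrogate key,
# so each fact row carries a single foreign key instead of three columns.
junk_key = {row[1:]: row[0] for row in junk_dim}

print(len(junk_dim))                              # 12 (3 * 2 * 2)
print(junk_dim[0])                                # (1, 'Duplicate invoice', 'Check', 'Y')
print(junk_key[("Pricing error", "Wire", "N")])   # 8
```

Because the row count is the product of the value counts, this approach suits attributes with only a handful of values each, such as the 10-value "Reason Overpaid".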

ETL Consistency Concerns

If your "Reason Overpaid" will also appear in other fact tables, worries about ETL consistency may arise.

Degenerate dimensions are still OK in this situation, but now two or more fact tables will contain the attribute, and it will be necessary to ensure it is loaded consistently. Creating a separate dimension table allows the values to be created exactly once, avoiding any problems that might be caused by inconsistent ETL processing.

While I would not go to a separate table for this reason, I do understand why many designers opt to do so. The next situation is a different story.
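As a rough sketch of the "create once, look up everywhere" idea: the values live in one shared dimension (modeled here as a Python dict), and every fact load resolves keys through the same function. The dimension contents, the `lookup_reason_key` helper, the second "adjustment" fact load, and the whitespace-trimming rule are all hypothetical choices for illustration.

```python
# Hypothetical shared dimension: "Reason Overpaid" values are created exactly once.
REASON_OVERPAID_DIM = {
    "Duplicate invoice": 1,
    "Pricing error": 2,
    "Tax miscalculated": 3,
}

def lookup_reason_key(description: str) -> int:
    """Resolve a source description to the shared dimension key.

    Cleanup (here, just trimming whitespace) happens in one place, so every
    fact load maps the same source value to the same key. Unknown values are
    rejected instead of silently creating a variant spelling.
    """
    cleaned = description.strip()
    try:
        return REASON_OVERPAID_DIM[cleaned]
    except KeyError:
        raise ValueError(f"Unknown Reason Overpaid value: {description!r}") from None

# Two hypothetical fact loads that share the same lookup.
transaction_rows = [("T1", "Pricing error ", 45.50), ("T2", "Duplicate invoice", 120.00)]
adjustment_rows = [("A1", "Pricing error", -10.00)]

transaction_facts = [(tid, lookup_reason_key(r), amt) for tid, r, amt in transaction_rows]
adjustment_facts = [(aid, lookup_reason_key(r), amt) for aid, r, amt in adjustment_rows]

print(transaction_facts)  # [('T1', 2, 45.5), ('T2', 1, 120.0)]
print(adjustment_facts)   # [('A1', 2, -10.0)]
```

Note that the trailing space in "Pricing error " still resolves to key 2, so both fact tables agree on the value.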

BI Tool Capabilities

If your "Reason Overpaid" will also appear in other fact tables, the capabilities of your BI software may come into play.

The scenario is this: you are configuring your BI tool to auto-generate SQL queries for users. You'd like to have an item they can request called "Reason Overpaid", but the tool does not understand that it can appear in two places in the database schema.

Creating a dimension table for the attribute solves this problem. Both fact tables can link to the same dimension table. The tool can now have a definitive place to go for "Reason Overpaid", and may even be able to use it as the basis for comparing data in two fact tables. This is a strong reason to go with a separate table.

Luckily, many BI tools can be configured to acknowledge that a dimension may appear in more than one place, in which case this is not an issue. And if you are building cubes for the purposes of BI reporting, you can trust your developers to choose the right attribute.
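When both fact tables link to the same dimension table, comparing them is usually done by drill-across: summarize each fact table separately by the shared attribute, then line up the results. The sketch below uses `sqlite3` with a hypothetical second fact table, `recovery_facts`, and made-up amounts. It illustrates the SQL pattern only and is not meant to show what any particular BI tool generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reason_overpaid_dim (
    reason_overpaid_key INTEGER PRIMARY KEY,
    reason_overpaid     TEXT NOT NULL UNIQUE
);
-- Two hypothetical fact tables that share the same dimension.
CREATE TABLE transaction_facts (
    reason_overpaid_key INTEGER NOT NULL REFERENCES reason_overpaid_dim,
    overpaid_amount     REAL NOT NULL
);
CREATE TABLE recovery_facts (
    reason_overpaid_key INTEGER NOT NULL REFERENCES reason_overpaid_dim,
    recovered_amount    REAL NOT NULL
);
INSERT INTO reason_overpaid_dim VALUES (1, 'Duplicate invoice'), (2, 'Pricing error');
INSERT INTO transaction_facts VALUES (1, 120.0), (2, 45.5), (1, 80.0);
INSERT INTO recovery_facts VALUES (1, 150.0), (2, 45.5);
""")

# Drill-across: aggregate each fact table on its own, grouped by the shared
# dimension attribute, then join the two result sets on that attribute.
rows = conn.execute("""
    WITH overpaid AS (
        SELECT d.reason_overpaid, SUM(f.overpaid_amount) AS total_overpaid
        FROM transaction_facts f
        JOIN reason_overpaid_dim d USING (reason_overpaid_key)
        GROUP BY d.reason_overpaid
    ),
    recovered AS (
        SELECT d.reason_overpaid, SUM(r.recovered_amount) AS total_recovered
        FROM recovery_facts r
        JOIN reason_overpaid_dim d USING (reason_overpaid_key)
        GROUP BY d.reason_overpaid
    )
    SELECT o.reason_overpaid, o.total_overpaid, COALESCE(r.total_recovered, 0)
    FROM overpaid o
    LEFT JOIN recovered r ON r.reason_overpaid = o.reason_overpaid
    ORDER BY o.reason_overpaid
""").fetchall()

for reason, overpaid, recovered in rows:
    print(f"{reason}: overpaid {overpaid:.2f}, recovered {recovered:.2f}")
```

Aggregating before joining keeps each fact table's totals from being multiplied by the other's row count. The `LEFT JOIN` keeps the sketch simple; a full outer join would also retain reasons that appear only in the second fact table.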

If you're interested in reading more about how BI tools may influence your dimensional design, be sure to check out Chapter 16, "Design and Business Intelligence," in my latest book, Star Schema: The Complete Reference.

Send in your questions

Have a question of your own about dimensional modeling? Send it to the blog address in the sidebar.

Chris

Image credit: Mykl Roventine via Creative Commons 2.0
