Chris Adamson’s Blog: Q&A: Degenerate Dimensions

Tuesday, April 17, 2012

Q&A: Degenerate Dimensions

A reader asks about degenerate dimensions:

Q: I understand that when a particular dimension value is unique to each fact row (i.e., an invoice number perhaps) it should probably exist as a degenerate dimension in the fact table. Otherwise, that dimension table will contain as many rows as the fact table.

However, what about a situation in which a dimension table contains only one column aside from the surrogate key (i.e., a status of some sort) but is not unique to each fact row. In that case, should it be moved into the fact table as a degenerate dimension?

- Brian

Both cases are good applications for a degenerate dimension: a transaction identifier or a low-cardinality attribute that would otherwise need to be a very boring single-column table.

In all cases, the degenerate dimension is entirely optional.

Degenerate dimensions

A degenerate dimension is nothing more than a dimension attribute stored in a fact table, rather than a dimension table.

In all other respects, the degenerate dimension behaves exactly like any other dimension attribute. It can be used to provide context for facts, qualify queries, sort results, drive master-detail relationships, and so forth.

The usual reason to create a degenerate dimension is simple: A dimension with one attribute really doesn't need to be in a separate table with a surrogate key.

Two important things to note here:

Degenerate dimensions are optional
There is nothing wrong with a 1:1 relationship between dimension table and fact table

Lets take a closer look at two common uses of degenerate dimensions.

Transaction identifiers as degenerate dimensions

For a transaction fact table, we usually set the grain at the lowest level of detail possible. Often that is an individual transaction such as an order line, invoice line or shipping line. (For a refresher on grain, see this post from 2009.)

Typically, these things have some kind of unique identifier which is carried into the dimensional model. Examples include an order line number, invoice line ID, transaction ID, etc. When these identifiers do not exist, we often create them, as I have discussed in a previous post.

Many designers opt to place these transaction identifiers directly into the fact table as a degenerate dimension. However, this is entirely optional. There is nothing wrong with a dimension table being in a 1:1 relationship with the fact table, as I have discussed previously.

Some designers forgo the degenerate dimension for transaction identifiers because that there will be other schemas where the attribute in question is also referenced. In many cases, our business intelligence software may function better if the attribute in question is located in exactly one place -- a shared dimension table.

As an aside, note that in these other schemas, the cardinality may be different. For example, the same invoice line may be referenced in multiple shipments or invoices.

Other degenerate dimensions

Not all degenerate dimensions are transaction identifiers. They may simply be dimension attributes with no other suitable home. The status code mentioned by the reader may be an example. Another example appeared in a previous Q&A post: Reason Overpaid.

Once again, the use of the degenerate technique is entirely optional. Many designers are comfortable placing the attribute in the fact tables, while others opt for dimension tables with one attribute.

Notice that a dimension table with one attribute does not require a surrogate key for slow change purposes. However, without the surrogate key, the attribute value would be stored in the fact table anyway, as a foreign key reference to a single column table!

Placing the attribute in a separate table may help with our BI tools, for the same reason given above. Database administrators may also prefer the option because it makes it easier to configure a star-join optimizer. Adding a surrogate key may reduce the size of the foreign key that will be stored in fact table rows.

Learn More

For more information on this topic, see the following posts:

Degenerate Dimensions, ETL and BI (October 15, 2010) Discusses the use of a low-cardinality attribute as a degenerate dimension, and also looks at junk dimensions.

Creating Transaction Identifiers for Fact Tables (October 17, 2011) Addresses creation of transaction identifiers when source systems do not provide them.

Accumulating Snapshots (October 1, 2010) Discusses another situation where dimension table may have the same number of rows as the fact table, or perhaps even less.

Rule 1: State Your Grain (December 9, 2009) Defines the concept of grain, and discusses the best practice of setting it at the lowest level of detail possible.

Of course, all these topics are also covered in Star Schema: The Complete Reference.

You can help support this blog by using the links here to order a copy from Amazon.com.

Image licensed via Creative Commons 2.0
from Patrick Hoesley

Classes

Chris is scheduled to present at the following events. Course enrollment is open to the general public.

All these courses are also available on site (see below).

August 18, 2019
San Diego, CA
Data Modeling in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Workshop: Building the Business Case for Advanced Analytics
Registration: TDWI San Diego Strategy Summit
Monday October 21, 2019
San Francisco, CA
TDWI Dimensional Data Modeling Primer: From Requirements to Business Analysis
Registration: TDWI Seminars
Tuesday October 22, 2019
San Francisco, CA
Advanced Dimensional Modeling: Techniques for Practitioners
Registration: TDWI Seminars
Wednesday October 23, 2019
San Francisco, CA
Dimensional Models: What’s New in the Big Data Era
Registration: TDWI Seminars
November 12, 2019
Orlando, FL
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI Orlando
November 12, 2019
Orlando, FL
The Dimensional Model Refactored: New Techniques for the 21st Century
Registration: TDWI Orlando
November 15, 2019
Orlando, FL
Advanced Dimensional Modeling: Complete Tour of Modern Best Practices
Registration: TDWI Orlando

Onsite Education

You can bring Chris to your team for interactive education.

Dimensional Modeling

Chris provides full-day and expanded two-day courses covering the dimensional design concepts from Star Schema: The Complete Reference.
TDWI Courses

Chris teaches select TDWI courses that cover topics like data BI fundamentals, performance management, business analytics, dashboards and scorecards, and more.

All of Chris's education offerings are provided through TDWI.

For information on onsite offerings, contact TDWI Onsite Education. or Oakton Software