Chris Adamson’s Blog: Avoid Surrogate Keys for Fact Tables

Wednesday, September 21, 2011

Avoid Surrogate Keys for Fact Tables

I am often asked for feedback on designs that include a "surrogate key for the fact table." Such keys are usually proposed for one of two reasons, and both have better alternatives.

Surrogate keys are for dimension tables

A surrogate key is an attribute that is created to uniquely identify rows in a dimension table. It does not come from a source system; it is created expressly for the dimensional schema.

Surrogate keys for dimension tables serve two important purposes:

They make it easier to track history. They allow the dimension to capture changes to something, even if the source does not. Absent a surrogate key, this would be difficult; the primary key of the dimension would be a concatenation of natural keys and type 2 attributes.
They make it easy to join to the dimension. The dimensions' surrogate keys appear in fact tables as foreign keys. They allow each fact to be joined to the appropriate dimension values, without having to use a multi-part key.

A fact table does not require a manufactured data element for either of these purposes.
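As a rough illustration of both purposes, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customer_dim, customer_key, customer_id, order_facts) are hypothetical and chosen only for this example. A type 2 change adds a second dimension row for the same natural key, and each fact row joins to the right version through a single-column key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (
    customer_key INTEGER PRIMARY KEY,  -- surrogate key, created for the star
    customer_id  TEXT NOT NULL,        -- natural key from the source system
    city         TEXT NOT NULL         -- type 2 attribute
);
CREATE TABLE order_facts (
    customer_key  INTEGER NOT NULL REFERENCES customer_dim (customer_key),
    order_dollars REAL NOT NULL
);
-- Customer C100 moves, so a type 2 change adds a row with a new surrogate key.
INSERT INTO customer_dim VALUES (1, 'C100', 'Boston');
INSERT INTO customer_dim VALUES (2, 'C100', 'Denver');
-- Each fact row points at the version in effect when the order was placed.
INSERT INTO order_facts VALUES (1, 100.0);
INSERT INTO order_facts VALUES (2, 250.0);
""")

# A single-column join is enough to pick up the correct historical values.
rows = conn.execute("""
    SELECT d.city, SUM(f.order_dollars)
    FROM order_facts f
    JOIN customer_dim d ON d.customer_key = f.customer_key
    GROUP BY d.city
    ORDER BY d.city
""").fetchall()
print(rows)
```

Without customer_key, the dimension's primary key would have to combine customer_id with city, and the fact table would need both columns to find the right row.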

Tracking change history of facts? Use a log.

When the value of a fact can change, a surrogate key for the fact table might be proposed. This would theoretically allow the fact table to record the change history of facts, in the same way that a dimension table does.

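Unfortunately, storing the history of facts in this manner destroys the usability of the star. The facts are no longer additive, because each version of a fact is stored as a separate row and a simple sum counts all of them. Every query must first isolate the current version, so the star becomes much harder to use.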
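The loss of additivity can be seen in a small sketch, again using Python's sqlite3 module. The table order_facts_versioned and its columns are hypothetical; it stands in for a fact table that keeps every version of a fact under its own surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_facts_versioned (
    order_fact_key INTEGER PRIMARY KEY,  -- hypothetical fact surrogate key
    order_id       TEXT NOT NULL,
    order_dollars  REAL NOT NULL
);
-- Order A1 was booked at 100, then corrected to 120.
INSERT INTO order_facts_versioned VALUES (1, 'A1', 100.0);
INSERT INTO order_facts_versioned VALUES (2, 'A1', 120.0);
INSERT INTO order_facts_versioned VALUES (3, 'B2', 50.0);
""")

# A plain SUM counts both versions of order A1.
naive_total = conn.execute(
    "SELECT SUM(order_dollars) FROM order_facts_versioned"
).fetchone()[0]

# To get the right answer, the query must first isolate the latest version.
current_total = conn.execute("""
    SELECT SUM(order_dollars)
    FROM order_facts_versioned v
    WHERE order_fact_key = (
        SELECT MAX(order_fact_key)
        FROM order_facts_versioned
        WHERE order_id = v.order_id
    )
""").fetchone()[0]

print(naive_total, current_total)
```

The plain sum overstates the total by the superseded value of order A1, and every report against this table would need the extra filtering step.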

If the facts can change, the fact table should be updated.

To track the history of facts, use an audit table as described in a previous post. This table can log historic values, or can store the changes as "deltas" in a dimensional format.
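One way to picture this is the sketch below, which overwrites the fact and writes the old and new values to a log in the same transaction. It is an assumption-laden illustration, not the design from the earlier post: the table order_facts_audit, its columns, and the helper correct_order are all hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_facts (
    order_id      TEXT PRIMARY KEY,
    order_dollars REAL NOT NULL
);
CREATE TABLE order_facts_audit (
    order_id    TEXT NOT NULL,
    changed_on  TEXT NOT NULL,
    old_dollars REAL NOT NULL,
    new_dollars REAL NOT NULL
);
INSERT INTO order_facts VALUES ('A1', 100.0);
INSERT INTO order_facts VALUES ('B2', 50.0);
""")

def correct_order(conn, order_id, new_dollars, changed_on):
    """Overwrite the fact and log the prior value in the audit table."""
    (old_dollars,) = conn.execute(
        "SELECT order_dollars FROM order_facts WHERE order_id = ?", (order_id,)
    ).fetchone()
    with conn:  # commit both statements together, or neither
        conn.execute(
            "INSERT INTO order_facts_audit VALUES (?, ?, ?, ?)",
            (order_id, changed_on, old_dollars, new_dollars),
        )
        conn.execute(
            "UPDATE order_facts SET order_dollars = ? WHERE order_id = ?",
            (new_dollars, order_id),
        )

correct_order(conn, "A1", 120.0, "2011-09-21")
total = conn.execute("SELECT SUM(order_dollars) FROM order_facts").fetchone()[0]
audit = conn.execute("SELECT * FROM order_facts_audit").fetchall()
print(total, audit)
```

The fact table stays fully additive because it holds only current values, while the audit table preserves the history. A "delta" variant would store new_dollars minus old_dollars instead of the two values.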

Joining to other fact tables? Drill across.

The other reason surrogate keys are proposed for fact tables is that they will make the tables "easier to join."

Joining fact tables to dimensions is easy; the fact table already contains foreign keys that reference the surrogate keys in dimension tables. But what about joining to other fact tables?

Because they contain facts meant to be aggregated, fact tables should never be joined to one another. Otherwise, double-counting will ensue.
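The double-counting is easy to reproduce. In this sketch (hypothetical tables order_facts and shipment_facts, with made-up values), one product has two order rows and three shipment rows, and the two fact tables are joined directly on the shared dimension key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_facts (product_key INTEGER, order_dollars REAL);
CREATE TABLE shipment_facts (product_key INTEGER, shipment_dollars REAL);
-- Two orders and three shipments for the same product.
INSERT INTO order_facts VALUES (1, 100.0), (1, 200.0);
INSERT INTO shipment_facts VALUES (1, 50.0), (1, 50.0), (1, 100.0);
""")

true_ordered = conn.execute(
    "SELECT SUM(order_dollars) FROM order_facts"
).fetchone()[0]
true_shipped = conn.execute(
    "SELECT SUM(shipment_dollars) FROM shipment_facts"
).fetchone()[0]

# Joining the fact tables pairs every order row with every shipment row.
joined_ordered, joined_shipped = conn.execute("""
    SELECT SUM(o.order_dollars), SUM(s.shipment_dollars)
    FROM order_facts o
    JOIN shipment_facts s ON s.product_key = o.product_key
""").fetchone()

print(true_ordered, true_shipped)      # totals from each table on its own
print(joined_ordered, joined_shipped)  # inflated totals from the join
```

Each order row is repeated once per shipment row and vice versa, so both sums are inflated.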

Facts from multiple stars should be combined by drilling across, as described in a previous post. Don't try to merge them by joining fact tables.
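Drilling across can be sketched as two steps: aggregate each fact table separately to a common grain, then merge the result sets on the shared dimension values. The example below uses hypothetical tables and values, and for brevity it groups on product_key directly; in practice the grouping would be on conformed dimension attributes, and the merge is often done by a BI tool or a full outer join rather than in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_facts (product_key INTEGER, order_dollars REAL);
CREATE TABLE shipment_facts (product_key INTEGER, shipment_dollars REAL);
INSERT INTO order_facts VALUES (1, 100.0), (1, 200.0), (2, 75.0);
INSERT INTO shipment_facts VALUES (1, 50.0), (1, 50.0), (1, 100.0);
""")

# Step 1: query each star on its own, at the same level of detail.
ordered = dict(conn.execute(
    "SELECT product_key, SUM(order_dollars) FROM order_facts GROUP BY product_key"
).fetchall())
shipped = dict(conn.execute(
    "SELECT product_key, SUM(shipment_dollars) FROM shipment_facts GROUP BY product_key"
).fetchall())

# Step 2: merge the two result sets on the common dimension value,
# keeping products that appear in only one of them.
report = [
    (key, ordered.get(key, 0.0), shipped.get(key, 0.0))
    for key in sorted(ordered.keys() | shipped.keys())
]
print(report)
```

Because each fact table is summarized before the results are combined, no fact row is ever repeated, and a product with orders but no shipments still appears in the report.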

If you're not after facts with this proposed fact-to-fact join, you must be after dimension values. If this is the case, carry any useful dimensions forward to the other stars in the value chain.

More info

To learn more about surrogate keys, check out these posts:

For Slowly Changing Dimensions, Change is Relative (10/9/2007)
Do I Really Need Surrogate Keys? (5/20/2009)
More On Surrogate Keys (9/8/2009)

For more on tracking the change history of facts, check out this post:

Slowly Changing Facts? (8/29/2011)

To learn about drilling across, read this post:

Multiple Stars and Conformed Dimensions (8/15/2011)

If you find this blog helpful, please consider picking up a copy of my book, Star Schema: The Complete Reference.

Image: Two Keys on a Keyring, licensed by Creative Commons 2.5
