Wednesday, September 21, 2011

Avoid Surrogate Keys for Fact Tables

I am often asked for feedback on designs that include a "surrogate key for the fact table."  There are two common reasons for which they are proposed; both have better alternatives. 

Surrogate keys are for dimension tables

A surrogate key is an attribute that is created to uniquely identify rows in a dimension table. It does not come from a source system; it is created expressly for the dimensional schema.

Surrogate keys for dimension tables serve two important purposes:
  1. They make it easier to track history.  They allow the dimension to capture changes to something, even if the source does not.  Absent a surrogate key, this would be difficult; the primary key of the dimension would be concatenation of natural keys and type 2 attributes.
  2. They make it easy to join to the dimension.  The dimensions' surrogate keys appear in fact tables as  foreign keys.  They allow the fact to be joined to the appropriate dimension values, without having to use a multi-part key.
A fact table does not require a manufactured data element for either of these purposes.

Tracking change history of facts?  Use a log.

When the value of a fact can change, a surrogate key for the fact table might be proposed. This would theoretically allow the fact table to record the change history of facts, in the same way that a dimension table does.

Unfortunately, storing the history of facts in this manner destroys the usability of the star. The additive properties of the facts are lost, and it becomes much harder to use.

If the facts can change, the fact table should be updated.

To track the history of facts, use an audit table as described in a previous post.  This table can log historic values, or can store the changes as "deltas" in a dimensional format.

Joining to other fact tables?  Drill across.

The other reason surrogate keys are proposed for fact tables is that it will make them "easier to join." 

Joining fact tables to dimensions is easy; the fact table already contains foreign keys that reference the surrogate keys in dimension tables.  But what about joining to other fact tables?

Because they contain facts meant to be aggregated, fact tables should never be joined to one another. Otherwise, double-counting will ensue.

Facts from multiple stars should be combined by drilling across, as described in a previous post.  Don't try to merge them by joining fact tables.

If you're not after facts with this proposed fact-to-fact join, you must be after dimension values.  If this is the case, carry forward any/all useful dimensions to other stars in the value chain.

More info

To learn more about surrogate keys, check out these posts:
For more on tracking the change history of facts, check out this post:
To learn about drilling across, read this post:
If you find this blog helpful, please consider picking up a copy of my book, Star Schema: The Complete Reference.