Wednesday, September 21, 2011

Avoid Surrogate Keys for Fact Tables

I am often asked for feedback on designs that include a "surrogate key for the fact table."  There are two common reasons for which they are proposed; both have better alternatives. 

Surrogate keys are for dimension tables

A surrogate key is an attribute that is created to uniquely identify rows in a dimension table. It does not come from a source system; it is created expressly for the dimensional schema.

Surrogate keys for dimension tables serve two important purposes:
  1. They make it easier to track history.  They allow the dimension to capture changes to something, even if the source does not.  Absent a surrogate key, this would be difficult; the primary key of the dimension would be concatenation of natural keys and type 2 attributes.
  2. They make it easy to join to the dimension.  The dimensions' surrogate keys appear in fact tables as  foreign keys.  They allow the fact to be joined to the appropriate dimension values, without having to use a multi-part key.
A fact table does not require a manufactured data element for either of these purposes.

Tracking change history of facts?  Use a log.

When the value of a fact can change, a surrogate key for the fact table might be proposed. This would theoretically allow the fact table to record the change history of facts, in the same way that a dimension table does.

Unfortunately, storing the history of facts in this manner destroys the usability of the star. The additive properties of the facts are lost, and it becomes much harder to use.

If the facts can change, the fact table should be updated.

To track the history of facts, use an audit table as described in a previous post.  This table can log historic values, or can store the changes as "deltas" in a dimensional format.

Joining to other fact tables?  Drill across.

The other reason surrogate keys are proposed for fact tables is that it will make them "easier to join." 

Joining fact tables to dimensions is easy; the fact table already contains foreign keys that reference the surrogate keys in dimension tables.  But what about joining to other fact tables?

Because they contain facts meant to be aggregated, fact tables should never be joined to one another. Otherwise, double-counting will ensue.

Facts from multiple stars should be combined by drilling across, as described in a previous post.  Don't try to merge them by joining fact tables.

If you're not after facts with this proposed fact-to-fact join, you must be after dimension values.  If this is the case, carry forward any/all useful dimensions to other stars in the value chain.

More info

To learn more about surrogate keys, check out these posts:
For more on tracking the change history of facts, check out this post:
To learn about drilling across, read this post:
If you find this blog helpful, please consider picking up a copy of my book, Star Schema: The Complete Reference.

Thursday, September 15, 2011

Factless Fact Tables

This post introduces the concept of the factless fact table and the situations where you would model one.

When a fact table does not contain any facts, it is called a factless fact table. There are two types of factless fact tables: those that describe events, and those that describe conditions. Both may play important roles in your dimensional models. 

Factless fact tables for events

Sometimes there seem to be no facts associated with an important business process. Events or activities occur that you wish to track, but you find no measurements. In situations like this, build a standard transaction-grained fact table that contains no facts.

For example, you may be tracking contact events with customers. How often and why we have contact with each customer may be an important factor in focusing retention efforts, targeting marketing campaigns, and so forth.

The factless fact table shown here captures a row each time contact with a customer occurs.  Dimensions represent the date and time of each contact, the customer contacted, and the type of communication (e.g. inbound phone call, outbound phone call, robo-call, email campaign, etc.).

While there are no facts, this kind of star schema is indeed measuring something: the occurrence of events. In this case, the event is a contact. Since the events correspond to the grain of the fact table, a fact is not required; users can simply count rows:

    count (contact_type_key)as "CONTACT COUNT"

  NAME        COUNT
  =========== =======
  BURNS, K          8
  HANLEY, S        11
  ROGERS, S         4
  SCANLON, C        8
  SMITH, B          8
  SMITH, M.E.      12

This fragment of SQL reveals that there really is a fact here: the number of events.  Its not necessary to store it, because it coincides with the grain of the fact table.

To acknowledge this, you can optionally add a fact called "contact_count"; it will always contain the constant value 1.

Factless fact tables for conditions

Factless fact tables are also used to model conditions or other important relationships among dimensions. In these cases, there are no clear transactions or events, but the factless fact table will enable useful analysis.

For example, an investment bank assigns a broker to each customer. Since customers may be inactive for periods of time, this relationship may not be visible in transaction-grained fact tables. A factless fact table tracks this important relationship:

Each row in this factless fact table represents a bounded time period during which a broker was assigned to a particular customer.  This kind of factless fact table is used to track conditions, coverage or eligibility.  In Kimball terminology, it is called a "coverage table."

Comparing conditions with events yields interesting business scenarios.  This factless fact table can be compared to one that tracks investment transactions to find brokers who are not interacting with their customers, brokers who conducted transactions with accounts that belong to a different broker, etc.

More on factless fact tables

If you've got questions about factless fact tables, send them to the address in the sidebar of this blog.

You can also read more about factless fact tables in Star Schema: The Complete Reference.  Chapter 12 provides detailed coverage of factless fact table design.