Friday, October 1, 2010

Q&A: Accumulating Snapshots

A reader asks about 1:1 relationships in a star schema.
I am designing a superannuation data warehouse that will have a claims accumulating snapshot fact table. This table will contain  number of dates that will be filled in when a claim is being processed. From this  we will be able to calculate time spans between the various claims processing stages.

It is intended that the fact table will contain a claim_key that will reference a claims dimension (1:1 relationship ).  I have been told by a few people that a 1:1 relationship is not a recommended design model.
- A reader in Australia

A:  There is absolutely nothing wrong with having a 1:1 relationship.  (That's true for both ER models and dimensional models.) 

Accumulating snapshots often cause the kind of disagreement you mentioned.  They are less common.  People mistakenly generalize characteristics of more common designs, assuming they apply to all stars. 

This is natural, but it leads to three incorrect assumptions:
  1. The fact table is on the "many" end of a 1:Many relationship with each dimension
  2. The fact table will have the most rows
  3. Fact table rows are not updated once they are loaded
These assumptions are all false when it comes to an accumulating snapshot.  (In fact, even a transaction fact table may violate the first two assumptions.) 

You've asked about the first one, but let me debunk all three:

  1. By definition the accumulating snapshot will have a 1:1 relationship with one of the dimension tables.  Why? Because its purpose is to track the progression of some discrete item through a business process.  Its grain is almost always stated in terms of a dimension -- e.g. "One Row per Claim."

  2. If the dimension can undergo type 2 slow changes, the accumulating snapshot will likely have LESS ROWS than the dimension.  This is desirable, and I have written about it before.

  3. The purpose of the accumulating snapshot is to study lags between key events, while avoiding correlated sub queries.  Unlike other fact tables, this requires the row to be updated as more information becomes known about the item being tracked.

I've written about these topics a few times in the past, including the post mentioned above. I also explore these topics in much greater detail in Chapter 11 of Star Schema: The Complete Reference.

In the book, I also show that an accumulating snapshot be used even when there is not a fixed, linear series of business steps being tracked.

Anyone who is not familiar with the accumulating snapshot is encouraged to learn more.  It is an essential design technique for many common business requirements.

Thanks for the question, and keep them coming...

Image Credit:  Creativity103 via Creative Commons