Sunday, February 19, 2012

Accumulating snapshots and Type 2 changes

How does one maintain an accumulating snapshot when the dimension that defines its grain can undergo type 2 changes?

A reader recently asked this question via the comments for a recent post.

The short answer:  an accumulating snapshot has only 1 row for each thing being tracked.  If that "thing" can undergo type 2 changes, then the accumulating snapshot should link to only one version -- the current version.

State transitions

Many businesses track items that go through state transitions.  A trouble ticket, for example, is opened, assigned to a support rep, and eventually closed.  A mortgage application is submitted, reviewed, processed, underwritten, and settled upon.

These activities may be represented by a series of fact tables associated with the various process milestones.  Or, they may be captured by a single fact table that records a row for each status change.

When it comes time to study the average time spent at the various stages, however, these options may not be ideal.  In the former case, it will be necessary to drill across several fact tables.  In the latter case it will be necessary to perform correlated subqueries.

The accumulating snapshot model addresses these concerns.

The grain of an accumulating snapshot

An accumulating snapshot receives one row for each item being tracked.  This single row is updated as the item reaches various milestones.  A series of facts are incremented for each day spent in a given stage.  When a milestone is achieved, the appropriate date key is populated.

(For a full refresher on the accumulating snapshot, see the links at the end of this post.)

Unlike other fact tables, then, the accumulating snapshot always reflects the current state of affairs.  In the case of a mortgage processing fact table, for example, each row shows a single application, along with how many days it has spent at each processing stage.

This makes it very easy to study the average time between various stages -- just take the average of the facts in question.  No drilling across, no correlated subqueries.

Changes in the defining dimension

Back to the original question, then. What happens if the dimension that defines the grain of the accumulating snapshot can undergo type 2 slow changes?

The goal of the accumulating snapshot is to summarize each real world item in a single row.  Even if the dimension contains multiple rows for a particular item, the fact table must contain only one. If it contained more than one, its usability would harmed.

As I have written before, this means the defining dimension may have more rows than the accumulating snapshot.  This is OK.  Its just another way in which the accumulating snapshot differs from other types of fact tables.

Our mortgage application, then, must have one and only one row in the fact table -- even if type 2 changes mean there are more than one rows for it in the dimension.

Linking to the current row

The question, then, becomes which row should the fact table link to?

Because the accumulating snapshot is current-valued, it makes the most sense to have it link to the row in the dimension table that represents the current state of the item being tracked.  In our example, this would be the most recent version of the mortgage application.

If you increment the facts in your accumulating snapshot daily (something I suggest in my book), you should also take this opportunity to adjust the key values for any items that have undergone a type 2 change.  If you adjust the facts only when a milestone is met (something you may do if the volume is very high), you may choose to update the key value at this time instead.

Learn more

This and other topics surrounding the accumulating snapshot are discussed in depth in Star Schema: The Complete Reference, in Chapter 11: "Transactions, Snapshots and Accumulating snapshots."

You can help support this blog by using the links in the sidebar to purchase a copy.

You can also read more in these posts:
  • When do you need an accumulating snapshot (January 26, 2012) a recent post that triggered the question, wherein a reader asked how to choose between building an accumulating snapshot and tracking something as a series of type 2 changes. In some cases, the answer is to do both.
  • Q&A: Accumulating snapshots (October 1, 2010) Explores the cardinality relationship between accumulating snapshot and dimension table 

Image Credit:  Creativity103 via Creative Commons