Thursday, January 14, 2010

Accumulating Snapshots: Are They Necessary? [Q&A]

A reader asks the following question:

Q: We have an accumulating snapshot that tracks trouble tickets from submission to closure. The fact table has one row for each trouble ticket. Why do we need it? Can't we just place all this information in the trouble ticket dimension?
A: You probably do need the fact table, but it might be worth putting some of the information in the dimension table as well.

Before I explain, here is a quick refresher on the accumulating snapshot.

Accumulating Snapshot

An accumulating snapshot is a kind of fact table used to correlate activities surrounding a specified item. In this case, that item is a trouble ticket.

Accumulating snapshots are useful because they simplify the queries necessary to study elapsed time between events. In the reader's case, that might be the time between submission and closure of a ticket (there are probably some interim steps as well).

On the operational side, these events are probably recorded in different rows, and maybe even in different systems. Figuring out the days between these events can be messy, especially if averaging a results for a large number of tickets. Correlated subqueries will be reqiured, and performance may be poor.

The accumulating snapshot does the correlation of these activities ahead of time, as part of the ETL process. In this case, it has one row for each ticket. Presumably, there are foreign keys that identify the ticket, the date submitted, and the date closed. An optional fact can record the elapsed time between these dates. This fact is very easy to aggregate and group by any dimension of interest. No need for for correlated subqueries.

You can read more about accumulating snapshots in a prior post.

The Fact Table is Necessary

The reader has noted that the fact table has one row per trouble ticket. (That's a good statement of grain, by the way.) Since there is only one row for each ticket, it might seem that the content of the fact table could be simply be placed in the ticket dimension.

This is probably not a good idea. Though your fact table has one row per ticket, your ticket dimension may have more than one, due to type 2 changes that track history.  For a particular trouble ticket, when a type 2 change occurs, a new row is added for the new version of the ticket. A given ticket may therefore have more than one row in the dimension.

If you were to place a fact in this dimension table, like the days between submission and closure, it would be repeated for each version of the same ticket. Aggregating it would result in double counting.

The fact is better placed in the fact table, where they can safely be summed, averaged, or whatever is necessary.

(By the way, the same would occur if you placed the two dates in the dimension table. They would repeat for each version of the ticket. If you try to aggregate the elapsed time between the dates, the same ticket would be counted multiple times -- once for each version.)

Place Information in the Dimension Also (Not Instead)

Information tracked by the accumulating snapshot may also be useful in the dimension table. But here, its purpose is to serve as dimensional data.  Not as a replacement for the fact table.

For example, the dates the ticket was opened and closed could make helpful type 1 attributes in the ticket dimension.  This is especially useful if the dimension is connected to other fact tables, such as one that tracks phone calls to the support center. The dates can be used to group call counts in interesting ways. For example, "How many support calls were generated by tickets opened on January 1 vs. January 2."

If you want to, you can also place the elapsed days fact in the dimension table. Here, it should be used strictly as a behavioral dimension -- not as a fact.  Use it to group results. Don't try to use it as a fact; that would cause double counting. You might even want to convert it into bands to avoid confusion (1-5 days, 6-10 days, over 10 days.)

- Chris
Image Credit:  Creativity103 via Creative Commons

Do you have a question of your own? Send it in. I'd be happy to answer.

Mobile Safari Glitch Corrected [Announcements]

I received a report that was not redirecting properly for some readers using the mobile version of the Safari browser.

I believe the issue has been corrected.

If you continue to have problems, you can always use to get here instead.

Apologies for the inconvenience.


Monday, January 11, 2010

TDWI Members: Read Chris's "Ten Mistakes To Avoid"

If you are a member of The Data Warehousing Institute, be sure to check out your Q4 2009 benefits.

I wrote an installment of the "Ten Mistakes" series entitled Ten Mistakes to Avoid in Dimensional Design.  It covers common blunders in the use of dimensional design, both technical and non-technical.

This publication is only available to TDWI members.

- Chris