Chris Adamson’s Blog: January 2010

Thursday, January 14, 2010

Accumulating Snapshots: Are They Necessary? [Q&A]

A reader asks the following question:

Q: We have an accumulating snapshot that tracks trouble tickets from submission to closure. The fact table has one row for each trouble ticket. Why do we need it? Can't we just place all this information in the trouble ticket dimension?

A: You probably do need the fact table, but it might be worth putting some of the information in the dimension table as well.

Before I explain, here is a quick refresher on the accumulating snapshot.

Accumulating Snapshot

An accumulating snapshot is a kind of fact table used to correlate activities surrounding a specified item. In this case, that item is a trouble ticket.

Accumulating snapshots are useful because they simplify the queries necessary to study elapsed time between events. In the reader's case, that might be the time between submission and closure of a ticket (there are probably some interim steps as well).

On the operational side, these events are probably recorded in different rows, and maybe even in different systems. Figuring out the days between these events can be messy, especially if averaging a results for a large number of tickets. Correlated subqueries will be reqiured, and performance may be poor.

The accumulating snapshot does the correlation of these activities ahead of time, as part of the ETL process. In this case, it has one row for each ticket. Presumably, there are foreign keys that identify the ticket, the date submitted, and the date closed. An optional fact can record the elapsed time between these dates. This fact is very easy to aggregate and group by any dimension of interest. No need for for correlated subqueries.

You can read more about accumulating snapshots in a prior post.

The Fact Table is Necessary

The reader has noted that the fact table has one row per trouble ticket. (That's a good statement of grain, by the way.) Since there is only one row for each ticket, it might seem that the content of the fact table could be simply be placed in the ticket dimension.

This is probably not a good idea. Though your fact table has one row per ticket, your ticket dimension may have more than one, due to type 2 changes that track history. For a particular trouble ticket, when a type 2 change occurs, a new row is added for the new version of the ticket. A given ticket may therefore have more than one row in the dimension.

If you were to place a fact in this dimension table, like the days between submission and closure, it would be repeated for each version of the same ticket. Aggregating it would result in double counting.

The fact is better placed in the fact table, where they can safely be summed, averaged, or whatever is necessary.

(By the way, the same would occur if you placed the two dates in the dimension table. They would repeat for each version of the ticket. If you try to aggregate the elapsed time between the dates, the same ticket would be counted multiple times -- once for each version.)

Place Information in the Dimension Also (Not Instead)

Information tracked by the accumulating snapshot may also be useful in the dimension table. But here, its purpose is to serve as dimensional data. Not as a replacement for the fact table.

For example, the dates the ticket was opened and closed could make helpful type 1 attributes in the ticket dimension. This is especially useful if the dimension is connected to other fact tables, such as one that tracks phone calls to the support center. The dates can be used to group call counts in interesting ways. For example, "How many support calls were generated by tickets opened on January 1 vs. January 2."

If you want to, you can also place the elapsed days fact in the dimension table. Here, it should be used strictly as a behavioral dimension -- not as a fact. Use it to group results. Don't try to use it as a fact; that would cause double counting. You might even want to convert it into bands to avoid confusion (1-5 days, 6-10 days, over 10 days.)

- Chris

Image Credit: Creativity103 via Creative Commons

Do you have a question of your own? Send it in. I'd be happy to answer.

Mobile Safari Glitch Corrected [Announcements]

Posted by Chris Adamson

I received a report that www.StarSchemaCentral.com was not redirecting properly for some readers using the mobile version of the Safari browser.

I believe the issue has been corrected.

If you continue to have problems, you can always use blog.oaktonsoftware.com to get here instead.

Apologies for the inconvenience.

-Chris

Monday, January 11, 2010

TDWI Members: Read Chris's "Ten Mistakes To Avoid"

Posted by Chris Adamson

If you are a member of The Data Warehousing Institute, be sure to check out your Q4 2009 benefits.

I wrote an installment of the "Ten Mistakes" series entitled Ten Mistakes to Avoid in Dimensional Design. It covers common blunders in the use of dimensional design, both technical and non-technical.

This publication is only available to TDWI members.

- Chris

Subscribe to: Posts (Atom)

Classes

Chris is scheduled to present at the following events. Course enrollment is open to the general public.

All these courses are also available on site (see below).

August 18, 2019
San Diego, CA
Data Modeling in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Workshop: Building the Business Case for Advanced Analytics
Registration: TDWI San Diego Strategy Summit
Monday October 21, 2019
San Francisco, CA
TDWI Dimensional Data Modeling Primer: From Requirements to Business Analysis
Registration: TDWI Seminars
Tuesday October 22, 2019
San Francisco, CA
Advanced Dimensional Modeling: Techniques for Practitioners
Registration: TDWI Seminars
Wednesday October 23, 2019
San Francisco, CA
Dimensional Models: What’s New in the Big Data Era
Registration: TDWI Seminars
November 12, 2019
Orlando, FL
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI Orlando
November 12, 2019
Orlando, FL
The Dimensional Model Refactored: New Techniques for the 21st Century
Registration: TDWI Orlando
November 15, 2019
Orlando, FL
Advanced Dimensional Modeling: Complete Tour of Modern Best Practices
Registration: TDWI Orlando

Onsite Education

You can bring Chris to your team for interactive education.

Dimensional Modeling

Chris provides full-day and expanded two-day courses covering the dimensional design concepts from Star Schema: The Complete Reference.
TDWI Courses

Chris teaches select TDWI courses that cover topics like data BI fundamentals, performance management, business analytics, dashboards and scorecards, and more.

All of Chris's education offerings are provided through TDWI.

For information on onsite offerings, contact TDWI Onsite Education. or Oakton Software