Friday, March 27, 2009

Is a Snapshot an Aggregate?

Students in my classes often ask me if a snapshot fact table is an aggregate. My short answer is that it doesn't matter. But here's the long answer.

This is really a matter of semantics more than anything else. What it boils down to is how you define an aggregate table. Is it simply a table that summarizes data from other tables? Then yes, most of the time a snapshot will fit this definition -- but not always. However, I like to reserve the word "aggregate" for a very specific kind of summary. If you do as well, then the answer is no. Never.

Before getting into that, though, here is some background.

Snapshot Fact Tables

A snapshot, for the uninitiated, is a fact table that periodically samples a measurement. A snapshot always includes at least one measurement of status, such as a balance or level. A transaction fact table, in contrast, captures measurements as they occur.

The snapshot is useful for describing the effect of transactions, rather than transactions themselves. The best example is your bank account. Its all well and good if you have a record of each deposit, check, interest payment, fee, direct debit, and so forth. But what is your balance? Aggregating all your transactions is not the most expedient way to figure that out.

Sometimes Snapshots Summarize Transactions, but not Always

People sometimes think of snapshots as aggregates because they can be constructed from a table that contains transactions. That is certainly true in the case of account balances. If you have the transactions, you can compute the snapshots.

In this sense, you might be justified in thinking of a snapshot as an aggregate. But I don't. I'll explain why in just a second.

First, though, it is important to understand that not all snapshots are constructed from transactions. Sometimes, the number of transactions is too numerous to keep around. But in other cases, there simply aren't any. For example, we can't make use of transactions to describe the ambient temperature on the factory floor, or the level of pressure inside an industrial device. In these cases, the snapshot is clearly not a summary.

Snapshots Contain "new" Facts

So how about those other situations -- the ones where there are transactions that can be used to create a snapshot? Isn't a snapshot an aggregate in this case?

It really depends on how you define an aggregate.

The word aggregate is normally reserved for a summary table that does not transform the structure of its corresponding base table. For example, suppose account transactions are recorded in a fact table, with a fact called transaction_amount. A monthly aggregate would have exactly the same fact. Still called transaction_amount, is the same as the fact in the base table. The only difference is in level of detail: it represents a month's worth of transactions, rather than individual transactions. It is the same measurement.

Account balance (or other status measurements like levels) is a not the same as what the base fact table records. It is not an aggregation of transaction that occurred during the snapshot period; it is something more. It describes the net effect of the account transactions. It is a different measurement.

Conclusion

I use the term derived table to describe any table in a dimensional design uses another as its source. I save the word aggregate for non-transformative summaries.

To sum things up, then:

Yes, a snapshot sometimes summarizes a transaction fact table.

But if you accept these definitions for aggregate tables and derived tables, the answer is different:

A snapshot is is sometimes a derived table, but never an aggregate.

This is just a matter of semantics, and not very important to me. Call it whatever you want, and I will be fine with it. The important thing is that you have a dimensional design that enables useful analytics.

- Chris