Tuesday, October 13, 2009

Highly Summarized Dimensions

From the inbox today, a question about aggregate design:
Q:  "If you roll up a dimension that is hierarchical, but you are now left with the highest level of the hierarchy (no further roll-up would be possible in that dimension) - would you maintain a separate dimension for this in the aggregated star?

"Or would you join a number of these unrelated top hierarchy levels in a junk dimension?"

- Dublin, Ireland
The reader has a potential aggregate that will summarize more than one dimension at a very high level.  For example, an aggregate of sales that summarizes the customer dimension by country and summarizes the product dimension by category.

As a general rule of thumb, try to set up aggregate dimensions that preserve the original separation of the base dimension tables.  So rather than have country and category combined as a sort of "junk" dimension, create a rollup of customer called "country" and a rollup of product called "category." 

This gives the aggregate star a kind of "symmetry" with the original star.  It tends to be the most workable solution with aggregate-aware query tools, databases, etc.  It also is the easiest to manage when no tools are aggregate-aware, as it minimizes the effort required to rewrite SQL.

If the summary is very small, it may be workable to create a "big wide table" that stores the dimension values directly with the summary facts.  This works best if you are using a tool to implement the aggregate that is able to exploit the query rewrite function of a database, like a materialized view or materialized query table.  In this case, applications can still query the "base" star, but receive the benefit of the aggregate.

But do you really need this aggregate?  The more highly summarized an aggregate is, the fewer queries or reports it is able to accelerate.   It is usually possible to construct aggregates that are not quite so highly summarized, giving a performance boost to a wider range of queries.  For example, summarizing a customer dimension by region, rather than country, will allow the aggregate to work with more reports--including those that focus on country.

- Chris

Image credit:  ZapTheDingbat