Chris Adamson’s Blog: Build High Resultion Stars

Friday, January 7, 2011

Build High Resultion Stars

On TV, a low-resolution photo is never a problem for crime scene investigators. Load it into a computer, zoom in on a reflection, click "enhance", and presto! the criminal is revealed.

In the world of data warehousing, we cannot rely on the CSI effect. If we start with low-resolution data, the detail is lost forever. Eventually, someone will ask a question that requires more detail. And it simply wont be there to answer the question.

A few simple guidelines will help you avoid this unfortunate situation.

A Common Problem

Omission of detail is one of the biggest frustrations caused by legacy designs. It is the number three problem I encounter in design reviews (after failure to use surrogate keys and failure to plan slowly changing dimensions.)

The reason for this is simple: people often design their schema to support current requirements. Unfortunately, this does not work in data warehousing. Analytic requirements constantly change. As a designer, your job is to produce solutions that will answer questions that are not yet known.

This may sound like an impossible task, but it's not. Its one of the strong suits of dimensional design. Three guidelines for high-resolution design will help future-proof your solution.

1. Match Grain to Source Data, Not Requirements

When you're designing a fact table, you need to establish its grain. A statement of grain defines what is represented by a row in a fact table. This is the "resolution" of your measurements. (For a refresher on grain, see the post Rule 1: State Your Grain.)

Don't set grain at the level of detail that meets requirements. Set it at the level of detail at which data is available.

For example, you are designing a fact table that measures sales. Someone asks for daily totals of sales in dollars. Don't assume daily totals of sales is sufficient. Eventually someone will want to go deeper, perhaps looking for products that are bought together. This requires setting your grain at the order line level of detail.

Once you have identified the process a fact table represents, look at the resolution of the source data. That should guide your decisions on grain. Of course, this needs to be kept within reason. If the available level of detail would result in fact tables that are too large or expensive to manage, you might need to do some summarization. But be careful; you'll be working with low-resolution data in the future.

2. Capture All the Facts that Fit

Remember that each fact table describes a process. When you fill in the facts, don't just include ones that people asked for. Include all available facts that describe that process.

Returning to our sales data, this means you should not stop at sales dollars. What else is known about sales? Quantities, tax, and other information is likely to be readily available, and many operational systmes may also offer a rudimentary concept of cost or margin. Include them!

3. Fill Out Dimensions

The last guideline for a high-resolution design involves the dimensions. You can probably guess what it is:

When you design a dimension table, don't just include attributes someone asked for. Find what is available in the source system, and include it all.

A simple "Product Name" and "Product ID" might be all that's needed to support sales by product. But what else do source tables hold that describes your products? Manufactures? Suppliers? Colors? Sizes? Weights? Include all this in your designs.

Remember, dimensions are the source of all context in our reports. The more we fill them out, the more kinds of questions people will be able to ask.

Initial Design is the Right Time

Match your grain, facts and dimensions to the available data, even if that goes beyond the requirements.

Of course, you can recover from errors in any of these areas. This will require modifying your design, ETL process, and so forth. When its decided that additional detail is needed going forward, you can make these changes.

But it is often difficult to add detail to historic data already in the data warehouse. For example, it may require accessing backup data. Worse, it may put ETL developers through the difficult process of performing slow change processing in reverse.

Think high-resolution today, and you can avoid these problems tomorrow.

Image Credit: Alan Cleaver via Creative Commons 2.0

Classes

Chris is scheduled to present at the following events. Course enrollment is open to the general public.

All these courses are also available on site (see below).

August 18, 2019
San Diego, CA
Data Modeling in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Workshop: Building the Business Case for Advanced Analytics
Registration: TDWI San Diego Strategy Summit
Monday October 21, 2019
San Francisco, CA
TDWI Dimensional Data Modeling Primer: From Requirements to Business Analysis
Registration: TDWI Seminars
Tuesday October 22, 2019
San Francisco, CA
Advanced Dimensional Modeling: Techniques for Practitioners
Registration: TDWI Seminars
Wednesday October 23, 2019
San Francisco, CA
Dimensional Models: What’s New in the Big Data Era
Registration: TDWI Seminars
November 12, 2019
Orlando, FL
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI Orlando
November 12, 2019
Orlando, FL
The Dimensional Model Refactored: New Techniques for the 21st Century
Registration: TDWI Orlando
November 15, 2019
Orlando, FL
Advanced Dimensional Modeling: Complete Tour of Modern Best Practices
Registration: TDWI Orlando

Onsite Education

You can bring Chris to your team for interactive education.

Dimensional Modeling

Chris provides full-day and expanded two-day courses covering the dimensional design concepts from Star Schema: The Complete Reference.
TDWI Courses

Chris teaches select TDWI courses that cover topics like data BI fundamentals, performance management, business analytics, dashboards and scorecards, and more.

All of Chris's education offerings are provided through TDWI.

For information on onsite offerings, contact TDWI Onsite Education. or Oakton Software