Chris Adamson’s Blog: December 2009

Wednesday, December 16, 2009

Dimensional Designs are not Denormalized

I don’t usually bother to take a stand on issues surrounding terminology or theory. For this post I am going to make an exception.

I often hear people say that a star schema is a denormalized data model. There is an element of truth hiding in those words, but I think it describes how some people design stars, and not what a star is. A better description is unnormalized or not normalized. But even these words do not describe all stars.

Regardless of the words you choose, using the vocabulary of normalization is generally a disservice to the novice designer. And none of these words should ever by used to define the star schema.

Continue reading to learn why.

Read the full post »

Wednesday, December 9, 2009

Rule 1: State Your Grain

Posted by Chris Adamson

Make sure you have a statement of grain for each fact table or cube in your dimensional design.

I receive lots of questions from people who are working through an issue with a star or cube. Most of the time, I must counter with a question of my own: “What is the grain?” Without this basic information, it is usually impossible to comment on whatever design issue the person is facing.

Being able to state grain is important, and not just for its value as a conversation starter. Here’s a brief look at what grain is, how to define it, and what happens if you don’t.

What is Grain

A statement of grain identifies what is represented by a single, granular row in a fact table (or an un-aggregated measure in a cube.) Each and every row in the fact table should meet this definition, with no wiggle-room.

When grain is not explicitly defined, or is defined in an ambiguous way, all manner of problems may arise. In addition to hampering your ability to talk about the design with someone else, ill-defined grain can cause to severe technical challenges in the reporting process. It may even lead to reports that are just plain wrong.

Defining Grain

There are two ways to state the grain of a fact table. The first way is to use business language, referencing a specific, well-understood artifact of the activity described by the star. For example, a star that describes orders may have the following grain: “Order measurements at the order-line level of detail.” That sums things up pretty well. Each row of the fact table corresponds to a single order line.

Sometimes it is not easy to state grain in this manner. When that’s the case, you use the dimensions in a star to indicate what each unique row in the fact table represents. For example, a star that tracks the processing milestones of mortgage applications might have the following grain: “Processing measurements by application and status.” This fact table will have a new row each time an application (one dimension) undergoes a change in status (another dimension.)

When using dimensional terms to define grain, do not simply rattle off all the dimensions present in the star. Instead, list only those that are necessary to define a unique row. In the mortgage status change example, it is not necessary to mention that the star will also contain dimensions for date, customer, mortgage officer, mortgage product, and so forth.

Business definitions of grain are usually used for transaction fact tables. Dimensional grain definitions are commonly used for snapshots, accumulating snapshots, derived schemas and aggregates.

Fuzzy Grain

Poorly defined grain can lead to trouble. Ill-defined grain can mask a situation where a fact table is actually being used to track two or more processes. It can also lead to situations in which the fact table contains two or more levels of aggregation. In the former case, single-process reporting will be hampered. BI developers will be bending over backwards to focus on the relevant subset of data. In the latter case, double-counting, triple-counting or worse is possible.

This is why “sate your grain” is rule #1 in any dimensional design.

-Chris

Image by GravityX9 licensed under Creative Commons 2.0

Subscribe to: Posts (Atom)

Classes

Chris is scheduled to present at the following events. Course enrollment is open to the general public.

All these courses are also available on site (see below).

August 18, 2019
San Diego, CA
Data Modeling in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Workshop: Building the Business Case for Advanced Analytics
Registration: TDWI San Diego Strategy Summit
Monday October 21, 2019
San Francisco, CA
TDWI Dimensional Data Modeling Primer: From Requirements to Business Analysis
Registration: TDWI Seminars
Tuesday October 22, 2019
San Francisco, CA
Advanced Dimensional Modeling: Techniques for Practitioners
Registration: TDWI Seminars
Wednesday October 23, 2019
San Francisco, CA
Dimensional Models: What’s New in the Big Data Era
Registration: TDWI Seminars
November 12, 2019
Orlando, FL
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI Orlando
November 12, 2019
Orlando, FL
The Dimensional Model Refactored: New Techniques for the 21st Century
Registration: TDWI Orlando
November 15, 2019
Orlando, FL
Advanced Dimensional Modeling: Complete Tour of Modern Best Practices
Registration: TDWI Orlando

Onsite Education

You can bring Chris to your team for interactive education.

Dimensional Modeling

Chris provides full-day and expanded two-day courses covering the dimensional design concepts from Star Schema: The Complete Reference.
TDWI Courses

Chris teaches select TDWI courses that cover topics like data BI fundamentals, performance management, business analytics, dashboards and scorecards, and more.

All of Chris's education offerings are provided through TDWI.

For information on onsite offerings, contact TDWI Onsite Education. or Oakton Software