Chris Adamson’s Blog: Dates In Dimension Tables

Thursday, May 12, 2011

Dates In Dimension Tables

A reader asks about tracking dates in the fact table vs. dimension tables.

Q: It seems to me that it is sometimes more effective to store date keys in the dimension table -- particularly if that date could be used to describe the rows in many different fact tables. The alternative would be to duplicate the date key in multiple fact tables.

For example, the effective date of an insurance policy could be used against many fact tables. Should we store the effective date key in each of the fact tables? Or store it in a single row in the Policy?

The differences in design would be a star schema (fact table) versus a snowflake (stored in the dimension).

- Robert
Montreal

A: Many models feature at least one major dimension with a date that might be used to qualify facts in more than one star. This happens in other industries, too. Examples include dimension tables that represent orders, contracts, tests and so forth.

Avoid placing date_key columns in dimension tables. It increases SQL complexity, reduces schema understandability, and may impact performance.

Some designers may allow limited use of snowflaking, but not in this situation.

Date in Dimension Table, Day Key in Fact Table

When you have an date that (1) clearly describes something represented by a dimension, and (2) will also be used to group facts, do two things:

Place good old-fashioned dates in the dimension table
In any star where you may want filter/group/aggregate facts by some aspect of the date in question, add date keys to the fact table

For example, in Robert's case there might be a Policy dimension table with an attribute called policy_effective_date. This is a date, not a key.

There is also a star that tracks claims. To support analysis of claim information using the policy effective date, the fact table will contain a day_key_policy_effective. The fact table may contain other day_keys as well (such as day_key_claim, which supports analysis of claim data by the date of the claim.)

Judicious Use of Snowflaking?

There are times when some designers might choose a snowflake path -- placing dimension keys into dimension tables. This happens when there is a set of attributes that appear in multiple dimension tables.

This occurs with geographical data. The characteristics of a location might appear in a customer dimension table, a warehouse dimension table and a department dimension table.

In this case, the concern is that the ETL routines may not consistently process locations across multiple dimension tables. If their attributes or slow change characteristics are not managed identically, inconsistencies may develop across these tables.

Designers reason that creating one location table solves this problem. Each dimension that contains location data will contain a location_key. This ensures a single consistent representation of locations.

I discuss this technique in Chapter 7 of Star Schema The Complete Reference, and point out that it opens the door for a different kind of ETL challenge. If location data is moved to an outrigger, each time there is Type 2 slow change to a location, all related dimension table rows must also undergo Type 2 changes.

My preference is to allow the attributes to repeat in multiple tables, and to increase the QA focus on related ETL programs.

Thanks to Robert for the question. If you have one of your own, send it to the address in the sidebar.

-Chris

Image credit: Gravityx9 licensed under Creative Commons 2.0

Classes

Chris is scheduled to present at the following events. Course enrollment is open to the general public.

All these courses are also available on site (see below).

August 18, 2019
San Diego, CA
Data Modeling in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Workshop: Building the Business Case for Advanced Analytics
Registration: TDWI San Diego Strategy Summit
Monday October 21, 2019
San Francisco, CA
TDWI Dimensional Data Modeling Primer: From Requirements to Business Analysis
Registration: TDWI Seminars
Tuesday October 22, 2019
San Francisco, CA
Advanced Dimensional Modeling: Techniques for Practitioners
Registration: TDWI Seminars
Wednesday October 23, 2019
San Francisco, CA
Dimensional Models: What’s New in the Big Data Era
Registration: TDWI Seminars
November 12, 2019
Orlando, FL
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI Orlando
November 12, 2019
Orlando, FL
The Dimensional Model Refactored: New Techniques for the 21st Century
Registration: TDWI Orlando
November 15, 2019
Orlando, FL
Advanced Dimensional Modeling: Complete Tour of Modern Best Practices
Registration: TDWI Orlando

Onsite Education

You can bring Chris to your team for interactive education.

Dimensional Modeling

Chris provides full-day and expanded two-day courses covering the dimensional design concepts from Star Schema: The Complete Reference.
TDWI Courses

Chris teaches select TDWI courses that cover topics like data BI fundamentals, performance management, business analytics, dashboards and scorecards, and more.

All of Chris's education offerings are provided through TDWI.

For information on onsite offerings, contact TDWI Onsite Education. or Oakton Software