Chris Adamson’s Blog: Avoid Using Dates as Keys

Wednesday, May 26, 2010

Avoid Using Dates as Keys

This post explores the impact of using a date as the primary key for a dimension table.

Q: I'd like to hear your thoughts on using dates as keys. Is it really necessary to have a surrogate key for a date dimension?

Hartford, CT

A: It is not absolutely necessary. But you should use a surrogate key anyway. Here's why.

Suppose you have a DAY dimension which contains one row for each calendar date. In addition to the full date, it contains columns that represent the year, quarter, month, day of week, holiday status, and so forth. It also contains columns that represent frequently used combinations of these various attributes.

Users (or developers) can use the columns to do things like filter queries, aggregate facts, sort results, and drive subtotaling. Database administrators can index these columns to optimize these activities.

You might be tempted to use the full date as the primary key of the table. This is urge is understandable -- why bother creating an additional column called DAY_KEY. Seems like extra work for nothing, right?

Consistency and Performance Issues

Remember that the primary key of a dimension table will also appear in fact tables. If a date is used, users or developers will eventually try using it as a "shortcut" in queries, avoiding a join to the DAY table.

This leads to inconsistencies. Developers may mean well, thinking they are "saving a join." But they may apply different rules to compute things like fiscal periods.

Use of a day table guarantees consistent computation of all the characteristics of each day. All the columns are computed exactly once, during the ETL process. Everyone who uses them gets consistent results.

Moreover, avoiding the DAY table may generate performance issues. If a report developer wants to filter for a particular day of the week, they may apply SQL date comparison functions to the day column in the fact table. The database will probably have to make this comparison for each row in the fact table.

A day of the week in the DAY table is far more efficient, because it is pre-computed and indexed.

You may think you can prevent these mishaps. But you cannot. Sooner or later, they will occur. Dates in the fact table will beckon. A user, developer, or consultant will try to take a shortcut. Perhaps under your watch, or perhaps after you have gone.

Guarantee consistency and performance by using a surrogate key.

If you are curious, read on for some other, less important reasons.

Fact Table Row Size

Of lesser importance, but still worth considering, is the impact on the fact table row size.

Once again, remember that he primary key of a DAY table will appear in each fact table as a foreign key. A date is likely to consume many more bytes than a surrogate key generated by the ETL process.

In Oracle, for example, a DATE consumes 7 bytes. One hundred years is about 36,525 days. A surrogate key that holds this many values can be stored as a NUMBER(5). This will consume a maximum of 4 bytes (Caveat: I am not a DBA, just consulted the docs.)

That difference can be significant for fact tables that accumulate a very large number of transactions. Many fact tables have multiple days associated with each row, compounding the problem.

Type 2 Slow Changes to DAYs

As I've written before, the surrogate key is essential for supporting slow change processing. If you don't use a surrogate key, you are probably closing off the possibility of tracking the history of changes to dates.

In rare cases, you may need to track type 2 slow changes to a DAY table. Whether a particular date is considered a corporate holiday, for example, may change. Normally, this would call for a type 1 response (overwrite.) In a human resources data mart, however, it may be useful to know if the day was considered a holiday when someone requested their vacation time. This requires a type 2 slow change. If you are not using a surrogate key, this option is closed to you.

That's far-fetched, I grant you, but possible.

-Chris

Have a question of your own? Send it to the address in the sidebar. I answer all my mail (eventually), and feature some here on Star Schema Central.

Image: "Keys" by Takacsi75,
licensed under Creative Commons 2.0.

Classes

Chris is scheduled to present at the following events. Course enrollment is open to the general public.

All these courses are also available on site (see below).

August 18, 2019
San Diego, CA
Data Modeling in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Workshop: Building the Business Case for Advanced Analytics
Registration: TDWI San Diego Strategy Summit
Monday October 21, 2019
San Francisco, CA
TDWI Dimensional Data Modeling Primer: From Requirements to Business Analysis
Registration: TDWI Seminars
Tuesday October 22, 2019
San Francisco, CA
Advanced Dimensional Modeling: Techniques for Practitioners
Registration: TDWI Seminars
Wednesday October 23, 2019
San Francisco, CA
Dimensional Models: What’s New in the Big Data Era
Registration: TDWI Seminars
November 12, 2019
Orlando, FL
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI Orlando
November 12, 2019
Orlando, FL
The Dimensional Model Refactored: New Techniques for the 21st Century
Registration: TDWI Orlando
November 15, 2019
Orlando, FL
Advanced Dimensional Modeling: Complete Tour of Modern Best Practices
Registration: TDWI Orlando

Onsite Education

You can bring Chris to your team for interactive education.

Dimensional Modeling

Chris provides full-day and expanded two-day courses covering the dimensional design concepts from Star Schema: The Complete Reference.
TDWI Courses

Chris teaches select TDWI courses that cover topics like data BI fundamentals, performance management, business analytics, dashboards and scorecards, and more.

All of Chris's education offerings are provided through TDWI.

For information on onsite offerings, contact TDWI Onsite Education. or Oakton Software