Q: I'd like to hear your thoughts on using dates as keys. Is it really necessary to have a surrogate key for a date dimension?
Hartford, CT
A: It is not absolutely necessary. But you should use a surrogate key anyway. Here's why.
Suppose you have a DAY dimension which contains one row for each calendar date. In addition to the full date, it contains columns that represent the year, quarter, month, day of week, holiday status, and so forth. It also contains columns that represent frequently used combinations of these various attributes.
Users (or developers) can use the columns to do things like filter queries, aggregate facts, sort results, and drive subtotaling. Database administrators can index these columns to optimize these activities.
You might be tempted to use the full date as the primary key of the table. This is urge is understandable -- why bother creating an additional column called DAY_KEY. Seems like extra work for nothing, right?
Consistency and Performance Issues
Remember that the primary key of a dimension table will also appear in fact tables. If a date is used, users or developers will eventually try using it as a "shortcut" in queries, avoiding a join to the DAY table.
This leads to inconsistencies. Developers may mean well, thinking they are "saving a join." But they may apply different rules to compute things like fiscal periods.
Use of a day table guarantees consistent computation of all the characteristics of each day. All the columns are computed exactly once, during the ETL process. Everyone who uses them gets consistent results.
Moreover, avoiding the DAY table may generate performance issues. If a report developer wants to filter for a particular day of the week, they may apply SQL date comparison functions to the day column in the fact table. The database will probably have to make this comparison for each row in the fact table.
A day of the week in the DAY table is far more efficient, because it is pre-computed and indexed.
You may think you can prevent these mishaps. But you cannot. Sooner or later, they will occur. Dates in the fact table will beckon. A user, developer, or consultant will try to take a shortcut. Perhaps under your watch, or perhaps after you have gone.
Guarantee consistency and performance by using a surrogate key.
If you are curious, read on for some other, less important reasons.