Chris Adamson’s Blog: More on Surrogate Keys

Tuesday, September 8, 2009

More on Surrogate Keys

This post addresses a common follow-up question regarding surrogate keys:

"My source system already tracks changes. In this case, do dimension tables really need their own unique identifiers?"

Yes. There may be exceptions, but they are far less common than you may think, and they limit your future options.

In a previous post, I laid out the case for surrogate keys. In short:

By not reusing unique identifiers from a source system, dimensions are free to respond to changed data in whatever manner makes most sense from an analytic perspective.
Surrogate keys are the best option, though a multi-part key is also possible.

It is tempting to assume that if the source system logs all changes, there is no need to maintain a separate unique identifier for dimensions. I would seem that they can maintain all history, while reusing the natural keys from the source system. Usually, this is not the case.

A Dimension is Not Equivalent to an Entity

Unfortunately, a dimension in a data warehouse rarely corresponds to a single entity in an operational system. Operational systems are more highly normalized; most dimensions correspond to more than one table in a source system. A customer dimension, for example, may draw information from multiple source tables: customer, household, address, phone_number, demographics and so forth.

A change log for the central entity corresponding to a dimension may not reflect changes to some of the related tables, especially when they represent dependent entities or are linked through intersect tables. If this is the case, it will be impossible for the dimension to undergo a type 2 response when ancillary information changes.

If every source table that feeds a dimension is logged, it may be possible to design a multi-part unique identifier that is completely recycled. (As observed previously, compound keys in dimensions are feasible but not optimal.) This approach can lock you into type 2 processing for defining elements of the related tables, even when a type 1 response is desired.

Surrogate Keys Will Limit Future Rework

Even in the rare case when a source system handles all changes exactly the way we want, reuse of the natural keys limits future options.

If the source system is modified, the desired behavior may not be maintained. It will either be necessary to engage in significant rework of the dimensional model (and all dependent reports), or simply live with the unsatisfactory change handling of the source system.

Mergers and acquisitions may also bring additional source systems that correspond to an existing dimension. These systems arrive complete with entirely different identifiers. Incorporating them as sources for an existing dimension will always necessitate some rework of the dimension. But if surrogate keys are in place, fact tables will be unaffected and existing reports will still work. If, on the other hand, natural keys are the basis of unique identifiers, rework will cascade to related fact tables and also to existing queries and reports.

Classes

Chris is scheduled to present at the following events. Course enrollment is open to the general public.

All these courses are also available on site (see below).

August 18, 2019
San Diego, CA
Data Modeling in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Workshop: Building the Business Case for Advanced Analytics
Registration: TDWI San Diego Strategy Summit
Monday October 21, 2019
San Francisco, CA
TDWI Dimensional Data Modeling Primer: From Requirements to Business Analysis
Registration: TDWI Seminars
Tuesday October 22, 2019
San Francisco, CA
Advanced Dimensional Modeling: Techniques for Practitioners
Registration: TDWI Seminars
Wednesday October 23, 2019
San Francisco, CA
Dimensional Models: What’s New in the Big Data Era
Registration: TDWI Seminars
November 12, 2019
Orlando, FL
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI Orlando
November 12, 2019
Orlando, FL
The Dimensional Model Refactored: New Techniques for the 21st Century
Registration: TDWI Orlando
November 15, 2019
Orlando, FL
Advanced Dimensional Modeling: Complete Tour of Modern Best Practices
Registration: TDWI Orlando

Onsite Education

You can bring Chris to your team for interactive education.

Dimensional Modeling

Chris provides full-day and expanded two-day courses covering the dimensional design concepts from Star Schema: The Complete Reference.
TDWI Courses

Chris teaches select TDWI courses that cover topics like data BI fundamentals, performance management, business analytics, dashboards and scorecards, and more.

All of Chris's education offerings are provided through TDWI.

For information on onsite offerings, contact TDWI Onsite Education. or Oakton Software