Friday, August 20, 2010

Surrogate Keys in a CIF

In this Q&A, two more questions about surrogate keys.  Here's the first:
Q:   In a Corporate Information Factory architecture, if the EDW contains a change history for every table, do data marts need surrogate keys?
- Los Angeles, CA
A:  Yes. 

This is similar to a question from last year, in which change history was tracked in an operational system. This time, we're looking at an EDW tracking changes.

First, a little background on the CIF for the uninitiated.

In a Corporate Information Factory (CIF) the Enterprise Data Warehouse (EDW) is an integrated repository of atomic data.  The tables in this repository are normalized.  Data marts take information from this repository and structure it in a dimensional format for departmental access.

Dimension tables are not normalized.  In a CIF, a given dimension may draw data from many tables in the EDW.

For example, a Company dimension might pull data in from a dozen or more normalized tables that include the information on the company, its locations, accounts, lines of business, market data and contact information. 

If the dimension relied exclusively on the unique identifiers of these tables to identify rows, it would require a multi-part key that would have numerous parts.

And if, as you say, your EDW tracks change history to each table, this multi-part key may be compounded by version numbers or date stamps from each source table. This is impractical.  (It also makes it difficult to support type 1 responses in the data marts.)

A related question sometimes comes up in my classes:

Q:  Can I maintain surrogate keys in the EDW?

A:  No.  

Most dimensions do not correspond to a single entity in the EDW, which is normalized, so there is nowhere to store and maintain surrogate keys.  Slow change processing and surrogate key management apply to the dimensional representation of data.

You can maintain a central repository of conformed dimensions, and you can even co-locate them with your EDW.  In fact, this is a good idea.  These master dimension tables can be replicated to your data marts, in whole or in part.

This ensures compatibility across subject areas.  It is a common misconception that conformed dimensions are only important in Kimball's bus architecture.  Even in a CIF, you may need to combine information from multiple data marts.

- Chris

Image:  "Keys" by Takacsi75, licensed under Creative Commons 2.0.

Star Schema: The Complete Reference

Chris Adamson's definitive guide to dimensional design is now available.

Use these links to order from Amazon in your country:

About the Book

This handbook is organized around dimensional design topics rather than vertical industry, so you can find what you need when you need it.

Architecture-neutral, the book’s practical advice can be put to use in any type of data warehouse, including those advocated by W.H. Inmon and Ralph Kimball.

For each design topic, coverage runs deep.  You will learn best practices along with the reasons behind them.  Each concept is explained through real-world examples.  There are over 175 illustrations, including example design diagrams and data instance charts.

The book explains the impact of design choices on BI and ETL processes, and explores situations where you may wish to deviate from best practices.

It provides a comprehensive set of design tasks and deliverables that can be incorporated into any project, regardless of architecture, scope or methodology.

To learn more, see the FAQ.

Contents of the Book

Coverage beings with fundamentals, then follows a logical progression through advanced topics that address real-world complexity. Beginners can read it cover-to-cover, while experts can jump straight to topics of interest.

The book's eighteen chapters are divided into six parts:

      Part I: Fundamentals

      1.  Analytic Databases and Dimensional Design
      2.  Data Warehouse Architectures
      3.  Stars and Cubes

      Part II: Multiple Stars

      4.  A Fact Table for Each Process
      5.  Conformed Dimensions

      Part III: Dimension Design

      6.  More On Dimension Tables
      7.  Hierarchies and Snowflakes
      8.  More Slow Change Techniques
      9.  Multi-Valued Dimensions and Bridges
      10.  Recursive Hierarchies and Bridges

      Part IV: Fact Table Design

      11. Transactions, Snapshots and Accumulating Snapshots
      12. Factless Fact Tables
      13. Type-Specific Stars

      Part V: Performance

      14. Derived Schemas
      15. Aggregates

      Part VI: Tools and Documentation

      16.  Design and Business Intelligence
      17.  Design and ETL
      18.  How to Design and Document a Dimensional Model

How to Order

If you’ve enjoyed Chris's other books, classes, or this blog, please consider supporting this effort.

Use these links to order from Amazon in your country:

 For more info, visit the FAQ on the book.

Star Schema: The Complete Reference
By Christopher Adamson
ISBN 0071744320 / 9780071744324
McGraw-Hill Osborne Media
486 pages
Also available as eBook

Thursday, August 12, 2010

Book Signing 8/17 at TDWI San Diego

If you are attending the TDWI summer conference in San Diego, be sure to stop by my book signing.

I'll be signing my new book in the main exhibit hall on Tuesday (8/17) from 11:30am - 12:30pm.  The book will be available at a decent discount, or you can bring your own copy.

It's also not too late to sign up for my course in intermediate and advanced dimensional design, which will be taking place the same day.

Hope to see you there!

Monday, August 2, 2010

FAQ on Star Schema: The Complete Reference

My third book, Star Schema: The Complete Reference, is now available.

I took a year off from work in order to write this book.  It has an immense amount of detail. 

If you’ve enjoyed my other books, my classes, or this blog, please consider supporting this effort.

Use this link to order a copy from  Or use the links in the sidebar. 

Here are answers to some questions I have gotten about the book.

Why another book on Star Schema?

I often want to refer people to something they can read on a particular aspect of dimensional design.  Usually, this is harder than it should be.

There are some great books on star schema, but they are organized into chapters based on business cases (my own prior work included).  You can’t open up a book like that and turn to “the chapter” that covers a particular design technique.

If you want to read about snapshot designs, for example, you’ve got to flip back and forth between chapters about inventory, banking, budgeting, etc.

Also, all books target a particular data warehouse architecture—either Inmon’s "Corporate Information Factory" or Kimball’s dimensional “bus architecture.”  Since most of the principles of dimensional design are universal, this can get in the way.

My aim was to create the missing reference on star schema design, and to make it useful to anyone who works with dimensional data—stars, snowflakes or cubes.

How is this book different?

It’s structured into chapters and sections based on design topics, instead of by industries or business scenarios.  This makes it easy to find everything on a particular topic.

It’s also architecturally neutral.  It provides design techniques and best practices without advocating a specific approach to data warehousing.

It provides deep coverage.  It explains best practices and fully explores the reasoning behind them.  It looks at the impact of each technique on BI and ETL processes, and also explores situations where you may wish to deviate from best practices.