Tuesday, March 29, 2011

Q&A: Dimensional vs ER Model for the Data Warehouse

A question about the fundamental data architecture of the data warehouse:

Q: We are embarking on designing a warehouse for our BI efforts.  Someone asked if we should create the warehouse in a dimensional structure, rather than the typical ER structures.

Everything I’ve read and seen has advocated the warehouse be ER modeled at the lowest level of detail appropriate and then for specific mart needs, the data is modeled dimensionally at a grain that is consistent with the needs for that mart, utilizing conformed dimensions.

My question then is, have you seen any instances where a Warehouse (or BI Warehouse) has been designed and built successfully utilizing a non ER structural approach?

A: Your question deals with the primary difference between two popular data architectures -- those associated with W.H. Inmon and Ralph Kimball.

Two Architectures

Normalized ER models are great for capturing data, but very difficult to use for analytic reporting.  That's why Inmon believes that, although the enterprise data warehouse should be normalized, the data marts should be dimensional. This approach is the one you have read about.

But it is also possible to use a dimensional model as an integrated repository of atomic data. That is the Kimball philosophy -- dimensional all the way. This repository may be centralized (physical) or distributed (logical) repository. In either case, conformed dimensions ensure compatibility across subject ares.

I explain both architectures in my latest book, but I don't argue for one over the other.  This was a conscious choice, because I wanted it to be useful to anyone who uses dimensional data. I take the same approach in the classes I teach.

Dimensional Works Just Fine

The answer to the original question: Yes, I have seen many data warehouses that are successful without an ER layer.  

Keeping data in a dimensional format does not harm the ability to integrate data or to capture detail. (In some cases it reduces the amount of work needed to load all the data structures.) It also allows dimensional constructs to guide scope from requirements to completion, without translation back and forth into an ER format. Secondary dimensional structures ("derived schemas" or "second line data marts") can always be added as required for specific reporting needs. 

Other Approaches Work Too

Keep in mind that other approaches can be successful, as I have discussed before. Most of us won't get to make the choice that Brenda is considering.  If you already have an Inmon-style architecture in place, you should not change it simply for philosophical reasons. 

The motto of this blog is: be pragmatic, not dogmatic.

- Chris

Image licensed via Creative Commons 2.0
 from Patrick Hoesley