Tuesday, March 29, 2011

Q&A: Dimensional vs ER Model for the Data Warehouse

A question about the fundamental data architecture of the data warehouse:

Q: We are embarking on designing a warehouse for our BI efforts.  Someone asked if we should create the warehouse in a dimensional structure, rather than the typical ER structures.

Everything I’ve read and seen has advocated the warehouse be ER modeled at the lowest level of detail appropriate and then for specific mart needs, the data is modeled dimensionally at a grain that is consistent with the needs for that mart, utilizing conformed dimensions.

My question then is, have you seen any instances where a Warehouse (or BI Warehouse) has been designed and built successfully utilizing a non ER structural approach?

A: Your question deals with the primary difference between two popular data architectures -- those associated with W.H. Inmon and Ralph Kimball.

Two Architectures

Normalized ER models are great for capturing data, but very difficult to use for analytic reporting.  That's why Inmon believes that, although the enterprise data warehouse should be normalized, the data marts should be dimensional. This approach is the one you have read about.

But it is also possible to use a dimensional model as an integrated repository of atomic data. That is the Kimball philosophy -- dimensional all the way. This repository may be centralized (physical) or distributed (logical) repository. In either case, conformed dimensions ensure compatibility across subject ares.

I explain both architectures in my latest book, but I don't argue for one over the other.  This was a conscious choice, because I wanted it to be useful to anyone who uses dimensional data. I take the same approach in the classes I teach.

Dimensional Works Just Fine

The answer to the original question: Yes, I have seen many data warehouses that are successful without an ER layer.  

Keeping data in a dimensional format does not harm the ability to integrate data or to capture detail. (In some cases it reduces the amount of work needed to load all the data structures.) It also allows dimensional constructs to guide scope from requirements to completion, without translation back and forth into an ER format. Secondary dimensional structures ("derived schemas" or "second line data marts") can always be added as required for specific reporting needs. 

Other Approaches Work Too

Keep in mind that other approaches can be successful, as I have discussed before. Most of us won't get to make the choice that Brenda is considering.  If you already have an Inmon-style architecture in place, you should not change it simply for philosophical reasons. 

The motto of this blog is: be pragmatic, not dogmatic.

- Chris

Image licensed via Creative Commons 2.0
 from Patrick Hoesley

Book Signing in DC, Date Change for Boston Seminars

Washington DC - Book Signing

I will be signing books next week at the TDWI World Conference in Washington DC.  You will find me in the main exhibit hall on Tuesday 4/5 from 11:30 am to 1:00 pm.

I will also be teaching on Monday and Tuesday, and conducting half-hour one-on-one sessions on Wednesday.

Hope to see you there! 

Boston Seminars - Date Change

The TDWI Boston Seminar will now be taking place June 13-16.  If you have already signed up, you should have received notification from TDWI. 

I will be teaching three courses that week, including an intensive two-day course on advanced dimensional design.  This is a rare public offering for my 2 day course, which is usually only offered through TDWI Onsite Education.

Monday, March 7, 2011

Q&A: Bridges are Part of Dimensional Modeling

Two recent posts on bridge tables generated a lot of questions.  Here is the most common:
When you use a bridge table, isn't that a snowflake? I thought this was not allowed in dimensional modeling.
The use of a bridge is a limited form of snowflaking.  It is also an important feature of the dimensional model.

Bridges are widely accepted, even among those who would otherwise avoid snowflake designs.

A Limited form of Snowflaking

In a Q&A from last year, I suggested that a snowflake occurs when dimensions are joined to something other than a fact table. By this definition, the examples from the previous two posts are certainly snowflakes.

In the first example, a bridge was used to associate a dimension with a repeating attribute (a company with multiple industries.)

In the second example, a bridge was used to associate a fact with multiple dimension rows (a sale with multiple salespeople.)

In both cases, a standard one-to-many relationship between fact and dimension does not exist.1 The bridge table solves this problem.

This usage is widely accepted. In fact, the bridge table is a central feature of dimensional modeling. I devote two full chapters to bridged designs in Star Schema: The Complete Reference.

Other Relationships are Not Modeled

When a bridge table is employed, most dimensional modelers will still refer to the schema as "a star." That's because the bridge is only used in the situations described above -- when the standard master-detail relationship between dimension table and fact table breaks down.

Other kinds of relationships between data elements are not resolved via multiple tables.  In our examples:
  • Repeating attributes are permitted (work phone, home phone, mobile phone) 
  • Dimensions contain partially dependent attributes (brand information present in a product table)
  • Master-detail relationships are not instantiated into separate tables
  • Values associated with codes are not placed in look-up tables
  • Dimensions are not linked to one another
An all-out snowflake design would employ principles of normalization to eliminate these features.

Other Times to Snowflake?

There may be some other situations where limited snowflaking is acceptable.  One is where a set of attributes (e.g. geographic) would otherwise appear in multiple dimension tables. Another is where software tools (the RDBMS or BI software) work better in a snowflake environment. 

These uses are a bit more controversial, and I wrote about them in a previous post.

When it comes to the bridge table, however, there is no controversy.  The bridge is an accepted technique in dimensional modeling, and is often necessary for specific dimensional designs.

- Chris

1 Bridges may also be used to navigate recursive relationships, as I describe in the book.

Image credit: Gravityx9 licensed under Creative Commons 2.0