Tuesday, July 5, 2011

Dimensional Modelers Do Not Focus on Logical vs. Physical

The separation of logical and physical models is useful entity-relationship modeling. Not so in the world of dimensional models, where final table structures are strikingly similar to business models. 

Instead, a dimensional model is best described in three levels of increasing detail.

Entity-Relationship Modeling and Dimensional Modeling

It is well known that students of dimensional modeling have an easier time if they have not had significant exposure to entity-relationship modeling (or ER modeling).

Entity-relationship modelers are tempted to normalize data, design relationships between dimension tables, and so forth. These steps make perfect sense in the world of OLTP systems design, but not for analytic solutions.

Similarly, when scholars of entity-relationship modeling turn their focus to the dimensional world, another kind of mistake can occur.

Given that it is "all modeling," one may be tempted to generalize a generic framework from ER modeling, and then apply it to dimensional modeling. This mistake is often made by software vendors, and sometimes by academics.

ER modeling and dimensional modeling are both forms of data modeling, but it is not necessary to have a single framework to explain both.

Different Disciplines, Different Frameworks

The traditional distinction between logical and physical model is useful in ER modeling, but not in dimensional modeling. In the world of ER modeling, a logical model captures data requirements without assuming particular data storage technologies. It captures business entities and their attributes and relationships.

The logical model is the basis for a physical model, with reflects the requirements of relational storage in tables and columns. For example, many-to-many relationships are resolved, data may be restructured for performance reasons, etc.

This distinction has significantly reduced utility in the dimensional world. A physical dimensional model stores data in a format that is very similar to how it is understood by the business. Aside from the addition of keys, there is very little difference between logical and physical.

The distinction is not useless, but it plays a much smaller part.  So small, in fact, that you will not find it the focus of most books on dimensional modeling. My own book on dimensional modeling, for example, makes first mention of the term "logical model" in the last two paragraphs of the last chapter.  (See the end of this post for a link.)

Dimensional Models: Three Levels of Detail

Instead of having multiple kinds of model, a dimensional model is best understood at three levels of increasing detail.  Like zooming in on a Google map, each reveals additional information about the model. Each level of detail has different uses for different audiences.
  1. Business Requirements
    Requirements are grouped by subject area, correspond to business process, state measurement requirements in terms of facts and dimensions, and cross reference common dimensions.

    These business requirements clearly convey scope in business terms.  They link directly to the next level of detail in the model, which exposes the concept of table.

  2. High Level Design
    This level of the model defines the same requirements in terms of fact tables and dimension tables, natural keys an surrogate keys, and exposes major attributes of significance.

    At this level of detail, we do not record every column of every table, or even assign data types.  But we do draw table diagrams, and rigorously define several important design elements such as grain, additivity, and slow change requirements.

    This level is useful for design reviews, educating users and developers, and describing project activities.

  3. Detailed Design
    At the lowest level of detail, we expose every column of every table, define data types, provide definitions and sample data, map everything back to source data, and document transformation rules.

    This level of detail is useful for database administrators and ETL architects. It also contains metadata that will be useful for BI developers and end-users.

One Model, Three Levels of Detail

Don't make the assumption that these three levels of the dimensional model are developed in sequence. It is possible to do so, but not always the case.

Most dimensional designers do the bulk of their work at the second level - the high level dimensional design.

As the model nears completion, they will summarize it at a business level, and then develop the detailed model. In other cases, modelers may work through these levels in sequence.

Regardless of the process, in the end,  these three levels all describe the same dimensional model. Like levels of a Google map, they do so at different levels of detail. All should be accounted for.

For more information on what information to capture at each of these levels, see Star Schema: The Complete Reference.  Detailed descriptions and examples can be found in Chapter 18, "How To Design And Document A Dimensional Model."

Image licensed via Creative Commons 2.0
 from Patrick Hoesley