Chris Adamson’s Blog: December 2012

Thursday, December 20, 2012

Dimensional Models: No E-R Fallback Required

People often suggest to me that a dimensional model cannot handle certain situations. These situations, the assertion holds, require falling back to entity-relationship modeling techniques.

Image by Patrick Hosley,
Licensed by Creative Commons 2.0

When we look together at their examples, though, I've always found a dimensional solution.

Here's a list of things people sometimes do not realize can be handled by standard dimensional techniques.

At the end of the post is a large collection of links to previous posts treat these topcis in more detail.

Note: This article should not be interpreted as a criticism of entity-relationship modeling. Rather, the point is to cover commonly overlooked capabilities of dimensional modeling.

Predictive Analytics: Storage of Granular Detail

I often hear people say: if you want to do predictive analytics, its important to capture all the relationships among data attributes. This is incorrect.

The job of predictive analytics is to tell us what is important--not the reverse! Neither data mining nor predictive analytics requires information to be stored in an entity-relationship format.

As people who do this work will tell you, they don't care how the data is supplied. Nor do they care what modelers believe are the important relationships. Many tools will completely de-normalize the data before doing anything else.

In order to support predictive analytics, standard best practices of dimensional modeling apply:

Fact tables record business activities and conditions at the lowest level of detail possible in granular, first line stars.
Transaction identifiers are carried in the model if available.
Surrounding dimensions are enriched with detail to the fullest extent possible
In the dimension tables, changes are tracked and time-stamped (“type 2”).

If we do these things with our dimensional data, we have not lost anything that is that is required to do predictive analytics.

Many-to-many Relationships: Multi-Valued Dimension Bridge

Usually, each fact in a star schema can be linked to a single member of a dimension. This is often mistaken for a rule, which leads to the incorrect conclusion that dimensional models cannot accommodate a fact that may link to multiple members of a dimension.

(This is often generalized as “dimensional cannot handle many-to-many relationships.”)

In a dimensional model, this situation is referred to as a multi-valued dimension. We use a bridge table to link the facts to the dimension in question. This special table sets up a many-to-many relationship between the facts and dimensions, allowing any number of dimension members to be associated with any number of facts.

Here are some examples:

In sales, a bridge table may be used to associate a sale, as recorded in a fact table, with multiple sales people.
In insurance, a single claim (again, tracked in a fact table) can be associated with multiple parties.
In healthcare, a single encounter may be associated with multiple providers or tests.
In government, a single audit or inspection may be associated with multiple findings.
In law enforcement, a single arrest may be associated with multiple charges.

Repeating Attributes: Multi-Values Attribute Bridge

It is sometimes suggested that dimensional models cannot gracefully accommodate repeating attributes. The dimensional solution is again a bridge table. This time, it is placed between the dimension table and an outrigger that contains the attribute in question.

Examples include:

Companies that have multiple Standard Industry Classification codes
People with multiple phone numbers or addresses
Accounts with multiple account holders
Patients with multiple diagnoses
Documents with multiple keywords.

Recursive Hierarchies: Hierarchy Bridge

Ragged hierarchies, unbalanced hierarchies, or recursive hierarchies are often cited as the downfall of the dimensional model. In fact, a solution exists, and it is extraordinarily powerful. The hierarchy bridge table allows facts to be aggregated either by rolling up or rolling down through the hierarchy, regardless of number of levels.

Examples include:

Parts that are made up of other parts
Departments that fall within other departments
Geographies that fall within other geographies
Companies that own other companies

Relationships Between Dimensions: Factless Fact Tables

A star schema does not include relationships between dimension tables. This has led to the misconception that you can't track these relationships.

In fact, any important relationship between dimension tables can and should be captured. It is done using factless fact tables. (Dimensions are never directly linked because of the implications this would have on slow change processing.)

Examples include:

Employees filling a job in a department
Marketing promotions in effect in a geographical market
Students registered for courses
The primary care physician assigned to an insured party
Brokers assigned to clients

Subtyping: Core and Custom

Another situation where modelers often believe they must “fall back” on ER techniques is when the attributes of a dimension table vary by type. This variation is often misconstrued as calling for the ER construct known as subtyping. Similar variation might also be found with associated facts

In the dimensional model, heterogeneous attributes are handled via the core and custom technique.

A core dimension captures common attributes, and type specific replicas capture the core attributes along with those specific to the subtype. Dimensions can then be joined to the fact table according to the analytic requirement. If there is variation in the facts, the same can be done with fact tables.

Examples include:

Products with attributes that vary by type
Customers that have different characteristics depending on whether they are businesses or individuals
In insurance, policies that have different characteristics depending on whether they are group or individual polices
In healthcare, medical procedures or tests that have different characteristics and result metrics depending on the test
In retail, stores that have varying characteristics depending on the type (e.g. owned, franchise, pocket)

Non Additive Metrics: Snapshots or Summary Tables

In a classic star schema design, facts are recorded at a granular level and “rolled up” across various dimensions at query time. Detractors often assume this means that non-additive facts have no place in a dimensional model.

In fact, dimensional modelers have several tools for handling non-additive facts.

Those that can be broken down into additive components are captured at the component level. This is common for many key business metrics such as:

Margin rate: stored as margin amount and cost amount in a single fact table. The ratio is computed after queries aggregate the detail.
Yield or conversion percentage: stored as quote count and order count in two separate fact tables, with ratio converted after aggregation at query time.

Other non-additive facts cannot be broken down into fully additive components. These are usually captured at the appropriate level of detail, and stored in snapshot tables or summary tables. Common examples include:

Period-to-date amounts, stored in a snapshot table or summary table
Distinct counts, stored in a summary table
Non-numeric grades, stored in a transaction-grained fact table

While these non-additive facts are flexible than additive facts in terms of how they can be used, this is not a result of dimensional representation.

Conclusion

Every technique mentioned here is part of dimensional modeling cannon. None are stopgaps or workarounds. While some may prove problematic for some of our BI software, these problems are not unique to the dimensional world.

In the end, the dimensional model can represent the same real-world complexities that entity-relationship models can. No ER fallback required.

- Chris

Learn More

All of these topics have been covered previously on this blog. Here are some links to get you started.

Establishing granular, detailed star schemas:

Build High Resolution Stars (January 7, 2011).

Multi-valued Dimensions:

Bridge to Multi-Valued Dimensions (February 9, 2011)
Bridge Tables and Many-to-many Relationships (April 25, 2011)
See the label bridges for more posts that deal with this topic.

Multi-Valued Attributes:

Resolve Repeating Attributes with a Bridge Table (January 28, 2011)
Q and A: Bridges are Part of Dimensional Modeling (March 7, 2011)
See the label bridges for more

Recursive Hierarchies:

Recursive Hierarchies and Bridge Tables (May 22, 2012)
Using a Hierarchy Bridge Without a Many-to-many Relationship (May 18, 2012)
And again, more will appear under the label bridges

Factless Fact Tables:

Factless Fact Tables (September 15, 2011)
See also the label Factless Fact Tables

Core and Custom:

There's not much on this topic on this blog right now, but see my book for more info (details below.)

Non-additive Facts:

Handling Rankings (October 10, 2012)
Storing Non-additive Facts (March 16, 2010)
More on Distinct Counts (April 27, 2009)
Dealing with Period-to-date Measurements (April 18, 2009)
See the category non-additve facts for more

Also consider checking out my book, Star Schema: The Complete Reference. It covers all these topics in much more detail:

Granular and detailed stars are covered in Chapter 3, "Stars and Cubes"
Multi-valued Dimensions and Multi-Valued attributes are covered in Chapter 9, "Multi-valued Dimensions and Bridges"
Hierarchy Bridges are covered in Chapter 10, "Recursive Hierarchies and Bridge Tables"
Factless Fact Tables are covered in Chapter 12, "Factless Fact Tables"
Core and Custom schemas are covered in Chapter 13, "Type-specific Stars"
Non Additive Facts are covered in Chapter 3, "Stars and Cubes," Chapter 11, "Transactions, Snapshots and Accumulating Snapshots" and Chapter 14, "Derived Schemas"

Use the links on this blog to order a copy from Amazon. There is no additional cost to you, and you will be helping support this blog.

Classes

Chris is scheduled to present at the following events. Course enrollment is open to the general public.

All these courses are also available on site (see below).

August 18, 2019
San Diego, CA
Data Modeling in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Workshop: Building the Business Case for Advanced Analytics
Registration: TDWI San Diego Strategy Summit
Monday October 21, 2019
San Francisco, CA
TDWI Dimensional Data Modeling Primer: From Requirements to Business Analysis
Registration: TDWI Seminars
Tuesday October 22, 2019
San Francisco, CA
Advanced Dimensional Modeling: Techniques for Practitioners
Registration: TDWI Seminars
Wednesday October 23, 2019
San Francisco, CA
Dimensional Models: What’s New in the Big Data Era
Registration: TDWI Seminars
November 12, 2019
Orlando, FL
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI Orlando
November 12, 2019
Orlando, FL
The Dimensional Model Refactored: New Techniques for the 21st Century
Registration: TDWI Orlando
November 15, 2019
Orlando, FL
Advanced Dimensional Modeling: Complete Tour of Modern Best Practices
Registration: TDWI Orlando

Onsite Education

You can bring Chris to your team for interactive education.

Dimensional Modeling

Chris provides full-day and expanded two-day courses covering the dimensional design concepts from Star Schema: The Complete Reference.
TDWI Courses

Chris teaches select TDWI courses that cover topics like data BI fundamentals, performance management, business analytics, dashboards and scorecards, and more.

All of Chris's education offerings are provided through TDWI.

For information on onsite offerings, contact TDWI Onsite Education. or Oakton Software