Monday, July 16, 2012

Q&A: Does Master-Detail Require Multiple Stars in a Travel Data Mart

A reader asks whether master-detail relationships call for multiple fact tables.

Q: Is there some general principle to use as a guide when deciding to use 1 star or 2 for master-detail situations?

We have a master-detail situation involving trips, and multiple trip expenses per trip. Many of our queries are at the trip level - eg. how many trips are for clients in their 30's going to Arizona in July. Many of the queries are at the detail level -- how much is spent on means by the client's age range?

Both levels of query can be answered with one star, with the fact at the detail level (trip expense line). For trip-level questions - you can count distinct occurrences of the natural key in the trip dimension. But it seems that grinding through all those trip expenses when I just wanted to count trips is inefficient.

So do we have 2 fact tables, one at the trip level, one at the trip expense level? Do the dimensions surround both of them?  Is there a rule?

- Ellen, Ottawa
A: As the reader observes, this is a common situation: when there is a master-detail relationship, is there a need for multiple fact tables?

This case calls for at least two fact tables.
Read on for the details.

Photo by o5com, licensed by Creative Commons

Multiple Fact Tables

After learning the basics of dimensional modeling, the first real world challenge we face is understanding when and how to design multiple fact tables.  Until we learn to think in dimensional terms, the choice can be difficult.

I suggest starting with some basic guidelines.  You probably need different fact tables if:
  1. You have measurements with different periodicity
  2. You have measurements with different levels of detail
The first guideline suggests that if facts do not describe the same event, they probably belong in different fact tables. For example, orders and shipments do not always happen at the same time.  Order dollars and shipment dollars belong in separate fact tables.

The second guideline pertains to facts that do describe the same events.  Information about an order and information about an order line are ostensibly available at the same time, but they have different levels of detail. If there are facts at both of these levels, there will need to be multiple fact tables.

So how would you apply these guidelines?

Think Dimensionally

It is easiest to work these questions out by thinking dimensionally. Forget about master-detail or parent-child for a few minutes, and consider what is being measured. 
  • What are the business questions being asked, and what business metrics do they contain?  
  • Are these metrics available simultaneously?
  • At what level of detail are these metrics available?
The reader cited business questions in her example.  These questions reveal at least two metrics:  number of trips, and expense dollars.

These metrics may be available on different schedules -- for example, a trip may commence before we have the details of all the expenses incurred.  This would argue for multiple fact tables.

But let's suppose that the single source of data is expense reports. We do not have any information about the trip until it is over, at which point we have all the expense details.

In this case, lets think about the level of detail of these metrics. Number of trips and expense dollars seem to share a lot of common detail -- the traveller, the destination, the trip start and end date, and so forth.  

But expense dollars have some additional detail -- the date of the expense item, and an expense category. Since these facts have different levels of detail, it makes sense to include them in separate fact tables.

Factless Fact Table

The trip-level star may contain a factless fact table.  It contains one row per trip, with no explicit facts.  We can determine the number of trips that answer a business question simply by counting rows.

Many teams would prefer to add a fact called number_of_trips and always populate it with the value 1.  This is useful, because it makes the business metric explicit in your schema design.  (I've written about this technique before.)

The trip level star may also contain some summary level facts that describe expenses -- say the total trip cost, the total transportation cost, total meals cost, etc.  More detail on these metrics (such as the vendor, payment method, etc.) can found in the expense item star.

Deeper into Trip Analysis

Digging deeper, the reader may discover she needs more than two fact tables.  

Trips often include more than one destination.  When it is necessary to study the various segments (or "legs") of a trip, we can define a third fact table that contains one row per segment.

In this case, the reader would have three fact tables:
  1. trip_facts  Grain: one row per trip.  Metrics: Number of trips (always one)
  2. trip_segment_facts  Grain: one row per destination city.  Metrics: number of segments (always one)
  3. trip_expense_facts  Grain: one row per expense item.  Metrics: expense_dollars
Each of these stars will share some common dimensions -- the trip origination date, the trip end date, the traveller, the trip identifier, etc.

Trip_segment_facts will also include destination-specific dimensions: the destination city of the segment, the arrival and departure dates for the segment, etc.

Trip_expense_facts will include dimensions that describe the individual expense items: the date paid, the payment method, the payee, the expense category and subcategory, and so forth.

A conformance matrix will help you keep track of the dimensionality of each star.

Learn More

Thanks to Ellen for the thoughtful question.  If you have a question of your own, please send it in.

Pick up a copy of Star Schema: The Complete Reference and you will be helping support this blog. 

  • Chapter 4 is dedicated to designing multiple fact tables.  
  • Chapter 5 looks closely at conformed dimensions. 
  • Chapter 12 covers faceless fact tables.
You can learn more about these topics by referring to these posts:
  • Factless Fact Tables (September 15, 2011) describes faceless fact tables that describe events, and the use of a fact with the constant value of 1.
  • The Conformance Matrix (June 5, 2012) describes how to represent the dimensionality of your stars in a matrix format.

The photo in this post is by o5com, licensed by Creative Commons