Monday, August 15, 2011

Multiple Stars and Conformed Dimensions

The concept of conformed dimensions is central to the discipline of dimensional modeling. This post introduces conformed dimensions; future posts will explore them in more detail. 

In your data warehouse, each star corresponds to a business process. Combining facts from different processes can produce powerful compound metrics. The key to making this work is a set of conformed dimensions.

Conformed dimensions are closely associated with Kimball's "Bus Architecture," but are crucial in any scenario that involves dimensional data.

Multiple Stars

In a dimensional design, each star captures collects measurements that describe a discrete business process.

If we have two measurements that describe different processes, we place them into separate fact tables.

For example, a sales data mart may contain multiple stars:
  • Proposal information by Salesperson, Prospect, Product, Proposal and Proposal Date
  • Order information by Salesperson, Customer, Product, Proposal, Contract and Order Date
  • Shipping information by Salesperson, Customer, Product, Proposal, Contract, Shipment, Shipper, and Shipment Date
  • Return information by Salesperson, Customer, Product, Contract, Reason and Return Date
In each of these stars, the fact table will record measurements that describe the processes of issuing proposals, taking orders, shipping product and handling returns.

By recording measurements of each process in a different star, we are able to capture information at the most detailed level possible.  We can study each of these processes, complete with attendant details, by accessing the appropriate star.

Cross-process Metrics

Some of the most powerful measurements actually combine information from multiple processes. These metrics require combining facts from different stars.

In the sales data mart, the ratio of proposals to orders is the "close rate," a powerful indicator that sales managers and executives look at on a regular basis.

Similarly, the ratio of shipments to returns is the "return rate," an essential quality control metric.

Drilling Across

When we compare facts from different stars, we don't simply join the fact tables.  To do so might cause double-counting of some facts.

Instead we follow a process that Kimball calls drilling across.  The drill-across process can be broken down into two phases.  In the first phase, each star is queried, and results are aggregated to a common level of detail.  In the second phase, these result sets are merged based on their common dimensions.

For example, to compute the return rate by product for August of 2011, we do the following:
  1. a. Figure out quantity shipped by product for August 2011
    b. Figure out quantity returned by product for August 2011
  2. Merge these amounts based on the common product names and compute the ratio
We may perform this drill across operation at query time (many BI tools can do this automatically), or we may do it at ETL time, storing the results in a separate star or cube (sometimes called a second-line data mart.) 

Conformed Dimensions

The key to making all this work is the organization of the dimensions.  As you saw in the example above, we used the dimension values to link our results together: product names were used to merge together shipment and return quantities and compute their ratio.  This would not have worked if the product dimensions for Shipments and Returns had been different.

This is the basic idea behind conformed dimensions.  We say that two dimensions conform if they have the same structure and content.  Both our stars, had a product dimension table with a product name attribute, and the product names were specified the same way in each.  Sharing a single physical table is one way to ensure conformance, but it is not required.

Two dimensions can also conform if one has a subset of the other's attributes.  As long as the common attributes have the same structure and content, they are said to conform.

Planning Conformance

By planning a set of conformed dimensions, we ensure that fact tables can be used to compare processes.  This is important within a single data mart, such as the one above, and it is also important when looking across multiple data marts.

Conformed dimensions are the organizing principle in Kimball's architecture.  Conformed dimensions are planned up-front, as a part of a project that establishes a data architecture based on dimensional design. Implementation proceeds once this conformance bus has been planned.

The concept is also important in other architectures. For example, the sales data mart discussed above might be part of Corporate Information Factory architecture.  Within this data mart, conformance guarantees we can compare shipments to returns, proposals to orders, and so forth.

More to come

In the coming weeks, I will post more about conformed dimensions.  We will look at "levels" of conformance, how to document conformed dimensions, and how different tools work with conformed dimensions. If you have my book, I also encourage you to read Chapters 4 and 5, which look at these concepts in detail.

Photo by Agnes Periapse, licensed under Creative Commons 2.0