There are three primary methods for drilling across. These methods can be leveraged manually, automated with BI software, or performed during the data integration process.
Querying multiple stars
Conformed dimensions ensure compatibility of information in various stars and data marts; drilling across is the process of bringing it together. (For a refresher, see this post.)
For example, suppose we wish to compute "Return Rate" by product. Return rate is the ratio of shipments to returns.
Information about shipments and returns is captured in two granular star schemas:
- shipment_facts captures shipment metrics by Salesperson, Customer, Product, Proposal, Contract, Shipment, Shipper, and Shipment Date
- return_facts captures return metrics by Salesperson, Customer, Product, Contract, Reason and Return Date
Drilling across
Recall that fetching facts from more than one fact table requires careful construction of queries. It is not appropriate to join two fact tables together, nor to link them via shared dimensions. Doing so will double-count facts, triple-count them, or worse.
Instead, the process must be completed in two phases.
- Phase 1: Fetch facts from each fact table separately, aggregating them to a common level of detail
- Phase 2: Merge these intermediate result sets together based on their common dimensions
Method 1: Issue two queries, then merge the results
The first method for drilling across completes phase 1 on the database, and phase 2 in the application (or on the application server).
- Construct two separate queries: the sum of quantity shipped by product, and the sum of quantity returned by product.
- Take the two result sets as returned by the DBMS, and merge them based on the common products. Compute the ratio at this point.
Method 2: Build temp tables, then join them
The second method performs both phases on the DBMS, making use of temporary tables.
- Construct two SQL statements that create temporary tables: the sum of quantity shipped by product, and the sum of quantity returned by product
- When these are completed, issue a query that performs a full outer join of these tables on the product names and computes the ratio.
Method 3: Join subqueries
Like the previous method, this method performs all the work on the DBMS. In this case, however, a single query does all the work.
Queries for each fact table are written, then joined together in the FROM clause of a master query. For example:
SELECT
COALESCE (shp.product, rtn.product) as Product,
quantity_returned / quantity_shipped as ReturnRate
FROM
( SELECT product, sum(quantity_shipped)as quantity_shipped
FROM shipment_facts, product
WHERE .....
) shp
FULL OUTER JOIN
( SELECT product, sum(quantity_returned) as quantity_returned
FROM return_facts, product
WHERE....
) rtn
ON
shp.product = rtn.product
The two subqueries in the FROM clause represent phase 1. Phase 2 is represented by the main SELECT query that joins them and computes the ratio.
Applying these techniques
These techniques may applied in a variety of ways:
- Report developers may write their own queries using one of more of these methods
- You may have BI software that can automate drilling across using one or more of these methods
- Drilling across may be performed at ETL time using one of these methods (or an incremental variant)
Learn More
For more information, see the following resources:
- Multiple Stars and Conformed Dimensions (August 15, 2011)
- Conformed Dimensions (November 15, 2011)
Many pages are devoted to this topic in my books. In the latest one, Star Schema: The Complete Reference, the following chapters may be of interest:
- Chapter 5, "Conformed Dimensions" discusses these techniques in greater detail.
- Chapter 14, "Derived Schemas" looks at special considerations when creating derived stars that pre-compute drill-across comparisons.
- Chapter 16, "Design and Business Intelligence", discusses how to work with SQL-generating BI software.
-Chris
Image by Patrick Hosely via Creative Commons 2.0