Chris Adamson’s Blog: Three ways to drill across

Friday, December 23, 2011

Three ways to drill across

Previous posts have explored the importance of conformed dimensions and drilling across. This post looks at the process of drilling across in more detail.

There are three primary methods for drilling across. These methods can be leveraged manually, automated with BI software, or performed during the data integration process.

Querying multiple stars

Conformed dimensions ensure compatibility of information in various stars and data marts; drilling across is the process of bringing it together. (For a refresher, see this post.)

For example, suppose we wish to compute "Return Rate" by product. Return rate is the ratio of shipments to returns.

Information about shipments and returns is captured in two granular star schemas:

shipment_facts captures shipment metrics by Salesperson, Customer, Product, Proposal, Contract, Shipment, Shipper, and Shipment Date
return_facts captures return metrics by Salesperson, Customer, Product, Contract, Reason and Return Date

Reporting on return rate will require comparing facts from each of these stars.

Drilling across

Recall that fetching facts from more than one fact table requires careful construction of queries. It is not appropriate to join two fact tables together, nor to link them via shared dimensions. Doing so will double-count facts, triple-count them, or worse.

Instead, the process must be completed in two phases.

Phase 1: Fetch facts from each fact table separately, aggregating them to a common level of detail
Phase 2: Merge these intermediate result sets together based on their common dimensions

In practice, there are several ways that this task can be performed.

Method 1: Issue two queries, then merge the results

The first method for drilling across completes phase 1 on the database, and phase 2 in the application (or on the application server).

Construct two separate queries: the sum of quantity shipped by product, and the sum of quantity returned by product.
Take the two result sets as returned by the DBMS, and merge them based on the common products. Compute the ratio at this point.

While it may seem odd that phase 2 not be performed on the DBMS, note that if the data sets are already sorted, this step is trivial.

Method 2: Build temp tables, then join them

The second method performs both phases on the DBMS, making use of temporary tables.

Construct two SQL statements that create temporary tables: the sum of quantity shipped by product, and the sum of quantity returned by product
When these are completed, issue a query that performs a full outer join of these tables on the product names and computes the ratio.

Be sure that the temporary tables are cleaned up.

Method 3: Join subqueries

Like the previous method, this method performs all the work on the DBMS. In this case, however, a single query does all the work.

Queries for each fact table are written, then joined together in the FROM clause of a master query. For example:

SELECT
COALESCE (shp.product, rtn.product) as Product,
quantity_returned / quantity_shipped as ReturnRate
FROM
( SELECT product, sum(quantity_shipped)as quantity_shipped
    FROM shipment_facts, product
    WHERE .....
) shp
FULL OUTER JOIN
( SELECT product, sum(quantity_returned) as quantity_returned
    FROM return_facts, product
    WHERE....
) rtn
ON
    shp.product = rtn.product

The two subqueries in the FROM clause represent phase 1. Phase 2 is represented by the main SELECT query that joins them and computes the ratio.

Applying these techniques

These techniques may applied in a variety of ways:

Report developers may write their own queries using one of more of these methods
You may have BI software that can automate drilling across using one or more of these methods
Drilling across may be performed at ETL time using one of these methods (or an incremental variant)

In the latter case, the ETL process builds a new star (or cube) that contains the result of drilling across. This is called a derived schema, or second line data mart.

Learn More

For more information, see the following resources:

Multiple Stars and Conformed Dimensions (August 15, 2011)

Conformed Dimensions (November 15, 2011)

Many pages are devoted to this topic in my books. In the latest one, Star Schema: The Complete Reference, the following chapters may be of interest:

Chapter 5, "Conformed Dimensions" discusses these techniques in greater detail.

Chapter 14, "Derived Schemas" looks at special considerations when creating derived stars that pre-compute drill-across comparisons.

Chapter 16, "Design and Business Intelligence", discusses how to work with SQL-generating BI software.

More to come on this topic in the future. If you have questions, send them in.

-Chris

Image by Patrick Hosely via Creative Commons 2.0

Classes

Chris is scheduled to present at the following events. Course enrollment is open to the general public.

All these courses are also available on site (see below).

August 18, 2019
San Diego, CA
Data Modeling in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Workshop: Building the Business Case for Advanced Analytics
Registration: TDWI San Diego Strategy Summit
Monday October 21, 2019
San Francisco, CA
TDWI Dimensional Data Modeling Primer: From Requirements to Business Analysis
Registration: TDWI Seminars
Tuesday October 22, 2019
San Francisco, CA
Advanced Dimensional Modeling: Techniques for Practitioners
Registration: TDWI Seminars
Wednesday October 23, 2019
San Francisco, CA
Dimensional Models: What’s New in the Big Data Era
Registration: TDWI Seminars
November 12, 2019
Orlando, FL
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI Orlando
November 12, 2019
Orlando, FL
The Dimensional Model Refactored: New Techniques for the 21st Century
Registration: TDWI Orlando
November 15, 2019
Orlando, FL
Advanced Dimensional Modeling: Complete Tour of Modern Best Practices
Registration: TDWI Orlando

Onsite Education

You can bring Chris to your team for interactive education.

Dimensional Modeling

Chris provides full-day and expanded two-day courses covering the dimensional design concepts from Star Schema: The Complete Reference.
TDWI Courses

Chris teaches select TDWI courses that cover topics like data BI fundamentals, performance management, business analytics, dashboards and scorecards, and more.

All of Chris's education offerings are provided through TDWI.

For information on onsite offerings, contact TDWI Onsite Education. or Oakton Software