Friday, January 23, 2009

Q&A: Customers with No Sales in Fact Table

In this post, Chris answer a reader's question about reporting on customers with no sales in the fact table.

New Q&A Feature

I get a lot of questions about dimensional modeling from readers of the blog, my books, and people who attend my classes. I try to answer all of them. I thought it would be interesting to post some of them here, for all to read.

This week, I'm answering a question about reporting on customers with no sales.

In general what is the recommended approach for reporting customers that have no sales in the fact table for MTD or YTD? Or, for that matter, any dimension member where no fact exists for some time period?

- J. Dolan

Usually, something like “customers with no sales for a period” is computed as part of the reporting process. Developers often bristle at this kind of report, because it requires using a correlated subquery. This makes it difficult to do with a SQL-generating BI tool, and can be slow. And if you want to use the list of customers with no sales as a filter for another query, things can get really hairy.

An alternative is to pre-compute this information, storing the results in a table that summarizes reporting periods. This is likely to be a factless fact table, with foreign keys to dimension tables representing the period summarized and the dimension in question--customers in your case. If you require monthly and yearly slices, you will want to multiple summaries. This approach may simplify reporting, but adds complexity to the ETL process.

If current period is the only period of concern, as in your question, then it is also possible to pre-compute the information and store the result directly to the customer dimension table, as a flag. The flag would specify whether the customer has placed an order during the current period. In your case, it would be two flags -- one for month to date and one for year to date.

This kind of attribute is called a behavioral dimension because it takes behavioral activity recorded in the fact table and transforms it into a dimension. Behavioral dimensions are very powerful, because they can be used as simple filters for other queries.

Again, this concept trades ETL complexity for reporting flexibility and performance. But isn't that what data warehousing is all about?

Thanks to J. Dolan for agreeing to have his question posted. If you have a question, send it in. I try to answer all my email, though it sometimes takes a while!