Tuesday, March 16, 2010

Storing Nonadditive Facts [Q&A]

It can be challenging to decide what to do with a nonadditive fact. A reader asks if (and when) it might make sense to store a percentage in a fact table:
Chris,

What are your thoughts on storing percentages in fact tables?  Many of the calculations seem too complex to do in the BI tool. For instance, we calculate life-to-date percentages of KPIs on products.

Thanks,
Adrienne
Burlington, NC
Percentages are (usually) non-additive facts.  What this means is that you cannot add them together.  For example, if a computer is sold at 5% margin, and then another is sold at 5% margin, it is a mistake to say that total margin for the two sales is 10%. 

In introductory texts or courses, you are taught to decompose them into fully additive components.  The idea is that you can sum all the components, then compute the ratio at the last moment. The margin example above is a good application. Storing cost and sales price allows margin to be computed at any level of aggregation.

However, this only really works when all the components for a particular ratio are stored in the same row, and used within the same scope of aggregation.   As you have discovered, there are some non-additive facts that do not work this way.  A ratio that compares something to a larger pool or category is a good example.  Two or more aggregations are needed to compute this kind of metric.

It also sounds as if your problem is compounded by the need to use some kind of running-totals in the computation, which represent the lifetime-to-date sums.  These are also problematic because they are also non-additive, or possibly semi-additive.

Non-additive measurements that are computed from different aggregations are often stored in fact tables for exactly the reason you give: they are difficult to automate in a BI tool. They are also precomputed in situations where they would otherwise harm performance.

If you decide to pre-compute and store a non-additive fact, there are usually two questions that follow:  Where to store it, and how to use it.  The issues here are very similar to those surrounding period-to-date measurements, as I discussed in a post last year. 

Where to Store a Nonadditive Fact

Figuring out the answer to this question is closely linked to how it will be used.  If it is needed at a transaction-level, it must go in a transaction-grained fact table.

However, if it also need to be known at a daily or monthly level, that solution is not enough.  Since it is non-additive, it must be computed separately for these aggregations.  A periodic snapshot makes sense here.  Consider forgoing the transaction-level version, unless it is needed as well.

Also note that "time" is not the only dimension across which we might need a summarized non-additive fact.  For example, you might also want to see it across product categories, customer categories, and so forth.  Each would require its own snapshot-style fact table for discrete storage of the pre-computed fact.

In fact, this is the biggest problem with storing non-additive facts:  each potential aggregation must be stored separately.

Nonadditive Facts and BI Software

Because they are nonadditive,  (and may involve constituents that are semi-additive), these kind of facts must be used carefully.  For example, they should never be summed, subtotaled, etc.  Similarly, if we need to show them at different levels of detail, we need to fetch each aggregation separately.

This is relatively straight forward if the schema is well thought out, well documented, and fully understood by report developers.

However, it is very hard to make this kind of fact available in a "Self-Service" mode for end users, since most BI tools will easily break both rules.  (BI tools can usually handle simple non-additive facts, like the margin example at the beginning, but not facts that are computed from multiple aggregations.)

Alternatives may include:
  • Hiding the non-additive fact from users while exposing it to developers
  • Classifying it as a dimension so that it is not aggregated,  or 
  • Blocking certain features of the tool.
The fact that a non-additive fact may be stored in multiple locations (corresponding to different levels of aggregation) may also befuddle your BI software.
  • It may be possible to make use of  "aggregate navigation" features to teach it to fetch a fact from more than one possible location, but you will need to make sure that the tool always gets it right.  In addition, using such a feature may prevent you from "turning off" the tool's ability to aggregate a fact.  
  • Separate semantic layers for developers vs. users may be the solution to these problems.
Again, these considerations are less important in cases where trained developers do all the report building.  It is only when leveraging SQL-generating BI tools that you need to worry.

Another Option

In some cases, non-additive facts can be linked to a specific member of a dimension, such as an individual customer or  product.  In this case, consider storing it in the dimension table, as a behavioral dimension.  This is most effective when the measurement in question can be refreshed on a pre-defined schedule.

Thanks to Adrienne for her question, and for consenting to have it reprinted here!

- Chris

Do you have a question about dimensional design or data warehousing?  Send it in.  I answer all my email, though sometimes it takes a while.

Image is by Patrick Hosely licensed under Creative Commons 2.0