Chris,What are your thoughts on storing percentages in fact tables? Many of the calculations seem too complex to do in the BI tool. For instance, we calculate life-to-date percentages of KPIs on products.Thanks,
In introductory texts or courses, you are taught to decompose them into fully additive components. The idea is that you can sum all the components, then compute the ratio at the last moment. The margin example above is a good application. Storing cost and sales price allows margin to be computed at any level of aggregation.
However, this only really works when all the components for a particular ratio are stored in the same row, and used within the same scope of aggregation. As you have discovered, there are some non-additive facts that do not work this way. A ratio that compares something to a larger pool or category is a good example. Two or more aggregations are needed to compute this kind of metric.
It also sounds as if your problem is compounded by the need to use some kind of running-totals in the computation, which represent the lifetime-to-date sums. These are also problematic because they are also non-additive, or possibly semi-additive.
Non-additive measurements that are computed from different aggregations are often stored in fact tables for exactly the reason you give: they are difficult to automate in a BI tool. They are also precomputed in situations where they would otherwise harm performance.
If you decide to pre-compute and store a non-additive fact, there are usually two questions that follow: Where to store it, and how to use it. The issues here are very similar to those surrounding period-to-date measurements, as I discussed in a post last year.
Where to Store a Nonadditive Fact
Figuring out the answer to this question is closely linked to how it will be used. If it is needed at a transaction-level, it must go in a transaction-grained fact table.
However, if it also need to be known at a daily or monthly level, that solution is not enough. Since it is non-additive, it must be computed separately for these aggregations. A periodic snapshot makes sense here. Consider forgoing the transaction-level version, unless it is needed as well.
Also note that "time" is not the only dimension across which we might need a summarized non-additive fact. For example, you might also want to see it across product categories, customer categories, and so forth. Each would require its own snapshot-style fact table for discrete storage of the pre-computed fact.
In fact, this is the biggest problem with storing non-additive facts: each potential aggregation must be stored separately.
Nonadditive Facts and BI Software
Because they are nonadditive, (and may involve constituents that are semi-additive), these kind of facts must be used carefully. For example, they should never be summed, subtotaled, etc. Similarly, if we need to show them at different levels of detail, we need to fetch each aggregation separately.
This is relatively straight forward if the schema is well thought out, well documented, and fully understood by report developers.
However, it is very hard to make this kind of fact available in a "Self-Service" mode for end users, since most BI tools will easily break both rules. (BI tools can usually handle simple non-additive facts, like the margin example at the beginning, but not facts that are computed from multiple aggregations.)
Alternatives may include:
- Hiding the non-additive fact from users while exposing it to developers
- Classifying it as a dimension so that it is not aggregated, or
- Blocking certain features of the tool.
- It may be possible to make use of "aggregate navigation" features to teach it to fetch a fact from more than one possible location, but you will need to make sure that the tool always gets it right. In addition, using such a feature may prevent you from "turning off" the tool's ability to aggregate a fact.
- Separate semantic layers for developers vs. users may be the solution to these problems.
In some cases, non-additive facts can be linked to a specific member of a dimension, such as an individual customer or product. In this case, consider storing it in the dimension table, as a behavioral dimension. This is most effective when the measurement in question can be refreshed on a pre-defined schedule.
Thanks to Adrienne for her question, and for consenting to have it reprinted here!
Do you have a question about dimensional design or data warehousing? Send it in. I answer all my email, though sometimes it takes a while.