Wednesday, February 9, 2011

Bridge to Multi-Valued Dimensions

Here's what you can do when a dimension table and fact are not in a one-to-many relationship.

A recent post described how a bridge can be used to resolve repeating attributes in a dimension. Today's post looks at a second use:  facts that must link to multiple rows in a dimension.

Facts with Multi-Valued Dimensions

In most star schemas, each fact table row references exactly one row in each associated dimension. This relationship between dimension table and fact table is variously called "one-to-many", "master-detail" or "parent-child". For example, in a star schema that measures the order-taking process, each row in the fact table will link to exactly one row in the product table, one row in the salesperson table, etc.

But what happens when more than one salesperson can be involved in an order? Suppose there could be two, three, or any number of salespeople? The one-to-many relationship breaks down.

Avoiding the Bridge

As with the repeating attribute from last time, we might try to accommodate this multi-valued dimension by simplifying the relationship. Perhaps we can isolate some "roles" that can be used to recover a clean one-to-many relationship. If there is always a lead salesperson and a secondary salesperson, we can simply place 2 salesperson keys in the fact table.

This has some disadvantages, though, and they are the same as those from last time.

First, it becomes difficult to produce a report showing all sales that a particular person was involved in. The query needs to select sales where the primary salesperson is the person in question, OR where the secondary salesperson is the person in question. Doing this for all salespeople in a department becomes tricky, especially if we are using a SQL-generating BI tool.

The second problem with simplifying the relationships is that there may be a need for any number of dimension values. Suppose 3 sales people collaborate on an order? Do we provide 3 salesperson keys in the fact table? This compounds the first problem, and it will only be a matter of time before we come across a case where we need 4.

A Dimension Bridge

The solution is to build a table that bridges the fact table and dimensions.

When a fact references a group of dimension values, a group will be created. The fact table will contain the group key. This key will join to the bridge table.  The bridge will have a row for each group member, linking to the dimension. An example appears here:


This simple solution allows individual rows in the fact table to link to any number of salespeople.  Notice that it requires a group to be established even for orders that have only one salesperson.

Using the Bridge

With a dimension bridge in place, it is easy filter facts for single dimension member, or group them by dimension members.  The sum of sales, for example, can be grouped by salesperson: the fact table joins to the bridge, the bridge joins to the dimension, and a simple group by query is issued.

Dimension bridges are used in many other situations. Multiple citations during an inspection, multiple parties in a dispute or claim, and so forth.

Dangers

As with the attribute bridge, the dimension bridge brings with it the possibility of misuse.

Without careful attention, facts may be double-counted, triple-counted or worse.  For example, if two people collaborate on a sale, a report of sale by salesperson will show the sale twice.  This kind of report is useful, but creating a grand total would be a mistake. Similar issues crop up if both people are in the same department, and someone tries to produce a report showing sales by department.

The solution is to make sure the bridge is only accessible to trained developers and analysts.  A simplified solution can be made available to ad hoc users. In this case, that might mean adding a primary salesperson key to the fact table. This allows ad hoc users to safely join the fact table and dimension table, but their analytic capability will be limited.

More On Dimension Bridges

This introduction has only scratched the surface of dimension bridges. As with the previous post, there are many more refinements and complications that can be introduced. These include the impact on ETL, the addition of allocation factors, coping with software that does not permit a many-to-many relationship and more.

These topics will be explored in future posts. You can also read all about bridge tables in Chatpers 9 and 10 of Star Schema: The Complete Reference.

Have a question about bridge tables? Send it to the address in the sidebar, and it may be addressed in a Q&A post.

Image by Branwebs via Creative Commons