Monday, April 25, 2011

Bridge Tables and Many-to-many Relationships

Recent posts about bridge tables generated a lot of interest. Today, I will look at the many-to-many relationships in these solutions. 

These relationships are intended, necessary, and valid parts of database design. Many software tools do not understand this kind of relationship, so I'll also show you how you can eliminate it, if need be.


Previously, I described two uses for bridge tables. Both examples involved many-to-many relationships between tables. Many readers may have thought this was an error. In fact, it was intended.

One case involved a situation where facts must relate to multiple dimension members. Specifically, it was possible for two or more salespeople might collaborate on orders captured in a fact table. The bridged solution looked like this:

Notice that there is a many-to-many relationship between the bridge table and the fact table. This can be understood as follows:
  • A given fact may reference multiple rows in the bridge.  This happens when two salespeople collaborate on an order. 
  • A given row in the bridge may reference multiple facts.  This happens when the same group collaborates on more than one order line.
The bridge table allows us to link the repeating dimension to the facts, but as noted in the post, care must be taken not to double count.

A Legal Relationship

In the world of entity-relationship modeling, we are taught that many-to-many relationships may be present in a logical model, but not  in a physical design. But we are not discussing entity-relationship modeling.

In a dimensional model, this kind of relationship is allowed. It is also 100% compatible with the relational model of data.  Despite what you may think, it is also compatible with your relational database management system.

The figure above can be implemented as a physical design, and it will work. Unfortunately, some of our software tools do not understand this. 

Tool Problems

Many of the software products we use in data warehousing were originally developed for transaction processing systems. Because of this lineage, the many-to-many relationship can be a problem.

For example, suppose your DBMS works best if joins between tables are supported by declaration of primary key/foreign key relationships. This is a problem for the model pictured above. Order_facts and salesperson_bridge can be joined using sales_group_key.  But it is not a primary key in either table.

Luckily, this join will work even if it is not supported by any key constraints. Unfortunately, other tools may not be as forgiving.

Some data modeling tools do not permit a many-to-many relationship in a physical model. This presumably reflects their entity-relationship lineage. And if you use a SQL-generating BI tool to create reports, you may find it identifies bridge table as fact tables.

Eliminating The Many-To-Many

If your software products cannot live with a many-to-many relationship, there is a simple solution. You can resolve in the same way an entity-relationship modeler would: create an intersect table.

In the example above, a new table is added that contains one row for each group. An association or intersect table, will link the group to the salespeople. The result looks like this:
For those of us who find humor in data modeling, this is a source of much amusement. The sales_group table consists of a single column. And since the sole column is a surrogate key, the table's entire contents are likely to be a sequence of integers!

Still: problem solved. Only take this additional step if you really must.

More on Bridges

Bridges are powerful but dangerous . To make proper use of them, there is more to understand.  Additional topics include he impact of slowly changing dimensions, the use of allocation factors, how to prevent end users from double counting....the list goes on.

I will post more about bridges, but I encourage you to also read about them in my book, Star Schema The Complete Reference.  The book goes into much more depth on bridge tables than I can do in a blog post - or even a series of posts.  Two full chapters are dedicated to bridge tables.  Check it out!

- Chris

Bridge photo by Branwebs via Creative Commons

Wednesday, April 6, 2011

Linking Expenditures to Commitments [Q&A]

A reader working on a financial data mart has a question about linking commitments (such as purchase orders) to payments:

Q: Payments can be made after the end of the fiscal year in which the original commitments were made.  So you could have some payments in February that apply to the prior fiscal  year as well as payments made in February that apply to commitments made in the current current fiscal year.

How would you handle this in the time dimension?  Do we need multiple rows for February 2010 -- one for funds committed in 2010 and another for funds committed in 2009?
J. in New York

A: You don't need to tinker with the time dimension.  The key to solving this problem is to recognize that financial transactions have multiple dates associated with them.

For example, a payment may have an effective date (the date on the check) and a date of entry into the system, which may lag behind a bit. I've covered this concept in a previous post.

This reader wants to link the payment to a commitment, such as a purchase order.  The purchase order may have been issued in the previous year, as in his example.  We can acknowledge this in the design by adding an additional key in the fact table to reference the date of the commitment.

Hence, the payment fact table may have the following foreign key references to a DAY dimension:
  • day_key_effective (the date of the check or payment)
  • day_key_entered  (the date it was entered into the system)
  • day_key_committed (the date of the corresponding commitment)
Now cash flow can be understood properly in time (via day_key_effective) and the expenditures can be rolled up according to the date the funds were committed (using day_key_committed.)

If you wish to do this, it will be important to determine if/how the source system links each expenditure with a commitment.  You will also find that for each payment, you can probably capture two transaction identifiers:  one for the payment itself, and one for the original commitment.  These may also factor into your payments model as dimensions.

- Chris

Image is from Public Domain Photos