Friday, May 18, 2012

Using A Hierarchy Bridge WIthout A Many-to-many Relationship

In this post, I describe what you can do to eliminate the many-to-many relationship when using a hierarchy bridge.

The Hierarchy Bridge

A recent post on Recursive Hierarchies and Bridge Tables described how to support a recursive hierarchy in a dimensional model.

The business benefits of this kind of solution can be extremely valuable.

In the example, a special bridge table was used to navigate the relationships between companies and their subsidiaries. When connected to a fact table, this allowed us to roll up transactions via the recursive hierarchy.

Here is what the solution looked like.  Click the picture for a larger version.  For full details, refer to the original post.



Many-to-many is legal, but you may need to resolve it anyway.

You be wondering about the many-to-many relationship -- the one between the fact table and the bridge table.

This kind of relationship is perfectly valid in a relational database.  Unfortunately, some of our software products may not accept this kind of relationship, making it necessary to eliminate it.

I've written about this before in the context of a multi-valued dimension.  The solution in that case was to resolve the many-to-many relationship by creating a single-column table that consisted solely of a surrogate key.

With a hierarchy bridge, we can resolve the many-to-many without adding one of those silly single-column tables.

Additional table not required

In the recursive hierarchy above, we already have what we need: the company table itself.

Notice that in the picture above, the company table represents the "superior" to which we are aggregating transactions. The bridge links it to facts that describe the activities of subordinates.

To resolve the many-to-many, we will insert a second alias for the company table between the fact table and the bridge.  This alias represents the subordinate.

Now the picture looks like this:




An added benefit of this approach is that we can include information about both the superior and the subordinates in our query results.

Learn more

There are several posts describing bridge tables in this blog.  This post has mentioned two of them:
  • Use the category keyword bridges to find more posts.
And of course, you can help support this blog by picking up a copy of Star Schema: The Complete Reference.  It devotes two chapters and 30 illustrations to the topic of bridge tables.


Bridge photo by Branwebs via 
Creative Commons

Wednesday, May 2, 2012

Recursive Hierarchies And Bridge Tables

This post introduces the hierarchy bridge table, which can be used to navigate recursive hierarchies.

High Trestle Bridge at Night
Photo (c) Carol Bodensteiner
Bridge tables are a popular topic among readers of this blog. Perhaps that's because they can be a bit confusing, yet also enable powerful analysis.

Previously, I've focused on the two simplest uses for bridge tables:  the attribute bridge and the dimension bridge.  See links at the end of this post for more on these.

In this post, I'll tackle a third and final use: how a bridge can aid in the navigation of a recursive hierarchy.

Recursive relationships

In the real world, recursive relationships are common. Employees report to other employees, parts are made of other parts, and so forth.

In an E-R model, a recursive relationship might look like this:

Companies may own other companies.  That dotted relationship line is sometimes called a "pigs ear."  It links each company to the company that owns it.

You can read this relationship in two directions:  each company may have a parent company; each company may be the parent of one or more companies.

Looking up and looking down

From a business perspective, this information may be very useful.  For example, suppose you work for a business whose sells things to companies. You would like to be able to answer questions like these:
  1. What are January order dollars from to Company XYZ and its subsidiaries?
  2. Show me January orders from Company ABC and any companies above it.
The first question "looks down" from company XYZ.  The second question "looks up" from company ABC.

In both cases, we want to use the hierarchy to aggregate details about orders.

The hierarchy bridge

In a dimensional model, the idea is to support the study of facts (such as "order dollars") in the context of this hierarchy. This can be done using a bridge table, as shown here:

The table in the center is a hierarchy bridge.  This kind of table can be a bit confusing.  First, consider its content.

Each row contains a pair of company keys.  Here are the rules for populating the bridge table:
  1. For each company, the bridge contains rows associating it with each of its subordinates.
  2. Each company also has 1 row associating it with itself.
With respect to item 1, note each company is linked to all subordinates, not just the direct ones. (That's why the keys have the prefixes "superior" and "subordinate" rather than "parent" and "child".)

Using the bridge

In the picture above, notice that the fact table contains a company_key.  This represents the company from which each order is taken.  

Normally, we would join it to the company table to study "orders by company."

In this case, however, the bridge is inserted between the fact table and the company table. The bridge acts like a multiplexer--linking each company (on the right) to transactions with any and all of its subordinates (on the left).

This allows us to select a single company on the right (say Company XYZ) and aggregate all orders at or beneath that company.  That's called "looking down." It lets us answer question 1.

We can also use this hierarchy to "look up," as in question 2.  We simply reverse the joins--link company to the subordinate key, and the superior key to the fact table.  Then we can select a single company (say Company ABC) and look upward.

Powerful but dangerous

Of course, the danger of the bridge table should be readily apparent.  As you can see from the diagram above, if we forget to select a single parent company, we can easily double count the orders, triple count them, or worse.

For this reason, writing queries that involve the hierarchy must be governed carefully.  Developers or users with little experience can study the facts without using the bridge.  This will be safe, but they will not be able to leverage the hierarchy.

Other kinds of bridge tables

The hierarchy bridge described in this post is one of three kinds of bridge tables.  I've written previously about the other two kinds:
  • Bridge To Multi-Valued Dimensions (2/9/2011) shows how a bridge can link a single fact to multiple rows in a dimension.  Examples include orders with multiple salespeople, claims with multiple parties, etc.
More on the hierarchy bridge

When it comes to recursive hierarchies, this post only scratches the surface.  If you have questions, send them in.

In the mean time, there's a whole chapter on this topic in my book, Star Schema: The Complete Reference.  That would be Chapter 10:  "Recursive Hierarchies and Bridges." It's a long chapter--the longest one in the book.

And don't forget, when you use the links on this page to buy it from Amazon, you are helping to support this blog.


Copyright (c) Carol Bodensteiner
Used by permission, with many thanks!

Thursday, April 26, 2012

Q&A: Human resources data marts


A reader asks if Human Resources data marts are inherently complex. I run down a list of dimensional techniques he should expect to find:
Q: I will be working on a data mart design project to design star schemas for human resources data. I heard that HR data is more complex than sales or marketing and special techniques need to be applied.

I looked at the star schemas of pre-built analytical applications developed by some vendors for our packaged HR solution. I felt that they are quite complex and just wondering star design for HR data should be so complex.

If possible, can you please discuss this topic in a detailed manner by considering any one of popular HRMS system data and the most common data/reporting requirements along with the design discussion to achieve the star for those reports using the given HRMS data?

- Venkat, UK

A: Human Resources applications do indeed tend to use advanced techniques in dimensional design.

Below, I run down a list of topics you will probably need to brush up on. In reality, every subject area requires complete mastery of dimensional modeling, not just the basics.

Note that the complexity you are seeing in packaged solutions may stem from the subject area. Vendors often produce abstracted models to facilitate customization.

Techniques used in HR data marts

No doubt you are accustomed to the transaction-grained stars you encountered in sales.  You will find them in HR as well, but you will also encounter these:
  • Snapshot stars sample one or more metrics at pre-defined intervals.

    In an HR data mart, these may be used to track various kinds of accruals, balances in benefit programs, etc.

  • Accumulating snapshot stars track dimension members through a business process and allow analysis of the elapsed time between milestones.

    These may be used to track the filling of a position, "on-boarding" processes, disciplinary procedures, or applications to benefit programs.

  • Factless fact tables track business processes where the primary metric is the occurrence of an event. They contain no facts, but are used to count rows.

    These are likely to be used for tracking attendance or absence, participation in training courses, etc.

  • Coverage stars are factless fact tables that model conditions.  These are usually in place to support comparison to activities represented in other stars, but may also be leveraged to capture key relationships among dimensions.

    These are likely to be used for linking employees to positions, departments and managers. 
Your dimensions will also require reaching beyond the basics:
  • Transaction dimensions capture the effective and expiration date/time for each row in a dimension table.  These are advisable in almost any situation.

    In HR they may be used to  track changes in an employee dimension.

  • Bridge tables for Multi-valued attributes allow you to associate a repeating attribute with a dimension.

    In HR, these are likely to be used to associate an employee with skills, languages, and other important characteristics.

  • Hierarchy bridge tables allow you to aggregate facts through a recursive hierarchy.

    In HR, these are used to navigate reporting structures (employees report to employees, who in turn report to other employees, and so forth) as well as organizational structures.
I would also expect to encounter some complexity in slow-change processing rules.  Human Resource systems carefully audit certain kinds of data changes, tracking the reason for each change. As a result, you may have attributes in your dimension schema that may exhibit either type 1 or type 2 behavior, depending on the reason for the change.

Every schema goes beyond the basics

This list could go on, but I think you get the idea.

The only way to design a data mart that meets business needs is to have a well rounded understanding of the techniques of dimensional modeling.

You cannot get very far with nothing more than a grasp of the basics. This holds true in any subjet area -- even sales and marketing.  You need the complete toolbox to build a powerful business solution.

Packaged data marts

The complexity that concerns the reader may actually stem from another cause:  he is looking at packaged data mart solutions.

Packaged applications often introduce complexity for an entirely different reason: to support extensibility or customization. For example, facts may be stored row-wise rather than column-wise, and dimensions may contain generic attribute names.

Learn more

This blog contains posts on most of the topics listed above.  Click each header for a link to a related article.  Some have been discussed in multiple posts, but I have included only one link for each.  So also do some exploration.

In addition, please check out my book Star Schema: The Complete Reference.  When you purchase it from Amazon using the links on this page, you help support this blog.
  • Snapshots and accumulating snapshots are covered in Chapter 11, "Transactions, Snapshots and Accumulating Snapshots

  • Factless fact tables and coverage stars are covered in Chapter 12, "Factless Fact Tables"

  • Transaction dimensions are covered in Chapter 8, "More Slow Change Techniques"

  • Attribute bridges are covered in Chapter 9, "Multi-valued Dimensions and Bridges"

  • Hierarchy bridges are covered in Chapter 10, "Recursive Hierarchies and Bridges"
Thanks for the question!

- Chris

Send in your own questions to the address in the sidebar. 


Do you have another technique that was useful in an HR data mart? Use the comments.

Image credit: Gravityx9 licensed under Creative Commons 2.0