Tuesday, March 29, 2011

Book Signing in DC, Date Change for Boston Seminars

Washington DC - Book Signing

I will be signing books next week at the TDWI World Conference in Washington DC.  You will find me in the main exhibit hall on Tuesday 4/5 from 11:30 am to 1:00 pm.

I will also be teaching on Monday and Tuesday, and conducting half-hour one-on-one sessions on Wednesday.

Hope to see you there! 

Boston Seminars - Date Change

The TDWI Boston Seminar will now be taking place June 13-16.  If you have already signed up, you should have received notification from TDWI. 

I will be teaching three courses that week, including an intensive two-day course on advanced dimensional design.  This is a rare public offering for my 2 day course, which is usually only offered through TDWI Onsite Education.

Monday, March 7, 2011

Q&A: Bridges are Part of Dimensional Modeling

Two recent posts on bridge tables generated a lot of questions.  Here is the most common:
When you use a bridge table, isn't that a snowflake? I thought this was not allowed in dimensional modeling.
The use of a bridge is a limited form of snowflaking.  It is also an important feature of the dimensional model.

Bridges are widely accepted, even among those who would otherwise avoid snowflake designs.

A Limited form of Snowflaking

In a Q&A from last year, I suggested that a snowflake occurs when dimensions are joined to something other than a fact table. By this definition, the examples from the previous two posts are certainly snowflakes.

In the first example, a bridge was used to associate a dimension with a repeating attribute (a company with multiple industries.)

In the second example, a bridge was used to associate a fact with multiple dimension rows (a sale with multiple salespeople.)

In both cases, a standard one-to-many relationship between fact and dimension does not exist.1 The bridge table solves this problem.

This usage is widely accepted. In fact, the bridge table is a central feature of dimensional modeling. I devote two full chapters to bridged designs in Star Schema: The Complete Reference.

Other Relationships are Not Modeled

When a bridge table is employed, most dimensional modelers will still refer to the schema as "a star." That's because the bridge is only used in the situations described above -- when the standard master-detail relationship between dimension table and fact table breaks down.

Other kinds of relationships between data elements are not resolved via multiple tables.  In our examples:
  • Repeating attributes are permitted (work phone, home phone, mobile phone) 
  • Dimensions contain partially dependent attributes (brand information present in a product table)
  • Master-detail relationships are not instantiated into separate tables
  • Values associated with codes are not placed in look-up tables
  • Dimensions are not linked to one another
An all-out snowflake design would employ principles of normalization to eliminate these features.

Other Times to Snowflake?

There may be some other situations where limited snowflaking is acceptable.  One is where a set of attributes (e.g. geographic) would otherwise appear in multiple dimension tables. Another is where software tools (the RDBMS or BI software) work better in a snowflake environment. 

These uses are a bit more controversial, and I wrote about them in a previous post.

When it comes to the bridge table, however, there is no controversy.  The bridge is an accepted technique in dimensional modeling, and is often necessary for specific dimensional designs.

- Chris

1 Bridges may also be used to navigate recursive relationships, as I describe in the book.

Image credit: Gravityx9 licensed under Creative Commons 2.0

Wednesday, February 9, 2011

Bridge to Multi-Valued Dimensions

Here's what you can do when a dimension table and fact are not in a one-to-many relationship.

A recent post described how a bridge can be used to resolve repeating attributes in a dimension. Today's post looks at a second use:  facts that must link to multiple rows in a dimension.

Facts with Multi-Valued Dimensions

In most star schemas, each fact table row references exactly one row in each associated dimension. This relationship between dimension table and fact table is variously called "one-to-many", "master-detail" or "parent-child". For example, in a star schema that measures the order-taking process, each row in the fact table will link to exactly one row in the product table, one row in the salesperson table, etc.

But what happens when more than one salesperson can be involved in an order? Suppose there could be two, three, or any number of salespeople? The one-to-many relationship breaks down.

Avoiding the Bridge

As with the repeating attribute from last time, we might try to accommodate this multi-valued dimension by simplifying the relationship. Perhaps we can isolate some "roles" that can be used to recover a clean one-to-many relationship. If there is always a lead salesperson and a secondary salesperson, we can simply place 2 salesperson keys in the fact table.

This has some disadvantages, though, and they are the same as those from last time.

First, it becomes difficult to produce a report showing all sales that a particular person was involved in. The query needs to select sales where the primary salesperson is the person in question, OR where the secondary salesperson is the person in question. Doing this for all salespeople in a department becomes tricky, especially if we are using a SQL-generating BI tool.

The second problem with simplifying the relationships is that there may be a need for any number of dimension values. Suppose 3 sales people collaborate on an order? Do we provide 3 salesperson keys in the fact table? This compounds the first problem, and it will only be a matter of time before we come across a case where we need 4.

A Dimension Bridge

The solution is to build a table that bridges the fact table and dimensions.

When a fact references a group of dimension values, a group will be created. The fact table will contain the group key. This key will join to the bridge table.  The bridge will have a row for each group member, linking to the dimension. An example appears here:


This simple solution allows individual rows in the fact table to link to any number of salespeople.  Notice that it requires a group to be established even for orders that have only one salesperson.

Using the Bridge

With a dimension bridge in place, it is easy filter facts for single dimension member, or group them by dimension members.  The sum of sales, for example, can be grouped by salesperson: the fact table joins to the bridge, the bridge joins to the dimension, and a simple group by query is issued.

Dimension bridges are used in many other situations. Multiple citations during an inspection, multiple parties in a dispute or claim, and so forth.

Dangers

As with the attribute bridge, the dimension bridge brings with it the possibility of misuse.

Without careful attention, facts may be double-counted, triple-counted or worse.  For example, if two people collaborate on a sale, a report of sale by salesperson will show the sale twice.  This kind of report is useful, but creating a grand total would be a mistake. Similar issues crop up if both people are in the same department, and someone tries to produce a report showing sales by department.

The solution is to make sure the bridge is only accessible to trained developers and analysts.  A simplified solution can be made available to ad hoc users. In this case, that might mean adding a primary salesperson key to the fact table. This allows ad hoc users to safely join the fact table and dimension table, but their analytic capability will be limited.

More On Dimension Bridges

This introduction has only scratched the surface of dimension bridges. As with the previous post, there are many more refinements and complications that can be introduced. These include the impact on ETL, the addition of allocation factors, coping with software that does not permit a many-to-many relationship and more.

These topics will be explored in future posts. You can also read all about bridge tables in Chatpers 9 and 10 of Star Schema: The Complete Reference.

Have a question about bridge tables? Send it to the address in the sidebar, and it may be addressed in a Q&A post.

Image by Branwebs via Creative Commons