Chris Adamson’s Blog: Optional Relationships Without NULL

Wednesday, February 13, 2013

Optional Relationships Without NULL

Optional relationships are important in dimensional models. This post shows you how to support them without resorting to NULL keys in the fact table.

Last month, we looked at the impact of allowing dimension attributes to contain NULL. In this post, we'll look at the impact of allowing foreign keys in fact tables to contain NULL.

Once again, NULL will prove problematic. What should be simple queries will require an alternate join syntax, multiple comparisons and nested parentheses.

The preferred solution is to establish special-case rows in dimensions. These rows can be referenced by fact table rows that do not have corresponding dimension detail.

Recap on NULL

NULL is a special SQL keyword used to denote the absence of data.

Last month, I explained why we avoid allowing dimension attributes to contain NULL. NULL fails standard comparisons, necessitating query predicates containing numerous tests which are carefully balanced within sets of parentheses.

For the full story, and the preferred solution, see last month's post: Avoid NULL in Dimensions (1/7/2013).

But that was NULL dimension attributes. What about NULL foreign keys?

Optional Relationships and NULL

Sometimes, the relationship between a fact table and a dimension is optional. This means some rows in the fact table cannot be associated with the dimension.

In an ER model, the traditional solution is to store NULL foreign keys for such rows. Let's take a look at what would happen if we did that in a dimensional model.

You may have noticed that in some stores, the cashier asks you if a salesperson helped you. If so, they record that info. So some sales have a salesperson, some do not.

With an optional relationship to Salesrep, you star schema might look like this:

The dotted line represents an optional relationship. (In other notations, optionality is represented by including circles at the ends of relationship lines.) For fact table rows with no salesperson, salesrep_key contains NULL.

Usability Harmed by NULL Foreign Keys

When a foreign key can contain NULL, we once again face difficulties when answering some simple business questions. As before, NULL complicates queries because it requires a comparison syntax that is different from the syntax for standard values. This time, we'll also be facing different join syntax.

For example, using the sales star, you might like to see all sales where a manager was not involved. Assuming the Salesrep table has a column called salesrep_type, you would be forgiven for adding this to your query:

WHERE salesrep.salesrep_type != "Manager"

This predicate is not sufficient to find all sales without managerial involvement.

Assuming a standard join is linking sales_facts to salerep, rows with no salesrep will not appear in the query results. This happens because, for any fact without a salesrep_key, the join to salesrep will fail. An outer join must be used to help facts with no salesreps survive the join.

Even when an outer join is employed, the above constraint remains insufficient. That's because a side effect of the outer join is to create NULL salesreps in the data set.

In addition to an outer join, we must supplement the constraint above:

WHERE
( salesrep.salesrep_type != "Manager" OR
salesrep.salesrep_type IS NULL
) AND...

NULL keys force us to choose the correct join type, perform multiple comparisons against the same dimension attribute, and carefully balance parentheses.

A dimensional model is meant to be understandable and usable from a business perspective. NULL keys do not fit the bill.

Use a Special Case Row

When there is an optional relationship between fact and dimension, best practices call for a special row in the dimension. This row is referenced by facts that would otherwise require a NULL foreign key.

For example, we add a "not applicable" row to our salesrep table as so:

salerep_key	row_type	salerep_type	salesrep_name
0	No Salesrep	n/a	n/a
100	Salesrep	Associate	Paul Cook
101	Salesrep	Associate	Steve Jones
201	Salesrep	Manager	Glen Matlock

Now we don't need outer joins, and we don't need to bend over backwards to perform simple comparisons.

Further Reading

The technique described in this post can be extended to handle other situations (invalid data, future events, or reference data that becomes available after facts).

Read more about these possibilities in Chapter 6 of my book, my book, Star Schema: The Complete Reference.

Also check out the previous post, Avoid NULL in Dimensions (1/7/2013)

Edited 2/13/13 5:30pm to correct mismatched table headings. Thanks for the emails.

Classes

Chris is scheduled to present at the following events. Course enrollment is open to the general public.

All these courses are also available on site (see below).

August 18, 2019
San Diego, CA
Data Modeling in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI San Diego
August 20, 2019
San Diego, CA
Workshop: Building the Business Case for Advanced Analytics
Registration: TDWI San Diego Strategy Summit
Monday October 21, 2019
San Francisco, CA
TDWI Dimensional Data Modeling Primer: From Requirements to Business Analysis
Registration: TDWI Seminars
Tuesday October 22, 2019
San Francisco, CA
Advanced Dimensional Modeling: Techniques for Practitioners
Registration: TDWI Seminars
Wednesday October 23, 2019
San Francisco, CA
Dimensional Models: What’s New in the Big Data Era
Registration: TDWI Seminars
November 12, 2019
Orlando, FL
Data Architecture: Managing Information in the Age of Big Data
Registration: TDWI Orlando
November 12, 2019
Orlando, FL
The Dimensional Model Refactored: New Techniques for the 21st Century
Registration: TDWI Orlando
November 15, 2019
Orlando, FL
Advanced Dimensional Modeling: Complete Tour of Modern Best Practices
Registration: TDWI Orlando

Onsite Education

You can bring Chris to your team for interactive education.

Dimensional Modeling

Chris provides full-day and expanded two-day courses covering the dimensional design concepts from Star Schema: The Complete Reference.
TDWI Courses

Chris teaches select TDWI courses that cover topics like data BI fundamentals, performance management, business analytics, dashboards and scorecards, and more.

All of Chris's education offerings are provided through TDWI.

For information on onsite offerings, contact TDWI Onsite Education. or Oakton Software