Tuesday, March 2, 2010

Just the Facts?

A reader asks:
I’m having a problem in picking out facts. For example are all dates facts -- start date, end date and DoB?  What about personal data like post code, job title, status (active /left)  etc. Would you say that if you cant do math on it, its not a fact?
- [Name Withheld], UK
Good question.  It allows me to cover the basics, as well as touch on some advanced topics.

First, the Basics..

Facts vs. Dimensions

Facts are measurements.  I really prefer that word, because it is a bit more descriptive.  But the rest of the world already knows "Facts".

Examples that are clearly measurements:
  • order dollars
  • gross pay
  • quantity sold
  • account balance
In the original question, all the attributes mentioned are dimensions.  (Dates, personal data, and so forth.)

Dimensions are used to give facts context.  You use dimension values to filter or group facts.  Here are some in action:
  • order dollars (fact) by order date (dimension)
  • gross pay (fact) by department code (dimension)
  • quantity sold (fact) for January (dimension value)
You can usually "do math" on facts.  The most common example is adding them together, or aggregating them, as might be done in the examples above.

Don't use that as a criterion to to identify facts, though.  Many numbers are not facts. And, to make matters more complicated, and some non-numbers are facts.

Many Numbers are Not Facts

Some numbers are really dimensions.  You can tell because we use them to give context to facts.  A numeric department code is obviously a dimension.  We wouldn't normally add the values together, but we might use them to break out a measurement -- for example: budget (fact) by department (dimension).

Less obvious are things like "unit price".  It sounds like it might be a fact. After all, we can certainly add up money, right?   Actually, when we add up sales, we don't add up unit prices, we add up "extended prices".  Extended price is a fact.  It is equivalent to unit price times quantity.

Look again, and you will also see that unit price behaves like the other dimensions we looked at.  We can use it to group or filter facts. "How many did we sell at a unit price of 10 cents?" That question filters a fact (quantity sold) by a dimension value (10 cent unit price).

Some Facts are Not Numbers

There are also facts that are not numbers.  These are rare, but they do occur.

Test results or grades are the most common example.  You can't do math on "pass" or "fail," but they are measurements.  Facts like this are sometimes called "text facts."  A test_result fact that can take on the values "pass" or "fail" might be stored in a fact table that contains a row for each student taking a particular test on a particular day.

If they take on a relatively low number of possible values, consider converting text facts into discrete facts.  In this case, we might replace test_result with two facts:  number_passed and number_failed.  For each row in the fact table, one of these will take on the value 1, the other the value 0.  This may seem wasteful, but now it is easy to get a sum(number_passed) or sum(number_failed.)

Some Stars Have No Facts

If you don't have any facts, you can still have a fact table.  Sometimes, the mere occurrence of an event is all you need to know about.  For example, you might use a fact table to track the fact that a employee was present on a given day.

Factless fact tables can also track conditions.  For example, each year, circumstances might make employees eligible to participate in particular programs.   Aside from the dimensions (Employee, Year and Program) there might not be any facts to record.  That's OK, you can still count up rows that meet various conditions.

- Chris

Do you have a question?  Send it in.  I answer all my email, though sometimes it takes a while.

Image: Jack Webb as Joe "Just the Facts" Friday,
from the Public Domain