Showing posts with label People. Show all posts
Showing posts with label People. Show all posts

Tuesday, March 2, 2010

Just the Facts?

A reader asks:
I’m having a problem in picking out facts. For example are all dates facts -- start date, end date and DoB?  What about personal data like post code, job title, status (active /left)  etc. Would you say that if you cant do math on it, its not a fact?
- [Name Withheld], UK
Good question.  It allows me to cover the basics, as well as touch on some advanced topics.

First, the Basics..


Facts vs. Dimensions

Facts are measurements.  I really prefer that word, because it is a bit more descriptive.  But the rest of the world already knows "Facts".

Examples that are clearly measurements:
  • order dollars
  • gross pay
  • quantity sold
  • account balance
In the original question, all the attributes mentioned are dimensions.  (Dates, personal data, and so forth.)

Dimensions are used to give facts context.  You use dimension values to filter or group facts.  Here are some in action:
  • order dollars (fact) by order date (dimension)
  • gross pay (fact) by department code (dimension)
  • quantity sold (fact) for January (dimension value)
You can usually "do math" on facts.  The most common example is adding them together, or aggregating them, as might be done in the examples above.

Don't use that as a criterion to to identify facts, though.  Many numbers are not facts. And, to make matters more complicated, and some non-numbers are facts.

Many Numbers are Not Facts

Some numbers are really dimensions.  You can tell because we use them to give context to facts.  A numeric department code is obviously a dimension.  We wouldn't normally add the values together, but we might use them to break out a measurement -- for example: budget (fact) by department (dimension).

Less obvious are things like "unit price".  It sounds like it might be a fact. After all, we can certainly add up money, right?   Actually, when we add up sales, we don't add up unit prices, we add up "extended prices".  Extended price is a fact.  It is equivalent to unit price times quantity.

Look again, and you will also see that unit price behaves like the other dimensions we looked at.  We can use it to group or filter facts. "How many did we sell at a unit price of 10 cents?" That question filters a fact (quantity sold) by a dimension value (10 cent unit price).

Some Facts are Not Numbers

There are also facts that are not numbers.  These are rare, but they do occur.

Test results or grades are the most common example.  You can't do math on "pass" or "fail," but they are measurements.  Facts like this are sometimes called "text facts."  A test_result fact that can take on the values "pass" or "fail" might be stored in a fact table that contains a row for each student taking a particular test on a particular day.

If they take on a relatively low number of possible values, consider converting text facts into discrete facts.  In this case, we might replace test_result with two facts:  number_passed and number_failed.  For each row in the fact table, one of these will take on the value 1, the other the value 0.  This may seem wasteful, but now it is easy to get a sum(number_passed) or sum(number_failed.)

Some Stars Have No Facts

If you don't have any facts, you can still have a fact table.  Sometimes, the mere occurrence of an event is all you need to know about.  For example, you might use a fact table to track the fact that a employee was present on a given day.

Factless fact tables can also track conditions.  For example, each year, circumstances might make employees eligible to participate in particular programs.   Aside from the dimensions (Employee, Year and Program) there might not be any facts to record.  That's OK, you can still count up rows that meet various conditions.

- Chris

Do you have a question?  Send it in.  I answer all my email, though sometimes it takes a while.

Image: Jack Webb as Joe "Just the Facts" Friday,
from the Public Domain

Thursday, November 19, 2009

The Tolkien Effect

Schema designers must be on the lookout for data elements that are known by more than one name.  Equally common is the use of a single name to signify very different things.

It may surprise you to learn that there is an important connection between data warehouse design and The Lord of the Rings. In the books, author JRR Tolkien challenges readers by using many different names, without explanation, for the same character.

The character Aragorn, for example, is also known as Strider, Dunadan, the Heir of Isildur, and several other names and titles.  Each name, it turns out, is associated with a different culture or point of view.

At first, all this can be deeply confusing.  With a little effort and patience, however, things begin to make sense, and the end result can be deeply rewarding.

The Tolkien Effect in Business

The same kind of thing happens when gathering requirements for a dimensional model.  Within a business, it is commonplace to find several names for the same thing.   Different departments, for example, may refer to products in different ways.  Even within a department, there may be multiple names for the same thing.

Depending on your previous experience with the area in question, it may take you some time to realize this is going on. I will never forget the day  I realized that a finance group meant the same thing by Ten-Digit-Department, Level 3 Code and Budget Line. 

It’s crucial to identify these situations, or the capabilities of your model will be crippled.  Data elements of interest in multiple contexts should be given a single, shared definition in your model.  For dimensions in particular, this will be crucial in supporting analysis that crosses subject areas.

These shared dimensions are called conformed dimensions, and they are the key to avoiding stove-pipe subject areas. Even within a subject area, this can be crucial.  The Ten-Digit-Department realization was essential in permitting comparison of budgets to actuals.

The Reverse-Tolkien

The converse is also a commonplace: a single name used to signify very different things.  The best example of this is “Sales.”  A salesperson will often use this word to refer to an order or contract.  In finance, however, the word is reserved for the event that allows the recognition or revenue, which is often fulfillment or shipment of the order. 

Once again, it is crucial that analyst keep an eye out for these situations; failure to produce consistent and well defined definitions for each fact or measurement is also a failure of conformance.  The result will be challenges to the accuracy of the data, distrust of the solution, and a lack of user adoption.

What You Can Do

How then to avoid these problems?  Listen.  Don’t assume you know what people mean. Repeat things back in your own words.  Be sure to write down and review definitions of each data element. 

Look out for The Tolkien Effect.  Pay close attention to people who live and work on the cusp of two subject areas or departments, as they will be keenly aware of these kind of linguistic challenges.  So will the data administrator, if your organization has one.

-Chris

Image Attribution:  lrargerich
CC BY 2.0