Thursday, April 26, 2012

Q&A: Human resources data marts

A reader asks if Human Resources data marts are inherently complex. I run down a list of dimensional techniques he should expect to find:
Q: I will be working on a data mart design project to design star schemas for human resources data. I heard that HR data is more complex than sales or marketing and special techniques need to be applied.

I looked at the star schemas of pre-built analytical applications developed by some vendors for our packaged HR solution. I felt that they are quite complex and just wondering star design for HR data should be so complex.

If possible, can you please discuss this topic in a detailed manner by considering any one of popular HRMS system data and the most common data/reporting requirements along with the design discussion to achieve the star for those reports using the given HRMS data?

- Venkat, UK

A: Human Resources applications do indeed tend to use advanced techniques in dimensional design.

Below, I run down a list of topics you will probably need to brush up on. In reality, every subject area requires complete mastery of dimensional modeling, not just the basics.

Note that the complexity you are seeing in packaged solutions may stem from the subject area. Vendors often produce abstracted models to facilitate customization.

Techniques used in HR data marts

No doubt you are accustomed to the transaction-grained stars you encountered in sales.  You will find them in HR as well, but you will also encounter these:
  • Snapshot stars sample one or more metrics at pre-defined intervals.

    In an HR data mart, these may be used to track various kinds of accruals, balances in benefit programs, etc.

  • Accumulating snapshot stars track dimension members through a business process and allow analysis of the elapsed time between milestones.

    These may be used to track the filling of a position, "on-boarding" processes, disciplinary procedures, or applications to benefit programs.

  • Factless fact tables track business processes where the primary metric is the occurrence of an event. They contain no facts, but are used to count rows.

    These are likely to be used for tracking attendance or absence, participation in training courses, etc.

  • Coverage stars are factless fact tables that model conditions.  These are usually in place to support comparison to activities represented in other stars, but may also be leveraged to capture key relationships among dimensions.

    These are likely to be used for linking employees to positions, departments and managers. 
Your dimensions will also require reaching beyond the basics:
  • Transaction dimensions capture the effective and expiration date/time for each row in a dimension table.  These are advisable in almost any situation.

    In HR they may be used to  track changes in an employee dimension.

  • Bridge tables for Multi-valued attributes allow you to associate a repeating attribute with a dimension.

    In HR, these are likely to be used to associate an employee with skills, languages, and other important characteristics.

  • Hierarchy bridge tables allow you to aggregate facts through a recursive hierarchy.

    In HR, these are used to navigate reporting structures (employees report to employees, who in turn report to other employees, and so forth) as well as organizational structures.
I would also expect to encounter some complexity in slow-change processing rules.  Human Resource systems carefully audit certain kinds of data changes, tracking the reason for each change. As a result, you may have attributes in your dimension schema that may exhibit either type 1 or type 2 behavior, depending on the reason for the change.

Every schema goes beyond the basics

This list could go on, but I think you get the idea.

The only way to design a data mart that meets business needs is to have a well rounded understanding of the techniques of dimensional modeling.

You cannot get very far with nothing more than a grasp of the basics. This holds true in any subjet area -- even sales and marketing.  You need the complete toolbox to build a powerful business solution.

Packaged data marts

The complexity that concerns the reader may actually stem from another cause:  he is looking at packaged data mart solutions.

Packaged applications often introduce complexity for an entirely different reason: to support extensibility or customization. For example, facts may be stored row-wise rather than column-wise, and dimensions may contain generic attribute names.

Learn more

This blog contains posts on most of the topics listed above.  Click each header for a link to a related article.  Some have been discussed in multiple posts, but I have included only one link for each.  So also do some exploration.

In addition, please check out my book Star Schema: The Complete Reference.  When you purchase it from Amazon using the links on this page, you help support this blog.
  • Snapshots and accumulating snapshots are covered in Chapter 11, "Transactions, Snapshots and Accumulating Snapshots

  • Factless fact tables and coverage stars are covered in Chapter 12, "Factless Fact Tables"

  • Transaction dimensions are covered in Chapter 8, "More Slow Change Techniques"

  • Attribute bridges are covered in Chapter 9, "Multi-valued Dimensions and Bridges"

  • Hierarchy bridges are covered in Chapter 10, "Recursive Hierarchies and Bridges"
Thanks for the question!

- Chris

Send in your own questions to the address in the sidebar. 

Do you have another technique that was useful in an HR data mart? Use the comments.

Image credit: Gravityx9 licensed under Creative Commons 2.0