Showing posts with label Accumulating Snapshots.

Thursday, April 26, 2012

Q&A: Human resources data marts


A reader asks if Human Resources data marts are inherently complex. I run down a list of dimensional techniques he should expect to find:
Q: I will be working on a project to design star schemas for human resources data. I heard that HR data is more complex than sales or marketing data, and that special techniques need to be applied.

I looked at the star schemas of pre-built analytical applications developed by some vendors for our packaged HR solution. I felt that they are quite complex, and I am wondering whether star design for HR data should be so complex.

If possible, can you please discuss this topic in detail, considering data from any one of the popular HRMS systems and the most common data/reporting requirements, along with the design discussion needed to achieve the stars for those reports using the given HRMS data?

- Venkat, UK

A: Human Resources applications do indeed tend to use advanced techniques in dimensional design.

Below, I run down a list of topics you will probably need to brush up on. In reality, every subject area requires complete mastery of dimensional modeling, not just the basics.

Note that some of the complexity you are seeing in packaged solutions may not stem from the subject area itself. Vendors often produce abstracted models to facilitate customization.

Techniques used in HR data marts

No doubt you are accustomed to the transaction-grained stars you encountered in sales.  You will find them in HR as well, but you will also encounter these:
  • Snapshot stars sample one or more metrics at pre-defined intervals.

    In an HR data mart, these may be used to track various kinds of accruals, balances in benefit programs, etc.

  • Accumulating snapshot stars track dimension members through a business process and allow analysis of the elapsed time between milestones.

    These may be used to track the filling of a position, "on-boarding" processes, disciplinary procedures, or applications to benefit programs.

  • Factless fact tables track business processes where the primary metric is the occurrence of an event. They contain no facts, but are used to count rows.

    These are likely to be used for tracking attendance or absence, participation in training courses, etc.

  • Coverage stars are factless fact tables that model conditions.  These are usually in place to support comparison to activities represented in other stars, but may also be leveraged to capture key relationships among dimensions.

    These are likely to be used for linking employees to positions, departments and managers. 
Your dimensions will also require reaching beyond the basics:
  • Transaction dimensions capture the effective and expiration date/time for each row in a dimension table.  These are advisable in almost any situation.

    In HR they may be used to  track changes in an employee dimension.

  • Bridge tables for multi-valued attributes allow you to associate a repeating attribute with a dimension.

    In HR, these are likely to be used to associate an employee with skills, languages, and other important characteristics.

  • Hierarchy bridge tables allow you to aggregate facts through a recursive hierarchy.

    In HR, these are used to navigate reporting structures (employees report to employees, who in turn report to other employees, and so forth) as well as organizational structures.
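
To make the hierarchy bridge concrete, here is a minimal SQL sketch. The table and column names (payroll_facts, employee, employee_hierarchy_bridge) are illustrative assumptions, not taken from any particular HRMS package.

    -- The bridge holds one row per (superior, subordinate) pair, including each
    -- employee paired with itself, so facts roll up through the reporting structure.
    SELECT mgr.employee_name,
           SUM(f.salary_amount) AS total_salary_managed
    FROM   employee_hierarchy_bridge b
    JOIN   employee mgr    ON mgr.employee_key = b.superior_employee_key
    JOIN   payroll_facts f ON f.employee_key   = b.subordinate_employee_key
    WHERE  mgr.employee_name = 'Jane Smith'
    GROUP BY mgr.employee_name;
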
I would also expect to encounter some complexity in slow-change processing rules.  Human Resources systems carefully audit certain kinds of data changes, tracking the reason for each change. As a result, you may have attributes in your dimension schema that exhibit either type 1 or type 2 behavior, depending on the reason for the change.

Every schema goes beyond the basics

This list could go on, but I think you get the idea.

The only way to design a data mart that meets business needs is to have a well rounded understanding of the techniques of dimensional modeling.

You cannot get very far with nothing more than a grasp of the basics. This holds true in any subject area -- even sales and marketing.  You need the complete toolbox to build a powerful business solution.

Packaged data marts

The complexity that concerns the reader may actually stem from another cause:  he is looking at packaged data mart solutions.

Packaged applications often introduce complexity for an entirely different reason: to support extensibility or customization. For example, facts may be stored row-wise rather than column-wise, and dimensions may contain generic attribute names.

Learn more

This blog contains posts on most of the topics listed above.  Click each header for a link to a related article.  Some topics have been discussed in multiple posts, but I have included only one link for each, so do some exploring of your own as well.

In addition, please check out my book Star Schema: The Complete Reference.  When you purchase it from Amazon using the links on this page, you help support this blog.
  • Snapshots and accumulating snapshots are covered in Chapter 11, "Transactions, Snapshots and Accumulating Snapshots"

  • Factless fact tables and coverage stars are covered in Chapter 12, "Factless Fact Tables"

  • Transaction dimensions are covered in Chapter 8, "More Slow Change Techniques"

  • Attribute bridges are covered in Chapter 9, "Multi-valued Dimensions and Bridges"

  • Hierarchy bridges are covered in Chapter 10, "Recursive Hierarchies and Bridges"
Thanks for the question!

- Chris

Send in your own questions to the address in the sidebar. 


Do you have another technique that was useful in an HR data mart? Use the comments.

Image credit: Gravityx9 licensed under Creative Commons 2.0

Sunday, February 19, 2012

Accumulating snapshots and Type 2 changes

How does one maintain an accumulating snapshot when the dimension that defines its grain can undergo type 2 changes?

A reader recently asked this question via the comments for a recent post.

The short answer:  an accumulating snapshot has only 1 row for each thing being tracked.  If that "thing" can undergo type 2 changes, then the accumulating snapshot should link to only one version -- the current version.

State transitions

Many businesses track items that go through state transitions.  A trouble ticket, for example, is opened, assigned to a support rep, and eventually closed.  A mortgage application is submitted, reviewed, processed, underwritten, and settled upon.

These activities may be represented by a series of fact tables associated with the various process milestones.  Or, they may be captured by a single fact table that records a row for each status change.

When it comes time to study the average time spent at the various stages, however, these options may not be ideal.  In the former case, it will be necessary to drill across several fact tables.  In the latter case it will be necessary to perform correlated subqueries.

The accumulating snapshot model addresses these concerns.

The grain of an accumulating snapshot

An accumulating snapshot receives one row for each item being tracked.  This single row is updated as the item reaches various milestones.  A series of facts are incremented for each day spent in a given stage.  When a milestone is achieved, the appropriate date key is populated.

(For a full refresher on the accumulating snapshot, see the links at the end of this post.)

Unlike other fact tables, then, the accumulating snapshot always reflects the current state of affairs.  In the case of a mortgage processing fact table, for example, each row shows a single application, along with how many days it has spent at each processing stage.

This makes it very easy to study the average time between various stages -- just take the average of the facts in question.  No drilling across, no correlated subqueries.
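
As a minimal sketch of just how simple this gets, here is an illustrative query against a hypothetical mortgage accumulating snapshot; the table and column names are assumptions, not from the post.

    -- Average days spent in underwriting for applications submitted in January 2012.
    -- One row per application means a plain aggregate answers the question.
    SELECT AVG(f.days_underwriting) AS avg_days_underwriting
    FROM   mortgage_application_facts f
    JOIN   day d ON d.day_key = f.submission_date_key
    WHERE  d.month_name = 'January'
    AND    d.calendar_year = 2012;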

Changes in the defining dimension

Back to the original question, then. What happens if the dimension that defines the grain of the accumulating snapshot can undergo type 2 slow changes?

The goal of the accumulating snapshot is to summarize each real world item in a single row.  Even if the dimension contains multiple rows for a particular item, the fact table must contain only one. If it contained more than one, its usability would be harmed.

As I have written before, this means the defining dimension may have more rows than the accumulating snapshot.  This is OK.  It's just another way in which the accumulating snapshot differs from other types of fact tables.

Our mortgage application, then, must have one and only one row in the fact table -- even if type 2 changes mean there is more than one row for it in the dimension.

Linking to the current row

The question, then, becomes which row should the fact table link to?

Because the accumulating snapshot is current-valued, it makes the most sense to have it link to the row in the dimension table that represents the current state of the item being tracked.  In our example, this would be the most recent version of the mortgage application.

If you increment the facts in your accumulating snapshot daily (something I suggest in my book), you should also take this opportunity to adjust the key values for any items that have undergone a type 2 change.  If you adjust the facts only when a milestone is met (something you may do if the volume is very high), you may choose to update the key value at this time instead.
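
A minimal sketch of that key adjustment, assuming the fact table carries the application's natural key as a degenerate dimension and the dimension flags its current row; all names here are illustrative assumptions.

    -- Re-point each fact row at the current (post-type-2) version of its application.
    UPDATE mortgage_application_facts
    SET    application_key =
           (SELECT d.application_key
            FROM   mortgage_application d
            WHERE  d.application_id   = mortgage_application_facts.application_id
            AND    d.current_row_flag = 'Y')
    WHERE  EXISTS
           (SELECT 1
            FROM   mortgage_application d
            WHERE  d.application_id    = mortgage_application_facts.application_id
            AND    d.current_row_flag  = 'Y'
            AND    d.application_key  <> mortgage_application_facts.application_key);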

Learn more

This and other topics surrounding the accumulating snapshot are discussed in depth in Star Schema: The Complete Reference, in Chapter 11: "Transactions, Snapshots and Accumulating Snapshots."

You can help support this blog by using the links in the sidebar to purchase a copy.

You can also read more in these posts:
  • When do you need an accumulating snapshot? (January 26, 2012) The recent post that triggered this question, in which a reader asked how to choose between building an accumulating snapshot and tracking something as a series of type 2 changes. In some cases, the answer is to do both.
  • Q&A: Accumulating snapshots (October 1, 2010) Explores the cardinality relationship between an accumulating snapshot and its dimension table

Image Credit:  Creativity103 via Creative Commons

Thursday, January 26, 2012

When do you need an accumulating snapshot?

A reader wonders how to decide between two options:  designing an accumulating snapshot vs. tracking status changes within a dimension.

I have a new project to track the status of order transitions. I'm unable to reach a conclusion as to whether to implement it as an accumulating snapshot or as a type 2 slowly changing dimension. The ETL process integrates a number of individual systems as the order transits each stage. What is the best way to design it?
Kumar
Milton Keynes, UK

Many businesses have one or more central dimensions that undergo a state transition as they touch multiple processes. Is it enough to track the changes to the dimension?  Or is an accumulating snapshot needed?

I'll walk you through some decision criteria.  But first a little refresher on the two design options the reader is considering.

Type 2 changes and timestamps


Type 2 changes track the history of something represented by a dimension.  Each time there is a change to this item, a new row is inserted into the dimension table.

This allows any row in a fact table to be associated with a historically accurate version of the dimension as of the relevant point in time.

In the reader's case, the status of the order might be an attribute of an Order dimension.  Modeled as a type 2 change, the dimension holds a status history for each order.  By adding effective and expiration dates to each row, you know exactly when each state transition occurred.

Accumulating snapshots

An accumulating snapshot is a type of fact table that records a single row for something the enterprise tracks closely, such as a trouble ticket or mortgage application--or, in the reader's case, an order.

This fact table contains multiple references to the date dimension -- one for each of the major milestones that the item in question can reach.  In the case of the order, this might be the date of order, the date of credit approval, the date of picking, the date of shipment and the date of delivery.

Unlike other kinds of fact tables, the accumulating snapshot is intended to be updated.  These dates are adjusted each time one of the milestones is reached.

There may also be facts that track the number of days (or minutes) spent between each milestone.  These "lags" are a convenience -- they can be computed from the dates.  (Building them into the fact table makes analysis much easier, but does require that the ETL process revisit rows on a regular basis, rather than when status changes.)

Avoiding correlated subqueries

If a type 2 slowly changing dimension with timestamps tracks the history, why consider an accumulating snapshot?

The analytic value of the accumulating snapshot is that it allows us to study the time spent at various stages. In the reader's case, it can make it simple to study the average time an order spends in the "picking" stage, for example.

We can do this with a type 2 slowly changing dimension as well, but it will be more difficult to study the average time between stages. For the order in question, computing the days spent in the picking stage requires knowing the date of credit approval and the date picked.  These will be in two different rows of the dimension.  Now imagine doing this for all orders placed in January 2012.  This will require a correlated subquery.

The accumulating snapshot pre-correlates these events and places them in a single row.  This makes the queries much easier to write, and they are likely to run faster as well.  The cost, of course, is the increased data integration burden of building the additional fact table.
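
To illustrate the contrast, here is a hedged sketch of both approaches; order_dim, order_facts, the status values, and the column names are assumptions, and date-difference syntax varies by DBMS.

    -- Type 2 dimension with timestamps: each "Picked" row must be correlated
    -- with the same order's "Credit Approved" row to find the two dates.
    SELECT AVG(pick.effective_date -
              (SELECT appr.effective_date
               FROM   order_dim appr
               WHERE  appr.order_id     = pick.order_id
               AND    appr.order_status = 'Credit Approved')) AS avg_days_in_picking
    FROM   order_dim pick
    WHERE  pick.order_status = 'Picked';

    -- Accumulating snapshot: the same question is a one-line aggregate.
    SELECT AVG(days_in_picking) AS avg_days_in_picking
    FROM   order_facts;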

Avoiding drilling across

When each of the discrete milestones is captured by a different fact table, lag may be computed without correlated subqueries.  In this case, it will involve drilling across.

For example, separate fact tables track orders, credit approvals, picking and shipping.  Each references the order dimension.  Days spent in the picking stage can be studied by drilling across credit approvals and picking, with results linked by the common order dimension. [1]
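
A hedged sketch of that drill-across pattern, with assumed table and column names: each star is summarized separately at the order grain, and the result sets are then joined on the conformed order identifier.

    -- Phase 1: one sub-select per fact table, each rolled up to the order grain.
    -- Phase 2: join the result sets and compute the lag (date arithmetic varies by DBMS).
    SELECT appr.order_id,
           pick.pick_date - appr.approval_date AS days_in_picking
    FROM   (SELECT o.order_id, MIN(d.full_date) AS approval_date
            FROM   credit_approval_facts f
            JOIN   order_dim o ON o.order_key = f.order_key
            JOIN   day d       ON d.day_key   = f.approval_date_key
            GROUP BY o.order_id) appr
    JOIN   (SELECT o.order_id, MIN(d.full_date) AS pick_date
            FROM   picking_facts f
            JOIN   order_dim o ON o.order_key = f.order_key
            JOIN   day d       ON d.day_key   = f.pick_date_key
            GROUP BY o.order_id) pick
      ON   pick.order_id = appr.order_id;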

Here, the pressure for an accumulating snapshot is reduced.  It may still be warranted, depending on your reporting tools, developer skills, and user base.

Summary and final advice

In the end, your decision should boil down to the following:
  1. An accumulating snapshot should only be considered if you are studying the time spent between major milestones
  2. If it helps avoid correlated subqueries, it may be a strong option
  3. If it avoids drill-across queries, it may be a useful option
Making the choice will impact several groups -- ETL developers, report developers, and potentially users.  Make sure this is a shared decision.

Also keep in mind the following:
  • If you build an accumulating snapshot, you will probably also want to track status in the dimension as a type 2 change.
  • Accumulating snapshots work best where the milestones are generally linear and predictable.  If they are not, the design and maintenance will be significantly more complex.
Last but not least:
  • The accumulating snapshot should be derived from one or more base fact tables that capture the individual activities.  
When in doubt, build the base transaction-grained fact tables first.  You can always add an accumulating snapshot later. 
Learn more

This is a popular topic for this blog.  Here are some places where you can read more:
  • Q&A: Accumulating snapshots (October 1, 2010) Explores the cardinality relationship between an accumulating snapshot and its dimension table
And of course, these concepts are covered extensively in my books.  In the latest one, Star Schema: The Complete Reference, the following chapters may be of interest:
  • Chapter 8, "More Slow Change Techniques" discusses time stamped tracking of slow changes
  • Chapter 11, "Transactions, Snapshots and Accumulating Snapshots" explores the accumulating snapshot in detail.
  • Chapter 14, "Derived Schemas", discusses derivation of an accumulating snapshot from transaction-grained stars.
  • Chapter 4, "A Fact Table for Each Process", includes a detailed discussion of drill-across analysis.

Image Credit:  Creativity103 via Creative Commons

[1] Note that this scenario applies to the reader, but does not always apply.  Trouble tickets, for example, may be tracked in a single fact table that receives a new row for each status change.  In this case, there is no drill-across option.

Monday, December 13, 2010

Deeper into the Accumulating Snapshot

The most popular posts on this blog discuss accumulating snapshot designs. In this post, I'll share some new insights on the technique, which come from my book Star Schema: The Complete Reference.

In particular, I want to debunk some common misconceptions about accumulating snapshots.

The Accumulating Snapshot

One of the three major kinds of fact table, the accumulating snapshot is the least understood and the most underutilized. 

I provided a basic description of the technique several years ago in the first post to this blog.  The basic concept is simple: in this kind of design, each row in the fact table tracks the changes to something over time.  The idea is to facilitate reports that look at how long things spend in various processing stages.

For example, the post mentioned above describes an accumulating snapshot that tracks the processing of mortgage applications.  Each row in the fact table tracks a single application, and is updated each time it completes a processing stage.  It contains foreign key references to multiple dates:
  • Submitted Date
  • Approval Date
  • Processed Date
  • Underwritten Date
  • Settlement Date
Corresponding to each of these milestones is a fact that indicates how long the application spent in that stage.  In the previous post, I called these facts "lags," but I'll suggest a different way of looking at them in a moment.

These facts make it easy to study the average time items spend in each stage. Without an accumulating snapshot, this might require correlated subqueries, or the comparison of detail rows from multiple fact tables.

Unique Characteristics

Unlike other kinds of fact table, the intent is to update rows in an accumulating snapshot.  When an application reaches a new status, for example, the appropriate date is added.  I've touched on this topic in a prior Q and A post.

Another characteristic that differs from other kinds of fact tables is the size of the accumulating snapshot.  It usually has fewer rows than the dimension table representing the item being processed.  That's because the item itself is likely to undergo type 2 slow changes.  I've noted this in a previous post as well.

Lastly, the accumulating snapshot is essentially current valued.  It is used to study the net effects of a process up to and including the current point in time. 

Non Linear Processes

In the past, I've written that the accumulating snapshot is only effective for processes that follow a consistent, predictable series of steps.  (See my previous books in the sidebar to this blog.)

But the technique can easily be adapted to accommodate processes that are not linear.

For example, suppose our mortgage application may bounce back to the processing stage if the underwriter discovers some missing documentation. Conventional wisdom holds that this will "break" the accumulating snapshot, since it violates the standard sequence of status milestones.

This non-linear behavior is easily accounted for. First, we will rename our "lag" facts to represent the number of days spent at each stage:
  • Days awaiting approval
  • Days processing
  • Days underwriting
  • Days awaiting settlement
The ETL process will be developed to check each open application on a daily basis, and increment the appropriate fact for its processing stage.  

This means it does not matter if an application reaches the underwriting stage and then gets sent back to the processing stage.  If this occurs, the ETL process will simply resume incrementing the "days processing" fact.
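
A minimal sketch of that nightly increment, assuming the fact table carries a current-stage status column; the table, column, and stage names are illustrative.

    -- Nightly: add one day to whichever fact corresponds to each open
    -- application's current stage.  One statement per stage/fact pairing.
    UPDATE mortgage_application_facts
    SET    days_processing = days_processing + 1
    WHERE  current_stage = 'Processing';

    UPDATE mortgage_application_facts
    SET    days_underwriting = days_underwriting + 1
    WHERE  current_stage = 'Underwriting';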

One thing that will need to be worked out is what dates to use for each milestone. For the application in question, once the processor finishes with it for the second time, what is the processed date?  The first time it achieved this status, or the second?  Business rules must drive these decisions.

Long Duration Processes

We are also accustomed to thinking of accumulating snapshots as appropriate only for short-duration processes.

However, my customers and students have found success in modeling activities of significant duration. Examples include the tenure of faculty members, the life cycle of a service contract, the duration of a mortgage or insurance policy, and the incarceration of a prisoner.

Some of these processes do not have fixed durations, and may prove to span years or even decades. As long as it is feasible for the ETL process to continually update records for active entities in the process, the approach is effective.

Going Even Deeper

Several other topics are worth considering:
  • You may have an operational system that logs very detailed status changes. It is possible to map a set of hundreds of statuses into a few key milestones for an accumulating snapshot. 
  • If the item or entity being processed undergoes slow changes, update its surrogate key in the corresponding fact table row.  This ensures that the fact table always points to the most recent version of the item.
  • Some milestones make useful dimension attributes as well. Incorporate the dates into the dimension table as appropriate.  For example, "date of approval" may be a useful characteristic in the mortgage application dimension.
  • For milestones that have not been achieved, don't include a NULL-valued foreign key in the fact table.  Instead, establish a special row in your day dimension for events that have yet to occur.  Set its attributes to "N/A" and give it the highest possible date supported by your DBMS.  This will simplify joins and query predicates (see the sketch after this list).
  • If at all possible, source the accumulating snapshot from other fact tables that log status changes as transactions.  This will make the ETL process much simpler. 
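
Here is the sketch promised above for the special day dimension row; the column names are assumptions, and the far-future date should be whatever your DBMS supports.

    -- One-time setup: a Day row representing "has not happened yet."
    -- Pointing unreached milestones at a far-future date keeps range predicates
    -- simple; for example, "closing date on or before today" excludes open items.
    INSERT INTO day (day_key, full_date, month_name, calendar_year, day_description)
    VALUES (0, DATE '9999-12-31', 'N/A', 9999, 'Not yet occurred');
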
These topics are all discussed in the book.

You will find this same kind of deep dive in every topic it covers.  You can help support this blog by using the links in the sidebar to order a copy.

Thanks for your support, and feel free to send in your questions!

Image Credit:  Creativity103 via Creative Commons

Friday, October 1, 2010

Q&A: Accumulating Snapshots

A reader asks about 1:1 relationships in a star schema.
I am designing a superannuation data warehouse that will have a claims accumulating snapshot fact table. This table will contain a number of dates that will be filled in as a claim is processed. From this we will be able to calculate time spans between the various claims processing stages.

It is intended that the fact table will contain a claim_key that will reference a claims dimension (a 1:1 relationship).  I have been told by a few people that a 1:1 relationship is not a recommended design model.
- A reader in Australia

A:  There is absolutely nothing wrong with having a 1:1 relationship.  (That's true for both ER models and dimensional models.) 

Accumulating snapshots often cause the kind of disagreement you mentioned.  They are less common.  People mistakenly generalize characteristics of more common designs, assuming they apply to all stars. 

This is natural, but it leads to three incorrect assumptions:
  1. The fact table is on the "many" end of a 1:Many relationship with each dimension
  2. The fact table will have the most rows
  3. Fact table rows are not updated once they are loaded
These assumptions are all false when it comes to an accumulating snapshot.  (In fact, even a transaction fact table may violate the first two assumptions.) 

You've asked about the first one, but let me debunk all three:

  1. By definition the accumulating snapshot will have a 1:1 relationship with one of the dimension tables.  Why? Because its purpose is to track the progression of some discrete item through a business process.  Its grain is almost always stated in terms of a dimension -- e.g. "One Row per Claim."

  2. If the dimension can undergo type 2 slow changes, the accumulating snapshot will likely have FEWER ROWS than the dimension.  This is desirable, and I have written about it before.

  3. The purpose of the accumulating snapshot is to study lags between key events, while avoiding correlated subqueries.  Unlike other fact tables, this requires the row to be updated as more information becomes known about the item being tracked.

I've written about these topics a few times in the past, including the post mentioned above. I also explore these topics in much greater detail in Chapter 11 of Star Schema: The Complete Reference.

In the book, I also show that an accumulating snapshot can be used even when there is not a fixed, linear series of business steps being tracked.

Anyone who is not familiar with the accumulating snapshot is encouraged to learn more.  It is an essential design technique for many common business requirements.

Thanks for the question, and keep them coming...
-Chris


Image Credit:  Creativity103 via Creative Commons

Thursday, January 14, 2010

Accumulating Snapshots: Are They Necessary? [Q&A]

A reader asks the following question:

Q: We have an accumulating snapshot that tracks trouble tickets from submission to closure. The fact table has one row for each trouble ticket. Why do we need it? Can't we just place all this information in the trouble ticket dimension?
A: You probably do need the fact table, but it might be worth putting some of the information in the dimension table as well.

Before I explain, here is a quick refresher on the accumulating snapshot.

Accumulating Snapshot

An accumulating snapshot is a kind of fact table used to correlate activities surrounding a specified item. In this case, that item is a trouble ticket.


Accumulating snapshots are useful because they simplify the queries necessary to study elapsed time between events. In the reader's case, that might be the time between submission and closure of a ticket (there are probably some interim steps as well).

On the operational side, these events are probably recorded in different rows, and maybe even in different systems. Figuring out the days between these events can be messy, especially when averaging results for a large number of tickets. Correlated subqueries will be required, and performance may be poor.

The accumulating snapshot does the correlation of these activities ahead of time, as part of the ETL process. In this case, it has one row for each ticket. Presumably, there are foreign keys that identify the ticket, the date submitted, and the date closed. An optional fact can record the elapsed time between these dates. This fact is very easy to aggregate and group by any dimension of interest. No need for correlated subqueries.

You can read more about accumulating snapshots in a prior post.

The Fact Table is Necessary

The reader has noted that the fact table has one row per trouble ticket. (That's a good statement of grain, by the way.) Since there is only one row for each ticket, it might seem that the content of the fact table could simply be placed in the ticket dimension.

This is probably not a good idea. Though your fact table has one row per ticket, your ticket dimension may have more than one, due to type 2 changes that track history.  For a particular trouble ticket, when a type 2 change occurs, a new row is added for the new version of the ticket. A given ticket may therefore have more than one row in the dimension.

If you were to place a fact in this dimension table, like the days between submission and closure, it would be repeated for each version of the same ticket. Aggregating it would result in double counting.

The fact is better placed in the fact table, where it can safely be summed, averaged, or otherwise aggregated as necessary.

(By the way, the same would occur if you placed the two dates in the dimension table. They would repeat for each version of the ticket. If you try to aggregate the elapsed time between the dates, the same ticket would be counted multiple times -- once for each version.)

Place Information in the Dimension Also (Not Instead)


Information tracked by the accumulating snapshot may also be useful in the dimension table. But here, its purpose is to serve as dimensional data, not as a replacement for the fact table.

For example, the dates the ticket was opened and closed could make helpful type 1 attributes in the ticket dimension.  This is especially useful if the dimension is connected to other fact tables, such as one that tracks phone calls to the support center. The dates can be used to group call counts in interesting ways. For example, "How many support calls were generated by tickets opened on January 1 vs. January 2."

If you want to, you can also place the elapsed days fact in the dimension table. Here, it should be used strictly as a behavioral dimension -- use it to group results, never as a fact to aggregate, which would cause double counting. You might even want to convert it into bands to avoid confusion (1-5 days, 6-10 days, over 10 days).
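
A minimal sketch of that banding, assuming the elapsed days value has been placed in the ticket dimension as described; the table and column names are illustrative.

    -- Populate a banded behavioral attribute from the elapsed-days value when
    -- loading the dimension; group by it in queries, never sum it.
    UPDATE trouble_ticket
    SET    resolution_band = CASE
             WHEN days_to_close IS NULL          THEN 'Not yet closed'
             WHEN days_to_close BETWEEN 1 AND 5  THEN '1-5 days'
             WHEN days_to_close BETWEEN 6 AND 10 THEN '6-10 days'
             ELSE 'Over 10 days'
           END;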

- Chris
Image Credit:  Creativity103 via Creative Commons

Do you have a question of your own? Send it in. I'd be happy to answer.

Tuesday, March 6, 2007

The Accumulating Snapshot

Although my latest book is about aggregate tables, it covers design basics as well as advanced topics. This excerpt describes how to use an accumulating snapshot design to analyze business processes involving multiple steps.

Excerpted from Mastering Data Warehouse Aggregates, by Chris Adamson.
Used by permission.

Many business processes can be understood in terms of a series of stages or steps that something must undergo. In made-to-order manufacturing, an individual item is ordered, manufactured, quality assured, packaged, and shipped. In banking, a mortgage application is submitted, reviewed by a loan officer, evaluated by an underwriter, and eventually consummated at a closing.

When the individual items that pass through the process are readily identifiable (an individual item being manufactured; a specific mortgage application), and the processing steps or milestones are predictable (the manufacturing process stages, the mortgage approval phases), their status can be tracked by an accumulating snapshot model. Such a model provides a way to understand the status of an individual item at the detail level, the workload at each processing step, and the efficiency with which items move from one stage to the next.

Consider the process by which a mortgage application moves from initial submission to closing. The process begins when an applicant completes a mortgage application with a mortgage officer. A processor then takes over, ensuring that the documentation required to evaluate the loan application is gathered. Next, the application is passed to an underwriter, who reviews it along with the supporting documentation, evaluates potential risk, and approves or rejects the application. The amount approved for underwriting may be less than the amount of the original application. Finally, the mortgage becomes legally binding at the time of closing.

Figure 1, below, shows an accumulating snapshot for the mortgage application process. The grain of this fact table is an application. Each application will be represented by a single row in the fact table. The major milestones are represented by multiple foreign key references to the Day dimension—the date of submission, the date approved by mortgage officer, the date all supporting documentation was complete, the date approved by an underwriter, and the date of closing.



Figure 1: An accumulating snapshot tracking mortgage applications.

Note that it is not necessary to build five Day dimension tables. A single Day table can serve in all these roles; it will simply be aliased at query time or through a series of views. Similarly, the three employees involved in the process are represented by three foreign key references to the Employee dimension table—one for the mortgage officer, one for the processor that assembles the supporting materials, and one for the underwriter.
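
A hedged sketch of that aliasing; application_submission_date_key and underwritten_amount appear in the excerpt, while the remaining names are assumptions.

    -- One physical Day table plays multiple roles through aliases at query time.
    SELECT d_submit.full_date AS submitted_on,
           d_under.full_date  AS underwritten_on,
           f.underwritten_amount
    FROM   mortgage_application_facts f
    JOIN   day d_submit ON d_submit.day_key = f.application_submission_date_key
    JOIN   day d_under  ON d_under.day_key  = f.underwriting_approval_date_key
    WHERE  d_submit.calendar_year = 2006;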

The facts include several dollar amounts. The amount of the original application is the application_amount. The amounts approved and rejected by the mortgage officer and underwriter are recorded as officer_approved_amount and underwritten_amount. The amount actually loaned at closing is the closing_amount.

When the application is first submitted, the only date that is known is the application submission date. The appropriate day_key value is assigned to the application_submission_date_key. In order to accommodate the dates that are as yet unknown, the Day dimension must contain a row for "unknown" or "n/a." The key value for this row will be assigned to the remaining dates. The application_amount is set to the requested mortgage amount from the application. The remaining dollar amounts are initialized to zero.
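
A minimal sketch of that initial load, assuming day_key 0 is the "n/a" row; only application_submission_date_key and the dollar amounts are named in the excerpt, so the remaining column names are assumptions.

    -- On submission: the submission date is known, the other milestone keys point
    -- to the "n/a" row, the requested amount is recorded, and other facts are zero.
    INSERT INTO mortgage_application_facts (
           application_key,
           application_submission_date_key,
           officer_approval_date_key,
           documentation_complete_date_key,
           underwriting_approval_date_key,
           closing_date_key,
           application_amount,
           officer_approved_amount,
           underwritten_amount,
           closing_amount)
    VALUES (:application_key, :submission_day_key, 0, 0, 0, 0,
            :requested_amount, 0, 0, 0);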

As the application works its way through the process, the fact table row will be updated. The various date keys are revised as each milestone is reached. The additional facts are filled in as the appropriate milestones are reached.

The fact table also contains a series of lag columns. These represent the number of days the application spends at each stage. Each is a simple calculation based on milestone dates, populated once it completes the appropriate stage. They allow measurement of processing time.

The fact table also contains a series of quantities, each corresponding to one of the milestones. The value of each count is zero or one, depending on whether the application has reached the corresponding milestone. These facilitate workload reporting, and will take on other values when aggregated. To make it easier to identify the current status of an application, a status attribute may be added to the application dimension, or included as a degenerate dimension in the fact table.
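
A hedged sketch of that workload reporting, using assumed names for one of the milestone quantities and its corresponding date key.

    -- How many applications were underwritten each month?  The zero-or-one
    -- quantity becomes a count of applications when summed.
    SELECT d.calendar_year, d.month_name,
           SUM(f.underwritten_quantity) AS applications_underwritten
    FROM   mortgage_application_facts f
    JOIN   day d ON d.day_key = f.underwriting_approval_date_key
    GROUP BY d.calendar_year, d.month_name;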

Using this schema, it is possible to study the lifecycle of a particular application, the workload at each processing step, and the efficiency of the overall process itself.


Excerpted from Mastering Data Warehouse Aggregates, by Chris Adamson.
Copyright (c) 2006 by Wiley Publishing, Inc
Used by permission.