Building a financial data mart? Don't let the multiple dates associated with accounting transactions confuse you.
A standard feature of financial systems often befuddles star schema designers. When someone inputs a transaction into a financial system, the transaction has at least two associated dates. One represents the date that the event happened, and the second represents the date the transaction was entered into the system. The first is usually called the transaction date, while the second is called the date posted. Sometimes, there are more dates as well.
For example, if a check is written, the record in the accounting system will have two dates. The transaction date will be the date on the check. The date posted will be the day that someone in the accounting department actually entered the transaction into the system. (Some financial systems may have other dates as well, such as an audit date or an edited date that applies to comments.)
Don’t let these dates confuse you. If you are designing a financial data mart, you do not need to choose between them; both dates will be useful, and both should appear in your model. A fact table that logs general ledger activity, for example, will have one row per transaction. That row can contain two references to a day dimension; one represents the transaction date and one represents the posting date. Name the keys appropriately. For example, you can call them day_key_transaction, day_key_posted.
The transaction date will be the one that is used to drive most reporting. This is the date that properly associates the transaction with an accounting period. You would use it to summarize transactions by day, month or fiscal period, for example.
The date posted will be of lesser use, but may also occasionally serve analytic purposes--especially when compared to the transaction date. For example, auditors may want to look for transactions posted one or more months after they actually took place. Managers might want to improve their efficiency by studying lags between transaction date and date posted.
If you are also going to build a monthly snapshot by account, then the star for account transactions may serve as its source. In that case, the date posted will help the ETL developers maintaining the snapshot. During each processing window, they can simply look for any transactions that were posted prior to the last time the snapshot was updated, and process only those rows.
- Chris
Monday, February 16, 2009
Q&A: Chosing the right time dimension for a Snapshot
Posted by
Chris Adamson
In this Q&A, a reader asks how to designate a time dimension for a fact table that is a snapshot.
First, I want to commend you for stating the grain of your fact table, and stating it clearly. Grain confusion can cause many problems, and makes a question like yours even more difficult to answer. Snapshots always have a time component to their grain; in your case it is monthly. Your fact table will have one row per account per month.
Now to your question: I strongly prefer the use of a month dimension for a monthly snapshot. This approach is non-ambiguous. It is self-evident what each row represents; the chance for any confusion is minimal.
When a monthly snapshot is designed with a day dimension, the designers are generally clear on what each row represents. However, report developers or other users may misinterpret the data now or in the future.
Designers who use a day dimension with a monthly snapshot generally choose the last day of the period represented, as you suggested. The problem with this is that there are many elements of a typical day dimension that simply do not apply to a monthly summary. Tor example, a snapshot representing March 2009 would have the day 3/31/09 associated with it. The day dimension has a lot of information about March 31, 2009 that does not apply to the snapshot, such as what day of the week it was, whether it was a holiday, and so forth. It would be all too easy to misinterpret this information as being applicable to the facts.
Even if everyone is clear on this at the start, a year or two down the road there are likely to be new members of your development team who may misinterpret the data. The presence of a date can also lead to confusion about what a row represents. If your monthly summary
contains an average daily balance, for example, a user may confuse this for a balance as of the day in question.
Confusion will also arise over what date to use for the current period. During March 2009, you may be updating the snapshot rows for each account on a daily basis. Using a day dimension, ETL developers may assign the particular date in March, which now serves double duty -- designating the snapshot period and the date loaded.
Sometimes, there is a strong desire among technical team members to use a day dimension with a monthly snapshot. If you cannot convince them otherwise, then you can avoid some confusion by creating a view for use with the monthly snapshot that hides the day-level attributes. You may be able to do the same kind of “hiding” of non-applicable dimension attributes within your BI tool. Use the month-end date for each fact table row. If the ETL team wants to track the day each row was loaded, they can do this with a second relationship to the day dimension. (This trick works nicely even if you do go with a month dimension.)
Regardless of your approach, I also advise you to name the facts carefully. If you want to include a balance as of the end of the period, name it period_end_balance. An average daily balance for the period in question should be called average_daily_balance. A column name like balance is simply too vague, and bound to be misused.
Thanks for your question!
- Chris
If you have a question, click my profile picture at the top of the page for my address. I try to respond to all messages, and may even post an answer here.
Q: We are designing a new star. The fact table is a monthly snapshot. It will capture several metrics each month for each open account. My question is how to handle the time dimension. We cannot agree whether it makes more sense to have a month dimension, or to use a month end date from a day dimension. What makes the most sense?
- Name withheld by request
First, I want to commend you for stating the grain of your fact table, and stating it clearly. Grain confusion can cause many problems, and makes a question like yours even more difficult to answer. Snapshots always have a time component to their grain; in your case it is monthly. Your fact table will have one row per account per month.
Now to your question: I strongly prefer the use of a month dimension for a monthly snapshot. This approach is non-ambiguous. It is self-evident what each row represents; the chance for any confusion is minimal.
When a monthly snapshot is designed with a day dimension, the designers are generally clear on what each row represents. However, report developers or other users may misinterpret the data now or in the future.
Designers who use a day dimension with a monthly snapshot generally choose the last day of the period represented, as you suggested. The problem with this is that there are many elements of a typical day dimension that simply do not apply to a monthly summary. Tor example, a snapshot representing March 2009 would have the day 3/31/09 associated with it. The day dimension has a lot of information about March 31, 2009 that does not apply to the snapshot, such as what day of the week it was, whether it was a holiday, and so forth. It would be all too easy to misinterpret this information as being applicable to the facts.
Even if everyone is clear on this at the start, a year or two down the road there are likely to be new members of your development team who may misinterpret the data. The presence of a date can also lead to confusion about what a row represents. If your monthly summary
contains an average daily balance, for example, a user may confuse this for a balance as of the day in question.
Confusion will also arise over what date to use for the current period. During March 2009, you may be updating the snapshot rows for each account on a daily basis. Using a day dimension, ETL developers may assign the particular date in March, which now serves double duty -- designating the snapshot period and the date loaded.
Sometimes, there is a strong desire among technical team members to use a day dimension with a monthly snapshot. If you cannot convince them otherwise, then you can avoid some confusion by creating a view for use with the monthly snapshot that hides the day-level attributes. You may be able to do the same kind of “hiding” of non-applicable dimension attributes within your BI tool. Use the month-end date for each fact table row. If the ETL team wants to track the day each row was loaded, they can do this with a second relationship to the day dimension. (This trick works nicely even if you do go with a month dimension.)
Regardless of your approach, I also advise you to name the facts carefully. If you want to include a balance as of the end of the period, name it period_end_balance. An average daily balance for the period in question should be called average_daily_balance. A column name like balance is simply too vague, and bound to be misused.
Thanks for your question!
- Chris
If you have a question, click my profile picture at the top of the page for my address. I try to respond to all messages, and may even post an answer here.
Subscribe to:
Posts (Atom)