In a recent interview, the folks at WhereScape asked me some questions about data modeling challenges.
In Business Intelligence, modeling is a social activity. You cannot design a good model alone. You have to go out and talk to people.
As a modeler, your job is to facilitate consensus among all interested parties. Your models need to reflect business needs first and foremost. They must also balance a variety of other concerns — including program objectives, the behavior of your reporting and visualization tools, your data integration tools, and your DBMS.
It’s also important to understand what information resources are available. You need to verify that it is possible to fill the model with actual enterprise data. This means you need to profile and understand potential data sources. If you don’t consider sources of data, your designs are nothing more than wishful thinking.
When considering non-relational data sources, resist the urge to impose structure before you explore them. You’ve got to understand the data before you spend time building a model around it.
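Profiling can start small. The sketch below is a minimal illustration in Python (the field names and in-line records are hypothetical stand-ins for a real source extract): it computes three things worth knowing about every field before you model it — how often it appears, how often it is null, and how many distinct values it takes.

```python
from collections import Counter, defaultdict

def profile(records):
    """Per-field coverage, null rate, and cardinality for a list of dicts."""
    present = Counter()          # records in which the field appears
    nulls = Counter()            # appearances that are null/empty
    distinct = defaultdict(set)  # distinct values seen (capped)
    for rec in records:
        for field, value in rec.items():
            present[field] += 1
            if value in (None, ""):
                nulls[field] += 1
            elif len(distinct[field]) < 1000:   # cap memory on wide sources
                distinct[field].add(str(value))
    total = len(records)
    return {
        field: {
            "coverage": present[field] / total,
            "null_rate": nulls[field] / present[field],
            "cardinality": len(distinct[field]),
        }
        for field in present
    }

# Hypothetical extract: ragged records, as non-relational sources often are
records = [
    {"customer_id": "C1", "region": "East", "ltv": 1200},
    {"customer_id": "C2", "region": None},
    {"customer_id": "C3", "region": "West", "ltv": 310},
]
report = profile(records)
print(report["region"])
```

Numbers like these quickly show whether a field could serve as a business key (full coverage, full cardinality) or a dimension attribute (low cardinality, low null rate) — or whether the model you had in mind is wishful thinking.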
Check out the video above, where I discuss these and other topics. For a full-sized version, visit the WhereScape page.
Sunday, April 24, 2016
Friday, March 20, 2015
BI and the Path to Business Value
Posted by Chris Adamson
Managing BI services requires a consistent information architecture, even if different teams are responsible for data marts, performance management, and analytics.
Business Intelligence is the use of information to improve business performance.[1] To improve business performance, we must do three things:

- Track business performance
- Analyze business performance
- Impact business performance

Each step on the path to business value is supported by two kinds of BI services, as shown in the illustration (Business Value From BI).

- Tracking performance requires understanding what is currently happening (Performance Management) and what has happened in the past (OLAP).
- Analyzing performance requires the ability to get to detail (OLAP) and to develop insight into cause and effect (Business Analytics).
- Impacting performance requires targeting a business metric (Performance Management) and taking a prescribed course of action (Business Analytics).

Each of these steps leverages a pair of BI services, and each service shares a common interest in business information.[2] Managing BI services therefore requires a consistent information architecture. This is true even when separate teams manage each area.

Tracking Performance

Understanding performance often starts with summarized data on dashboards and scorecards (Performance Management). The need to investigate potential problems requires detailed data and history (OLAP and Reporting).

As Wayne Eckerson demonstrated in Performance Dashboards, both these areas provide stronger business value when they are integrated. For example, a dashboard is more useful when someone can click a metric and bring up supporting detail in an OLAP cube.

To successfully link Performance Management and OLAP, the two domains must share common definitions for business metrics (facts) and associated reference data (dimensions). Metrics must be calculated in the same way, linked to reference data at different levels of detail, and synchronized (if managed separately).

Analyzing Performance

Analyzing performance is the process of breaking down what has occurred in an attempt to understand it better. Slicing and dicing an OLAP cube is a form of analysis, providing insight through detail. Analytic models provide a deeper level of analysis, providing insight into cause and effect, and extending this to the future through prediction.

OLAP is largely focused on exploring various aggregations of business metrics, while analytics is largely focused on the underlying detail that surrounds them. Our OLAP solutions provide historic detail to Business Analytics in the form of data from the data warehouse.[3]

The exchange flows in the opposite direction as well. Business analytics develop insights that suggest other things that should be tracked by OLAP services. For example, a particular set of behaviors may define a high-value customer. This assessment is developed using Business Analytics, and applied to the customers in the OLAP data mart. For a fun example from the world of sports, check out the book Moneyball by Michael Lewis.[4]

Improving Performance

All of this is somewhat academic if people in the business do not use this information to make decisions. Business impact occurs at the intersection of Performance Management (which tells us what is important and how we are doing) and Analytics (which suggests the best course of action).

Every analytic model targets a business metric or key performance indicator (KPI) from the world of performance management. That same KPI, in turn, can be used to measure return on investment of the analytic model.

For example, a direct sales manager of enterprise software wants to reduce cost of sales. An analytic model is developed that assesses the likelihood of a prospect to buy enterprise software.

The manager begins using the prospect assessment model to prioritize the work of the sales team. Less likely prospects are reassigned to a telesales force. Over the next two quarters, cost of sales starts falling. The same KPI that the analytic model targeted is used to measure its return on investment.

Information as an Asset

It is common to manage each of the pillars of Modern BI as a separate program. The path to business value, however, requires that these programs share a consistent view of business information. BI programs that are not centralized must carefully coordinate around a common information architecture.
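To make the Performance Management–OLAP linkage concrete, here is a minimal sketch using Python's built-in sqlite3 (table and column names are invented for illustration). One fact table and one conformed product dimension serve both the dashboard-level KPI and its OLAP drill-down, so the two levels reconcile by construction.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Minimal star schema: one fact, one conformed dimension (illustrative names)
cur.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT, product TEXT);
CREATE TABLE fact_sales (product_key INTEGER, sale_date TEXT, revenue REAL);
INSERT INTO dim_product VALUES (1,'Grocery','Apples'),(2,'Grocery','Bread'),(3,'Hardware','Nails');
INSERT INTO fact_sales VALUES (1,'2015-03-01',100.0),(2,'2015-03-01',50.0),(3,'2015-03-02',25.0);
""")

# Dashboard-level KPI: total revenue by category
summary = cur.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales f JOIN dim_product p USING (product_key)
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# OLAP drill-down: the same metric, same definition, finer grain
detail = cur.execute("""
    SELECT p.category, p.product, SUM(f.revenue)
    FROM fact_sales f JOIN dim_product p USING (product_key)
    GROUP BY p.category, p.product
""").fetchall()

# Because both levels share one fact and one conformed dimension,
# the drill-down always adds up to the dashboard number.
assert sum(r[2] for r in detail) == sum(r[1] for r in summary)
print(summary)  # [('Grocery', 150.0), ('Hardware', 25.0)]
```

When the two service areas instead compute the metric from separately managed data, that reconciliation has to be engineered and verified — which is exactly why shared definitions and synchronization matter.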
Further Reading
- For more on this definition of business intelligence, see Business Intelligence in the Modern Era (9/8/2014).
- The three service areas are explained in The Three Pillars of Modern BI (2/9/2015).
- Sometimes analytic modelers bypass the data warehouse, but there are steps you can take to make this important repository more useful to them; see Optimizing Warehouse Data for Business Analytics (9/25/13). Note that even with a well designed data warehouse, analytic models often augment this enterprise data with additional data sources.
- The Oakland A's used analytics to re-evaluate the basic metrics used to assess the value of a baseball player. See Business Analytics and Dimensional Data (7/17/13).
Interested in learning more about modern BI programs?
Check out my new course, Business Information and Modern BI: Evolving Beyond the Dimensional Data Mart. Offered at TDWI conferences and onsite. See the sidebar for upcoming dates.
Monday, February 9, 2015
The Three Pillars of Modern BI
Posted by Chris Adamson
Data marts are no longer sufficient to meet the demands of a modern BI program. This post lays out a framework for delivering BI value in the modern era.
The technologies and processes that help us deliver BI services have advanced by leaps and bounds over the last two decades. A modern BI program provides three perspectives on business performance, roughly corresponding to the past, present, and future.

OLAP and Reporting

OLAP and reporting services (or simply "OLAP") provide the "official record" of what has happened in the past--the canonical log of business activity.

This pillar of the modern BI program helps the business understand "where we've been." The typical information products provided in this service area include:

- Reports provide pre-built, parameterized access to business information.
- Analysis provides the ability to explore the official record of business activity by slicing, dicing, drilling, and so forth (OLAP).
- Ad hoc query capabilities allow people to ask their own questions about the official record, even if a pre-defined report or analysis does not exist.

For people in the business, these kinds of information products come to define this pillar of the BI program. There is also a fourth important information product of which the business may have less direct awareness:

- The integrated record of business activities, aka "Data Marts." This record combines, standardizes, and organizes information for business consumption.

Essential in delivering the first three kinds of information products, this component was the primary focus in the early years of BI, when we called the practice "data warehousing." Since then, the discipline has changed and expanded. But it is still essential that the BI program provide the ability to understand the past.

Performance Management

Performance management services provide real-time status on key performance indicators, as well as performance versus goals.

KPIs and goals are carefully matched to the viewer's role and linked to business objectives. Goals communicate expectations, while KPIs communicate achievement of expectations.

If OLAP is about "where we have been," then performance management is about "where we are now." Typical information products in this BI service area include:

- Dashboards provide real-time or near-real-time status of KPIs.
- Scorecards communicate progress versus goals.

Information on dashboards and scorecards is carefully tailored for the user or functional area. Metrics are chosen for relevance and actionability, linked to business strategy, and balanced to reflect a holistic picture of performance.

While this service area can stand on its own, performance management solutions are more powerful when people can dig into the KPIs on their dashboards. This capability is enabled by integrating performance management services with OLAP services.

Business Analytics

Analytic services probe deeply into data, providing insight into cause and effect, making predictions about what will happen in the future, and prescribing a course of action.

While analytic services draw on data from the past, their objective is to influence the future. Typical information products in this service area include:

- Analytic models that make sense of activities or predict future events
- Simulations that allow the manipulation of variables to study their potential impact on results
- Visualizations that communicate analytic insights
- Analytic metrics that assess current state and/or future outcomes, which are fed to OLAP, performance management, and OLTP applications

Like the other pillars of modern BI, analytic services can exist alone but are more powerful in the presence of the other pillars. Prescriptive metrics, for example, are best presented directly on operational dashboards; useful analytic metrics can be recorded and tracked over time in data marts.

Delivering Modern BI

In each area of the business, these capabilities should be balanced and tied together. Centralized management of all three pillars is not required, but they should be coordinated and integrated. A shared roadmap should lay out their planned evolution. Your objective is business impact, and my next post shows how these services deliver it.

For more on managing the Modern BI program, check out Chris's latest course: Business Information and Modern BI. Check the sidebar for upcoming dates.
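Recording an analytic metric in a data mart can be sketched briefly. In the toy example below (Python with sqlite3; the schema and the scoring rule are invented for illustration), a stand-in "analytic model" scores customers, and the scores land in a snapshot table where they can be trended like any other fact.

```python
import sqlite3
from datetime import date

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_orders (customer_key INTEGER, amount REAL);
-- periodic snapshot that records the analytic metric over time
CREATE TABLE fact_customer_score (customer_key INTEGER, score_date TEXT, value_score REAL);
INSERT INTO dim_customer VALUES (1,'Acme'),(2,'Globex');
INSERT INTO fact_orders VALUES (1,900.0),(1,600.0),(2,200.0);
""")

# Stand-in "analytic model": flag high-value customers by total spend.
# A real model (propensity, churn, customer value) would plug in here.
def value_score(total_spend):
    return 1.0 if total_spend >= 1000 else total_spend / 1000

today = date(2015, 2, 9).isoformat()
spend = cur.execute(
    "SELECT customer_key, SUM(amount) FROM fact_orders GROUP BY customer_key"
).fetchall()
for customer_key, total in spend:
    cur.execute("INSERT INTO fact_customer_score VALUES (?,?,?)",
                (customer_key, today, value_score(total)))

# The score is now ordinary dimensional data: it can appear on a
# dashboard today and be trended in the mart over time.
rows = cur.execute(
    "SELECT customer_key, value_score FROM fact_customer_score ORDER BY customer_key"
).fetchall()
print(rows)  # [(1, 1.0), (2, 0.2)]
```

The design point is the snapshot grain: by stamping each score with a date rather than overwriting a customer attribute, the mart preserves the metric's history for later analysis.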
Learn More
- BI and the Path to Business Value (3/20/2015) explores how the three pillars of modern BI enable business impact.
- Business Intelligence in the Modern Era (9/8/2014) provides a definition of Business Intelligence for modern information assets.
Thursday, September 18, 2014
Business Intelligence in the Modern Era
Posted by Chris Adamson
This post offers an updated definition for BI, and suggests that you don't have to think about it as a box on an org chart.
BI has changed a lot in the last two decades. Technologies and best practices have evolved, and we've found more ways in which a BI program can deliver value. Some of these innovations have occurred outside of IT or the BI Competency Centers that many businesses have established. At the same time, many organizations are moving to make business units autonomous.
These changes lead many people to ask: what exactly is BI? Is it a box on the org chart? Does it include analytics that were never done by IT? How do data governance and master data management fit in?
Business Intelligence Defined
I define BI as follows:
Business Intelligence: The use of information to improve business performance. (Chris Adamson)
The first thing to note about this definition is that it does not address any specific technologies or methods. These aspects change over time, and they certainly influence what we may be able to achieve. But the objective is always to provide business value.
Secondly, note that this definition is not beholden to the boundaries of a departmental structure. Regardless of who develops, supports or uses solutions, it's all considered BI.
Let's take a quick look at both these aspects.
BI Services and Activities
The reason we commit resources to BI programs is simple: we intend to use information to deliver some kind of business value. The definition has been crafted to cover any activities that support this objective. It can be used to describe a variety of activities that provide business value, both old and new.
Among the older activities it covers:
- Traditional reporting, OLAP and ad hoc functions
- Dashboards and scorecards
- Traditional data warehouses and/or data marts
- Data integration services
At the same time, some newer uses of information are covered:
- Business analytics and predictive analytics
- Master data management
- Data governance
- Virtualization and federation services
- Transaction processing
BI and the Org Chart
While you may have a group responsible for BI program management, it is important to understand that the scope of BI reaches well beyond this group. The delivery of business benefit from information impacts the entire organization.
Some of the functional areas that participate in BI are:
- Business units: All of the value from BI happens within business areas that use information. This is where decisions are made and impacts are realized. For many businesses, responsibility for development of BI solutions also lies in business areas. This is particularly the case for analytics, but also increasingly for the traditional forms of BI.
- BI Competency Centers: Whether part of IT or external to it, many organizations have established a centralized resource for planning and overseeing the development of traditional forms of BI, such as data marts, dashboards or scorecards. In some cases, these centers have become focused on providing advisory services to business units that create and manage their own solutions.
- Analytic Competency Centers: Business analytics often begins within business areas such as marketing or risk management. Analytic competency centers are developed to help other areas of the business leverage information in a similar manner. Whether part of the BI competency center or distinct from it, this is also a core BI function.
- IT: At a minimum, IT has some responsibility for the technical infrastructure on top of which information systems are built -- networks, computers and the services that keep them up and running. IT may also have responsibility for some of the business applications and data management solutions.
Regardless of how your organization structure divvies up these responsibilities, BI is the sum total of these activities, and not the domain of a particular group or department. A business strategy to create value through information cuts across many departments. It cannot be planned or executed in isolation.
The Future of BI
We're not far from an age where BI is not a separate part of our information architecture. We're not there yet, but several trends have us on this path:
- Focus on the future value and re-use of data managed by operational applications
- Commitment to data governance
- Maturation of master data management solutions
- Technological advances in data management and information access
When we finally arrive at a unified information architecture, the definition of BI will still hold. We will be closer to delivering on its promise than ever before.
And, without a doubt, we will have come up with ways of using information to deliver value that have not even been thought of today.
Thursday, November 14, 2013
Facebook's Ken Rudin on Analytics
Posted by Chris Adamson
If you are interested in how business analytics impact your BI program, carve out forty-five minutes of time to watch Ken Rudin's recent TDWI keynote: "Big Data, Bigger Impact." The video is embedded below.
Wayne Eckerson's excellent book, Secrets of Analytical Leaders, features more insights from Ken Rudin and others.
Rudin is the director of analytics at Facebook. In his presentation, he discusses several topics that are of interest to readers of this blog. Among them:
- Big data technology should be used to extend your traditional BI solution, not replace it. Facebook has realized this, and is working to bring in relational technology to answer traditional business questions.
- Successful analytics programs bring together centrally managed core data metrics with a variety of data that is not centrally managed. Rudin shares different ways he has been able to make this happen.
- A similar balance can be attained with your organizational structure. Use of "embedded analysts" provides the business benefits of decentralization, while maintaining the efficiencies and scale advantages of a centralized program.
You'll also want to check out Wayne Eckerson's latest book, Secrets of Analytical Leaders.
Video: "Big Data, Bigger Impact," Ken Rudin, TDWI World Conference, Chicago, 5/6/2013.
Recommended Reading
I highly recommend this book if you are interested in analytics.
Get it from Amazon.com in paperback or Kindle editions.
Wednesday, June 5, 2013
In the Era of Big Data, The Dimensional Model is Essential
Posted by Chris Adamson
Don't let the hype around big data lead you to believe your BI program is obsolete.
I receive a lot of questions about "big data." Here is one:
"Big data" is the never-ending quest to expand the ways in which our BI programs deliver business value.
As we expand the scope of what we deliver to the business, we must be able to tie our discoveries back to business metrics and measure the impact of our decisions. The dimensional model is the glue that allows us to achieve this.
Unless you plan to stop measuring your business, the dimensional model will remain essential to your BI program. The data warehouse remains relevant as a means to instantiate the information that supports this model. Reports of its death have been greatly exaggerated.
Big Data
"Big Data" is usually defined as a set of data management challenges known as "the three V's" -- volume, velocity and variety. These challenges are not new. Doug Laney first wrote about the three V's in 2001 -- twelve years ago.1And even before that, we were dealing with these problems.
Consider the first edition of The Data Warehouse Toolkit, published by Ralph Kimball in 1996.2 For many readers, his "grocery store" example provided their first exposure to the star schema. This schema captured aggregated data! The 21 GB fact table was a daily summary of sales, not a detailed record of point-of-sale transactions. Such a data set was presumably too large at the time.
That's volume, the first V, circa 1996.
In the same era, we were also dealing with velocity and variety. Many organizations were moving from monthly, weekly or daily batch loads to real-time or near-real time loads. Some were also working to establish linkages between dimensional data and information stored in document repositories.
New business questions
As technology evolves, we are able to address an ever expanding set of business questions.
Today, it is not unreasonable to expect the grocery store's data warehouse to have a record for every product that moves across the checkout scanner, measured in terabytes rather than gigabytes. With this level of detail, market basket analysis is possible, along with longitudinal study of customer behavior.
But of course, the grocery store is now looking beyond sales to new analytic possibilities. These include tracking the movement of product through the supply and distribution process, capturing interaction behavior of on-line shoppers, and studying consumer sentiment.
We still measure our businesses
What does this mean for the dimensional model? As I've posted before, a dimensional model represents how we measure the business. That's not something we're going to stop doing. Traditional business questions remain relevant, and the information that supports them is the core of our BI solution.
At the same time, we need to be able to link this information to other types of data. For a variety of reasons (V-V-V), some of this information may not be stored in a relational format, and some may not be a part of the data warehouse.
Making sense of all this data requires placing it in the context of our business objectives and activities.
To do this, we must continue to understand and capture business metrics, record transaction identifiers, integrate around conformed dimensions, and maintain associated business keys. These are long established best practices of dimensional modeling.
By applying these dimensional techniques, we can (1) link insights from our analytics to business objectives and (2) measure the impact of resultant business decisions. If we don't do this, our big data analytics become a modern-day equivalent of the stove-pipe data mart.
The data warehouse
The function of the data warehouse is to instantiate the data that supports measurement of the business. The dimensional model can be used toward this aim (think: star schema, cube.)
The dimensional model also has other functions. It is used to express information requirements, to guide program scope, and to communicate with the business. Technology may eventually get us to a point where we can jettison the data warehouse on an enterprise scale,3 but these other functions will remain essential. In fact, their importance becomes elevated.
In any architecture that moves away from physically integrated data, we need a framework that allows us to bring that data together with semantic consistency. This is one of the key functions of the dimensional model.
The dimensional model is the glue that is used to assemble business information from distributed data.
Organizations that leverage a bus architecture already understand this. They routinely bring together information from separate physical data marts, a process supported by the dimensional principle of conformance. Wholesale elimination of the data warehouse takes things one step further.
Notes
I receive a lot of questions about "big data." Here is one:
We have been doing data warehousing using Kimball method and dimensional modeling for several years and are very successful (thanks for your 3 books, btw). However, these days we hear a lot about Big Data Analytics, and people say that Big Data is the future trend of BI, and that it will replace data warehousing, etc.
Personally I don't believe that Big Data is going to replace Data Warehousing but I guess that it may still bring certain value to BI. I'm wondering if you could share some thoughts.
"Big data" is the never-ending quest to expand the ways in which our BI programs deliver business value.
As we expand the scope of what we deliver to the business, we must be able to tie our discoveries back to business metrics and measure the impact of our decisions. The dimensional model is the glue that allows us to achieve this.
Unless you plan to stop measuring your business, the dimensional model will remain essential to your BI program. The data warehouse remains relevant as a means to instantiate the information that supports this model. Reports of its death have been greatly exaggerated.
Big Data
"Big Data" is usually defined as a set of data management challenges known as "the three V's" -- volume, velocity and variety. These challenges are not new. Doug Laney first wrote about the three V's in 2001 -- twelve years ago.1And even before that, we were dealing with these problems.
Photo from NASA in public domain.
That's volume, the first V, circa 1996.
In the same era, we were also dealing with velocity and variety. Many organizations were moving from monthly, weekly or daily batch loads to real-time or near-real time loads. Some were also working to establish linkages between dimensional data and information stored in document repositories.
New business questions
As technology evolves, we are able to address an ever expanding set of business questions.
Today, it is not unreasonable to expect the grocery store's data warehouse to have a record for every product that moves across the checkout scanner, measured in terabytes rather than gigabytes. With this level of detail, market basket analysis is possible, along with longitudinal study of customer behavior.
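With data captured at this grain, basket affinities fall out of a simple aggregation over the transaction identifier. A minimal sketch in Python -- the table contents and product names are illustrative, not from any real source:

```python
from itertools import combinations
from collections import Counter

# Each row of the fact table is one product crossing the scanner,
# at the grain of (transaction, product).
fact_sales = [
    ("txn1", "bread"), ("txn1", "butter"),
    ("txn2", "bread"), ("txn2", "butter"), ("txn2", "milk"),
    ("txn3", "milk"),
]

# Group line items into baskets by transaction identifier.
baskets = {}
for txn, product in fact_sales:
    baskets.setdefault(txn, set()).add(product)

# Count how often each pair of products appears in the same basket.
pair_counts = Counter()
for items in baskets.values():
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

print(pair_counts[("bread", "butter")])  # bread and butter co-occur in 2 baskets
```

Real implementations push this aggregation into the database or an analytic engine, but the logic is the same: the transaction-level grain is what makes the pairing possible.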
But of course, the grocery store is now looking beyond sales to new analytic possibilities. These include tracking the movement of product through the supply and distribution process, capturing interaction behavior of on-line shoppers, and studying consumer sentiment.
We still measure our businesses
What does this mean for the dimensional model? As I've posted before, a dimensional model represents how we measure the business. That's not something we're going to stop doing. Traditional business questions remain relevant, and the information that supports them is the core of our BI solution.
At the same time, we need to be able to link this information to other types of data. For a variety of reasons (V-V-V), some of this information may not be stored in a relational format, and some may not be a part of the data warehouse.
Making sense of all this data requires placing it in the context of our business objectives and activities.
To do this, we must continue to understand and capture business metrics, record transaction identifiers, integrate around conformed dimensions, and maintain associated business keys. These are long-established best practices of dimensional modeling.
By applying these dimensional techniques, we can (1) link insights from our analytics to business objectives and (2) measure the impact of resultant business decisions. If we don't do this, our big data analytics become a modern-day equivalent of the stove-pipe data mart.
The data warehouse
The function of the data warehouse is to instantiate the data that supports measurement of the business. The dimensional model can be used toward this aim (think: star schema, cube).
The dimensional model also has other functions. It is used to express information requirements, to guide program scope, and to communicate with the business. Technology may eventually get us to a point where we can jettison the data warehouse on an enterprise scale,[3] but these other functions will remain essential. In fact, their importance becomes elevated.
In any architecture that moves away from physically integrated data, we need a framework that allows us to bring that data together with semantic consistency. This is one of the key functions of the dimensional model.
The dimensional model is the glue that is used to assemble business information from distributed data.
Organizations that leverage a bus architecture already understand this. They routinely bring together information from separate physical data marts, a process supported by the dimensional principle of conformance. Wholesale elimination of the data warehouse takes things one step further.
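The drill-across behavior that conformance enables can be sketched as a merge on the conformed dimension's business key. The mart contents below are hypothetical:

```python
# Two marts, physically separate, each aggregated by the conformed
# product dimension's business key.
sales_mart = {"P1": 1000, "P2": 500}     # sales dollars by product
inventory_mart = {"P1": 40, "P2": 75}    # units on hand by product

# Because both marts conform on the same product key, their results
# can be merged into a single row per product.
drill_across = {
    key: {"sales": sales_mart.get(key), "on_hand": inventory_mart.get(key)}
    for key in sorted(set(sales_mart) | set(inventory_mart))
}
print(drill_across["P1"])  # {'sales': 1000, 'on_hand': 40}
```

If the product keys did not conform -- different codes, different granularity -- no such merge would be possible, which is precisely the stovepipe problem.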
Notes
1. Doug Laney's first published treatment of "The Three V's" can be found on his blog.
2. Now out of print, this discussion appeared in Chapter 2, "The Grocery Store." Insight into the big data challenges of 1996 can be found in Chapter 17, "The Future."
3. I think we are a long time away from being able to do this on an enterprise scale. When we do get there, it will be as much due to master data management as it is due to big data or virtualization technologies. I'll discuss virtualization in some future posts.
Previous posts have dealt with this topic.
- The Role of the Dimensional Model in Your BI Program (4/30/2013) details the four ways we use the dimensional model. Only one of these functions involves a database.
- In Big Data and Dimensional Modeling (4/20/2012) you can see me discuss the impact of new technologies on the data warehouse and the importance of the dimensional model.
Tuesday, April 30, 2013
The Role of the Dimensional Model in Your BI Program
Posted by
Chris Adamson
The dimensional model delivers value long before a database is designed or built, and even when no data is ever stored dimensionally. While it is best known as a basis for database design, its other roles may have more important impacts on your BI program.
A dimensional model defines business metrics or performance indicators in detail, and captures the attendant dimensional context. (For a refresher, see the post What is a Dimensional Model from 4/27/2010.) Metrics are grouped based on shared granularity, cross referenced to shared reference data, and traced to data sources.
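Because the model exists independently of any database, it can be captured as a simple specification. A hypothetical sketch of such a specification in Python:

```python
from dataclasses import dataclass, field

@dataclass
class FactTableSpec:
    """A dimensional design element: metrics plus their dimensional context."""
    name: str
    grain: str                                   # shared granularity of the metrics
    metrics: list = field(default_factory=list)  # performance indicators
    dimensions: list = field(default_factory=list)

# An illustrative specification -- no database required.
orders = FactTableSpec(
    name="order_facts",
    grain="one row per order line",
    metrics=["quantity_ordered", "order_dollars"],
    dimensions=["day", "product", "customer", "salesperson"],
)
print(orders.grain)
```

A document like this can later drive a star schema, a cube, or a semantic layer, but it is useful as a requirements artifact on its own.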
The dimensional model plays four key roles in Business Intelligence:
- The dimensional model is the ideal way to define requirements, because it describes how the business is measured
- The dimensional model is ideal for managing scope because it communicates to business people (functionality) and technical people (complexity)
- The dimensional model is ideal as a basis for data mart design because it provides ease of use and high performance
- The dimensional model is ideal as a semantic layer because it communicates in business terms
Information Requirements
The dimensional model is best understood as an information model, rather than data model. It describes business activities the same way people do: as a system of measurement. This makes it the ideal form to express information needs, regardless of how information will be stored.
Image by Gravityx9 licensed under Creative Commons 2.0
This representation is valuable because business questions are constantly changing. If you simply state them, you produce a model with limited shelf life. If you model answers to the question of today, you've provided perishable goods.
A dimensional model establishes information requirements that endure, even as questions change. It provides a strong foundation for multiple facets of BI:
- Performance management, including dashboards and scorecards
- Analytic processing, including OLAP and ad hoc analysis
- Reporting, including both enterprise and operational reports
- Advanced analytics, including business analytics, data mining and predictive analytics
All these disciplines center on business metrics. It should be no surprise that when Howard Dresner coined the term Business Intelligence, his definition referenced "facts and fact based systems." It's all about measurement.
Program Roadmap and Project Scope
A dimensional model can be used to describe scope because it communicates to two important audiences.
- Business people (functionality): The dimensional model describes the measurement of a business process, reflecting how the process is evaluated by participants and observers. It communicates business capability.
- Technical personnel (level of effort): A dimensional model has technical implications: it determines the data sources that must be integrated, how information must be cleansed and standardized, and how queries or reports can be built. In this respect, it communicates level of effort.
These dual perspectives make the dimensional design an ideal centerpiece for managing the roadmap for your BI program. Fully documented and mapped to data sources, a dimensional model can be divided into projects and prioritized. It is a blueprint that can be understood by all interested parties. A simple conformance matrix communicates both intended functionality and technical level of effort for each project.
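A conformance matrix is just business processes crossed with conformed dimensions. A small sketch that renders one -- the processes and dimensions are illustrative:

```python
# Columns are conformed dimensions; rows are business processes
# (candidate fact tables). An X marks each dimension a process uses.
dimensions = ["day", "product", "customer", "salesperson"]
processes = {
    "orders":    {"day", "product", "customer", "salesperson"},
    "shipments": {"day", "product", "customer"},
    "inventory": {"day", "product"},
}

header = "process".ljust(10) + "".join(d.ljust(12) for d in dimensions)
rows = [
    name.ljust(10) + "".join(("X" if d in used else "-").ljust(12) for d in dimensions)
    for name, used in processes.items()
]
print("\n".join([header] + rows))
```

Each row communicates functionality to the business (what can be analyzed) and level of effort to I.T. (which dimensions must be conformed before that process can be built).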
At the project level, a dimensional design can be used as the basis for progress reporting. It can also serve as an unambiguous arbiter of change requests. Changes that add data sources or impact grain, for example, are considered out of scope. This is particularly useful for organizations that employ iterative methodologies, but its simplicity makes it easy to reconcile with any development methodology.
Database Design
The dimensional model is best known as the basis for database design. The term "star schema" is far more widely recognized than "dimensional model" (a fact that influenced the name of my most recent book).
In fact, the dimensional model is the de facto standard for data mart design, and many organizations use it to shape the entire data warehouse. It has an important place in W.H. Inmon's Corporate Information Factory, in Ralph Kimball's dimensional bus architecture, and even in one-off data marts that lack an enterprise focus.
Implemented in a relational database, the dimensional model becomes known as a star schema or snowflake. Implemented in a multidimensional database, it is known as a cube. These implementations offer numerous benefits. They are:
- Easily understandable by business people
- Extraordinarily flexible from a reporting and analysis perspective
- Adaptable to change
- Capable of very high performance
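As a relational implementation, this reduces to a central fact table joined to dimension tables on surrogate keys. A minimal sketch using SQLite; the tables and rows are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE day (day_key INTEGER PRIMARY KEY, date TEXT, month TEXT);
    -- Fact table: one row per sale, with a foreign key to each dimension.
    CREATE TABLE sales_facts (
        product_key INTEGER REFERENCES product,
        day_key INTEGER REFERENCES day,
        sale_dollars REAL
    );
    INSERT INTO product VALUES (1, 'Bread', 'Bakery'), (2, 'Milk', 'Dairy');
    INSERT INTO day VALUES (10, '2013-04-01', '2013-04'), (11, '2013-04-02', '2013-04');
    INSERT INTO sales_facts VALUES (1, 10, 3.50), (1, 11, 3.50), (2, 10, 2.25);
""")

# A typical star query: aggregate the facts, grouped by dimension attributes.
rows = con.execute("""
    SELECT p.category, SUM(f.sale_dollars)
    FROM sales_facts f JOIN product p ON f.product_key = p.product_key
    GROUP BY p.category ORDER BY p.category
""").fetchall()
print(rows)  # [('Bakery', 7.0), ('Dairy', 2.25)]
```

Every query follows the same shape -- fact table in the middle, dimensions joined around it -- which is where the ease of use and predictable performance come from.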
Presentation and the Semantic Layer
A dimensional representation is the ideal way to present information to business people, regardless of how it is actually stored. It reflects how people think about the business, so it is used to organize the catalog of items they can call on for analysis.
Many business intelligence tools are architected around this concept, allowing a semantic layer to sit between the user and database tables. The elements with which people can frame questions are categorized as facts and dimensions. One need not know what physical data structures lie beneath.
Even the earliest incarnations of the semantic layer leveraged this notion. Many organizations used these tools to impose a dimensional view directly on top of operational data. Today, semantic layers are commonly linked to dimensional data marts.
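The mechanics can be sketched as a mapping from business terms to physical expressions, from which the tool generates SQL. Everything below -- the terms, columns, and generated query shape -- is a simplified assumption, not any particular product's design:

```python
# Business users pick names from the left; the layer supplies the physical SQL.
semantic_layer = {
    "Sales Dollars":    ("SUM(f.sale_dollars)", "fact"),
    "Product Category": ("p.category", "dimension"),
}

def build_query(selected):
    """Generate a star-join query from business-term selections."""
    dims, facts = [], []
    for name in selected:
        expr, kind = semantic_layer[name]
        (facts if kind == "fact" else dims).append(expr)
    sql = "SELECT " + ", ".join(dims + facts)
    sql += " FROM sales_facts f JOIN product p ON f.product_key = p.product_key"
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql

print(build_query(["Product Category", "Sales Dollars"]))
```

The user sees only "Product Category" and "Sales Dollars"; whether those resolve to a star schema, a view over operational tables, or a federated source is the layer's concern.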
A dimensional representation of business activity is the starting point for a variety of BI activities:
- Building enterprise reports
- Defining performance dashboards
- Performing ad hoc analysis
- Preparing data for an analytic model
The concept of dimensional presentation is receiving renewed attention as federated architectures promise the construction of virtual solutions rather than physical ones.
Further information
I briefly covered these four roles in an interview last year:
- Big Data and Dimensional Modeling (4/20/2012)
Many of these themes have been discussed previously:
- Three Data Warehouse Architectures that Use Star Schema (3/26/2007)
- Drive Warehouse Strategy with a Dimensional Model (7/6/2007)
- What is a Dimensional Model? (4/27/2010)
- The Conformance Matrix (6/5/2012)
Although I've touched on these topics before, I wanted to bring them together in a single article. In the coming months, I will refer back to these concepts as I address common questions about big data, agile BI and federation.
In the meantime, please help support this blog by picking up a copy of my latest book.
Friday, November 5, 2010
Q&A: Star vs. Snowflake
Posted by
Chris Adamson
A question from Portugal gives me an excuse to talk about snowflakes, why you might want to avoid them, and why you might want to use them.
Q: In your perspective, when does a star schema start to be a snowflake? If you have a snowflaked dimension, do you consider the model a star-schema? Or if you have for example an outrigger, is the model not a star-schema anymore?
Pedro in Portugal
http://www.pedrocgd.blogspot.com/
I think the question of when a star becomes a snowflake is really one of semantics, but I'll give an answer.
Readers who just want to know when to use snowflake designs can skip ahead a couple of paragraphs.
Is it a Star or Snowflake?
As I said, this is really a semantic issue, but here is my answer. When dimension tables are linked to other dimension tables, or to anything that is not a fact table, the design is a snowflake.
The presence of an outrigger indicates a snowflake design, and so does a 3NF dimension containing 45 tables (and yes, I have seen that!).
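The structural difference shows up in the joins. A sketch of one product dimension in both forms, with invented attribute names:

```python
# Star: the dimension is one denormalized table; every attribute is
# one join away from the fact table.
star_product = {
    "product_key": 1, "name": "Bread",
    "brand_name": "Acme", "brand_category": "Bakery",
}

# Snowflake: brand attributes move to an outrigger, reached by a
# second join from the dimension table.
snowflake_product = {"product_key": 1, "name": "Bread", "brand_key": 7}
brand_outrigger = {7: {"brand_name": "Acme", "brand_category": "Bakery"}}

# Resolving an attribute now requires following the outrigger key --
# the extra join that makes the design a snowflake.
brand = brand_outrigger[snowflake_product["brand_key"]]["brand_name"]
print(brand)  # Acme
```

Both designs answer the same questions; the snowflake simply pays for its normalization with an additional join in every query that touches a brand attribute.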
Is a snowflake a star? I don't have a good answer for this. When I discuss design options with colleagues, we think of it as an either/or choice. But when a design is complete, we refer to each fact table and its dimensions as a "star," even if there are outriggers.
One thing I am sure of is that both types of design are dimensional designs. (So is a cube, by the way.)
Thanks to Pedro for the question. Now I'm going to talk a bit more about snowflakes, for those who are interested.
When in doubt, don't snowflake
Best practices dictate that snowflakes should be avoided. The reasons for this are wholly pragmatic.
- They increase ETL complexity
- They increase query complexity, harming "usability"
- They reduce the understandability of the model
When snowflaking is acceptable
There are some cases in which the guideline against snowflaking may be relaxed.
- Judicious use of outriggers may be acceptable in limited cases where there are relationships between dimensions that must be browsable. But also consider a factless fact table as an alternative.
- Outriggers are necessary when there are repeating attributes such as a product with multiple features or a patient with multiple diagnoses.
- Outriggers are helpful in situations where there is a recursive hierarchy such as departments that contain other departments, or regions that contain other regions.
- Snowflaking may be justified when your software products require it (e.g., your DBMS or reporting tool).
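The repeating-attribute case is usually handled with a bridge table between the fact and the dimension. A minimal sketch of the patient-diagnosis example, with hypothetical data:

```python
# One fact row per hospital visit, carrying a group key rather than a
# single diagnosis key, because a visit can have many diagnoses.
visit_facts = [{"visit_id": "V1", "diagnosis_group_key": 100, "charges": 2000.0}]

# Bridge table: one row per (group, diagnosis) pair.
diagnosis_bridge = [(100, "D1"), (100, "D2")]
diagnosis_dim = {"D1": "Hypertension", "D2": "Diabetes"}

# Query: total charges associated with each diagnosis. Note that a
# bridged query repeats the fact for each group member, so the totals
# double-count unless an allocation factor is applied.
totals = {}
for fact in visit_facts:
    for group, diag in diagnosis_bridge:
        if group == fact["diagnosis_group_key"]:
            name = diagnosis_dim[diag]
            totals[name] = totals.get(name, 0.0) + fact["charges"]
print(totals)  # {'Hypertension': 2000.0, 'Diabetes': 2000.0}
```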
Read more, and help support this blog:
I devote many pages to snowflake designs in Star Schema: The Complete Reference:
- Chapter 7 discusses snowflakes and hierarchies, and has an in-depth discussion of some of the issues touched on in this post.
- Chapter 9 discusses the use of outriggers and bridge tables to capture repeating dimensions or attributes.
- Chapter 10 shows how you can use a bridged design to support a recursive relationship to very powerful effect.
- Chapter 16 looks at some of the implications of the star vs. snowflake decision on your BI software
Image credit: Gravityx9 licensed under Creative Commons 2.0
Wednesday, May 19, 2010
Kimball's Approach is Top-Down
Posted by
Chris Adamson
Ralph Kimball's approach to data warehousing is frequently mis-characterized as being "bottom-up." This post aims to clear up that misconception.
Bus Architecture
Kimball's bus architecture (or dimensional data warehouse architecture) is an enterprise architecture. At its core, a set of conformed dimensions ensure a consistent representation of standard terms and data elements across multiple subject areas. The conformed dimensions describe important things like products, customers, locations, or anything of significance to the business.
The subject areas are called data marts. They represent things like manufacturing, sales, invoicing, receivables and so forth. Data marts don't need to be implemented all at once. They can be implemented one at a time, as part of an incremental program. Data marts also don't need to be stored in a single database (although they may.) When they are stored in different databases, the conformance bus ensures consistency and compatibility.
Top-Down
Kimball advocates planning a set of conformed dimensions as an up-front (i.e. strategic) activity. The conformance bus then serves as the blueprint for a set of integrated data marts, which can be built on whatever schedule makes the most sense.
Kimball and Ross put it this way:
"During the limited-duration architecture phase, the team designs a master suite of standardized dimensions and facts that have uniform interpretation across the enterprise... We then tackle the implementation of separate data marts in which each iteration closely adheres to the architecture."
- From The Data Warehouse Toolkit, Second Edition by Ralph Kimball and Margy Ross (Wiley, 2002)
Because it begins with an enterprise-level framework, then delivers departmental functionality, this is a top-down approach.
Bottom-Up
A bottom-up approach is one that moves in the opposite direction, beginning with a departmental focus and later evolving into one that has an enterprise focus. This occurs when organizations build stand-alone data marts, then later decide to integrate them.
Stand-alone data marts are designed and built for departmental use, without an enterprise context. They are cheaper in the short-run, offering a fast path to quick results. Stand-alone data marts also arrive due to mergers and acquisitions, or through packaged software.
When there is more than one stand-alone data mart, however, they are likely to exhibit incompatibilities and inconsistencies. They are sometimes labeled "stovepipes." Faced with these inconsistent data marts, some organizations resolve to retrofit them into a conformance framework. This can be a difficult and expensive process, requiring extensive rework.
When stand-alone data marts are successfully brought into conformance, a bottom-up path has been followed--one that starts with a departmental solution and moves to enterprise capability. Bottom-up development is cheaper in the short term but more expensive in the long term.
While the end result may be compatible with Kimball's vision, clearly the route is not. If this is news to you, you might want to check out his book. (The link appears beneath the quotation above.) You can also consult posts on data warehouse architectures and common misconceptions.
-- Chris
Image: PCI Slot by Ryan_Franklin_az
Licensed under Creative Commons 2.0
Monday, April 5, 2010
TDWI Members: Read Chris's Column in Flashpoint
Posted by
Chris Adamson
If you are a member of The Data Warehousing Institute, be sure to check out the April 1, 2010 issue of Flashpoint.
My article The Hidden Value of Dimensional Design explains how you can use dimensional design to cultivate a shared understanding of project scope between business and technical personnel.
This publication is only available to TDWI members.
If you are not a member (or if you want to read more on the topic) have a look at this blog post I wrote in 2007: Drive Warehouse Strategy With A Dimensional Model.
- Chris
Thursday, November 19, 2009
The Tolkien Effect
Posted by
Chris Adamson
Schema designers must be on the lookout for data elements that are known by more than one name. Equally common is the use of a single name to signify very different things.
It may surprise you to learn that there is an important connection between data warehouse design and The Lord of the Rings.
In the books, author JRR Tolkien challenges readers by using many different names, without explanation, for the same character.
The character Aragorn, for example, is also known as Strider, Dunadan, the Heir of Isildur, and several other names and titles. Each name, it turns out, is associated with a different culture or point of view.
At first, all this can be deeply confusing. With a little effort and patience, however, things begin to make sense, and the end result can be deeply rewarding.
The Tolkien Effect in Business
The same kind of thing happens when gathering requirements for a dimensional model. Within a business, it is commonplace to find several names for the same thing. Different departments, for example, may refer to products in different ways. Even within a department, there may be multiple names for the same thing.
Depending on your previous experience with the area in question, it may take you some time to realize this is going on. I will never forget the day I realized that a finance group meant the same thing by Ten-Digit-Department, Level 3 Code and Budget Line.
It’s crucial to identify these situations, or the capabilities of your model will be crippled. Data elements of interest in multiple contexts should be given a single, shared definition in your model. For dimensions in particular, this will be crucial in supporting analysis that crosses subject areas.
These shared dimensions are called conformed dimensions, and they are the key to avoiding stove-pipe subject areas. Even within a subject area, this can be crucial. The Ten-Digit-Department realization was essential in permitting comparison of budgets to actuals.
The Reverse-Tolkien
The converse is also commonplace: a single name used to signify very different things. The best example of this is "Sales." A salesperson will often use this word to refer to an order or contract. In finance, however, the word is reserved for the event that allows the recognition of revenue, which is often fulfillment or shipment of the order.
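The consequences are concrete: the same question, "what were sales last month," yields two different numbers depending on whose definition is in play. A hypothetical illustration:

```python
# The same business events, tracked at two different milestones.
orders =    [("O1", "2013-04", 500.0), ("O2", "2013-04", 300.0)]
shipments = [("O1", "2013-04", 500.0)]  # O2 has not shipped yet

# "Sales" to the salesperson: orders booked in the month.
sales_booked = sum(amt for _, month, amt in orders if month == "2013-04")

# "Sales" to finance: revenue recognized on shipment.
sales_recognized = sum(amt for _, month, amt in shipments if month == "2013-04")

print(sales_booked, sales_recognized)  # 800.0 500.0
```

A sound design gives each measurement its own well-defined fact -- orders and shipments -- rather than a single ambiguous "sales" figure.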
Once again, it is crucial that analysts keep an eye out for these situations; failure to produce consistent, well-defined definitions for each fact or measurement is also a failure of conformance. The result will be challenges to the accuracy of the data, distrust of the solution, and a lack of user adoption.
What You Can Do
How then to avoid these problems? Listen. Don’t assume you know what people mean. Repeat things back in your own words. Be sure to write down and review definitions of each data element.
Look out for The Tolkien Effect. Pay close attention to people who live and work on the cusp of two subject areas or departments, as they will be keenly aware of these kinds of linguistic challenges. So will the data administrator, if your organization has one.
-Chris
Monday, July 27, 2009
Recommended Books on the Data Warehouse Lifecycle
Posted by
Chris Adamson
Recommended Reading: A new book by Laura Reeves, and a revised edition of the classic Lifecycle Toolkit.
If you've been to any of my classes, you already know that I am a fan of Laura Reeves. She has a pragmatic, get-things-done approach to data warehousing.
You may also know her as co-author of the original edition of The Data Warehouse Lifecycle Toolkit, a book she wrote with Ralph Kimball, Margy Ross and Warren Thornthwaite. (For more on that book, see below.)
Laura has a new book out, which I highly recommend: A Manager's Guide to Data Warehousing.
In this book, she provides a practical guide to planning and executing data warehouse projects. It is written for managers (I.T. and business) who do not necessarily have a technical background in data warehousing.
Laura touches on each phase of the data warehouse lifecycle, providing useful advice without burdensome methodology, detailed task lists, or the like. This makes it easy to fit her advice into your own organization's development style.
Even if you already have a strong background in dimensional design, you will find this book to be quite useful. You can get it at Amazon.com.
Also Recommended
If you have a dimensional data warehouse, I also urge you to check out The Data Warehouse Lifecycle Toolkit, Second Edition by Ralph Kimball, Margy Ross, Warren Thornthwaite, Joy Mundy and Bob Becker.
This fully revised version of the classic book contains detailed tasks and deliverables to help you manage all phases of the data warehouse lifecycle.
It is an excellent reference for data warehousing professionals. Read more about it at Amazon.com.
The original edition has been a long-time recommendation on this blog, and the new edition carries on the standard. (Apologies to Warren Thornthwaite, whose name was previously misspelled here.)
If you've been to any of my classes, you already know that I am a fan of Laura Reeves. She has a pragmatic, get-things-done approach to data warehousing.
You may also know her as co-author of the original edition of The Data Warehouse Lifecycle Toolkit, a book she wrote with Ralph Kimball, Margy Ross and Warren Thornthwaite. (For more on that book, see below.)

In this book, she provides a practical guide to planning and executing data warehouse projects. It is written for managers (I.T. and business) who do not necessarily have a technical background in data warehousing.
Laura touches on each phase of the data warehouse lifecycle, providing useful advice without over-burdensome methodology, detailed task lists or the like. This makes it easy to fit her advice into your own organization's development style.
Even if you already have a strong background in dimensional design, you will find this book to be quite useful. You can get it at Amazon.com.
Also Recommended
If you have a dimensional data warehouse, I also urge you to check out The Data Warehouse Lifecycle Toolkit, Second Edition by Ralph Kimball, Margy Ross, Warren Thornthwaite, Joy Mundy and Bob Becker.
This fully revised version of the classic book contains detailed tasks and deliverables to help you manage all phases of the data warehouse lifecycle.
It is an excellent reference for data warehousing professionals. Read more about it at Amazon.com.
The original edition has been a long time recommendation on this blog, and the new edition carries on the standard. (Apologies to Warren Thornthwaite, whose name was previously misspelled here.)
Friday, July 6, 2007
Drive Warehouse Strategy with a Dimensional Model
Posted by Chris Adamson
CIOs and managers of successful data warehouses understand that a dimensional model is more than just a design-stage project deliverable. They use the dimensional model to drive data warehouse strategy, capture requirements, set project priorities, and manage project scope.
Fundamentally, a dimensional model deals with the measurement of business processes. It describes how a business process is evaluated, and can be used to frame questions about a process. In this respect, it speaks clearly to the business users of the data warehouse.
A dimensional model also has technical implications. Its definition determines the data sources that must be integrated, how information must be cleansed or standardized, and what queries or reports can be built. In this respect, it speaks clearly to the developers of the data warehouse.
These business and technical characteristics of the dimensional model make it an ideal focal point for managing the entire data warehouse life cycle. A dimensional model can serve as the basis for a shared understanding of warehouse strategy. From a business perspective, it imparts a clear understanding of functional capability; from an I.T. perspective, it supports a clear understanding of technical activity.
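As a concrete illustration, the core elements of a dimensional design can be captured in a very small structure: the business process being measured, the grain of the fact table, the dimensions with their attributes, and the facts themselves. The sketch below is hypothetical; the process, attribute, and measure names are invented for illustration and are not drawn from any particular warehouse:

```python
from dataclasses import dataclass

@dataclass
class StarSchema:
    """Minimal description of a dimensional model for one business process."""
    process: str                      # the business process being measured
    grain: str                        # what one row of the fact table represents
    dimensions: dict[str, list[str]]  # dimension name -> its attributes
    facts: list[str]                  # the measurements themselves

# Hypothetical example: measuring order fulfillment.
orders = StarSchema(
    process="Order fulfillment",
    grain="One row per order line",
    dimensions={
        "Day":      ["date", "month", "fiscal_period"],
        "Product":  ["sku", "name", "category"],
        "Customer": ["customer_id", "name", "region"],
    },
    facts=["quantity_ordered", "order_dollars"],
)

# The model frames business questions: "order dollars by product category
# by month" maps directly onto one fact and two dimension attributes.
fact, attrs = "order_dollars", ["category", "month"]
all_attrs = sum(orders.dimensions.values(), [])
assert fact in orders.facts and all(a in all_attrs for a in attrs)
```

Because both audiences can read a summary like this, it serves equally well as a statement of functional capability for the business and as a unit of technical work for I.T.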
- Warehouse Strategy It is well understood that planning a dimensional model at an enterprise level can enable the incremental implementation of subject area applications. For a Kimball-style dimensional data warehouse, this process is critical to ensuring that stovepipes do not develop.
But a high-level dimensional model has additional value, even if you are not building a dimensional data warehouse. Because it provides a clear framework to describe functional capability and technical work-units, a dimensional model is an effective way to plan and document an enterprise data warehouse architecture.
I use the dimensional model as the central focus of data warehouse strategic plans. It is understood by business and technical constituents, bringing them together with a shared understanding of scope, functionality, and technical effort for each subject area. It clearly conveys functionality, while at the same time allowing development activity to be quantified.
- Requirements Definition In the same way, a dimensional model is an ideal way to capture requirements, whether at a strategic level or for a subject-area implementation (data mart). Report requirements, data requirements, and loading requirements can all be expressed in dimensional terms. This makes each requirement understandable to a variety of audiences, and allows dependencies to be easily cross-referenced.
I use a dimensional format to capture requirements and develop specifications for deliverables such as reports, applications, ETL routines, and, of course, schema design.
- Project Prioritization A dimensional model can be cross-referenced with business priorities, report functionality, data availability, load requirements, and several other factors that drive a development roadmap. As a common ground for business and technical interaction, it is an invaluable tool.
Once priorities are set, the dimensional framework can be used to describe them. By segmenting a dimensional model into a set of sequenced projects, it clearly links both functionality and technical effort to the calendar. At the same time, it enables analysis of the resources that will be required.
- Managing Scope The dimensional model is an ideal way to define and manage the scope of projects. I use dimensional terms to describe project objectives, what is considered in and out of project scope, and how change requests will be evaluated by project leadership.
This is particularly useful for implementation projects that employ an "iterative" build process, where scope-creep is an ever-present possibility. By linking project scope to dimensional characteristics (such as grain, sources, or attribution), physical design can be enhanced through iterations without allowing the project to spiral out of control.
© 2007 Chris Adamson