How to Keep the Enterprise Data Warehouse Relevant


Winston Chen

Last week, a data architect from a large pharma company asked me: “Should we build an enterprise data warehouse given that we want to harmonize business processes globally?” My first reaction was: it’s been a while since I heard of anyone wanting to build a true EDW.

Why? Is it because of the Great Recession, during which companies avoided big and risky projects? Or is there something else going on?

Let’s look at why data warehouses exist in the first place. We build data warehouses for 3 reasons:

1). We want to hoard data. Transactional systems purge older records. Master data records are usually updated in place, so the previous versions are gone forever.  So, we build data warehouses to store historical data in case we need to analyze it later. 

2). We want good performance for data analysis. Transactional systems are not optimized for analytical queries. Plus, analytical queries can hog a database and impact operations. So we build data warehouses both to take the load away from transactional systems and to design and tune the database purposefully for handling analytical queries.

3). We want good data. Data from many source systems need to be integrated and cleansed to meet the demands of analysis. This is the “one version of the truth” rationale that many of us have a love-hate relationship with. At the end of the day, this is data management, which aims to achieve both “one version” (consistency) and “the truth” (accuracy), plus “completeness”.

It is the third goal, data management, that puts the “E” in EDW, making it different from a data mart or just a plain data warehouse.
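To make the first reason concrete, here is a minimal sketch of how a warehouse can keep history that a source system would overwrite. It is only an illustration under my own assumptions: the names `CustomerVersion`, `apply_update`, and `city_as_of` are hypothetical, and the approach (closing the old row and appending a new one, often called a type 2 slowly changing dimension) is one common technique, not a description of any particular product.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    """One historical version of a customer record (hypothetical schema)."""
    customer_id: str
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_update(history: List[CustomerVersion], customer_id: str,
                 city: str, as_of: date) -> None:
    """Record a change by closing the current version and appending a new one,
    instead of updating the record in place."""
    for row in history:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.city == city:
                return  # nothing changed, keep the current version open
            row.valid_to = as_of  # close the old version
    history.append(CustomerVersion(customer_id, city, as_of))


def city_as_of(history: List[CustomerVersion], customer_id: str,
               when: date) -> Optional[str]:
    """Return the city that was valid on the given date, or None if unknown."""
    for row in history:
        if (row.customer_id == customer_id
                and row.valid_from <= when
                and (row.valid_to is None or when < row.valid_to)):
            return row.city
    return None


# The source system would only remember "Chicago"; the history keeps both.
history: List[CustomerVersion] = []
apply_update(history, "C1", "Boston", date(2009, 1, 1))
apply_update(history, "C1", "Chicago", date(2010, 6, 1))
```

With this in place, a question such as "where was customer C1 at the end of 2009?" can still be answered with `city_as_of(history, "C1", date(2009, 12, 31))`, even though the transactional system has long since moved on.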

Since the data warehousing wave started, dedicated analytical databases have been able to dramatically improve our ability to hoard data (first goal) through high storage density, and to run analytical queries fast (second goal). Progress in database technology will continue to be rapid because competition is fierce.

Meanwhile, MDM and data governance are helping us meet the third goal of data management. Multi-domain MDM promises to realize the fabled “conformed dimension”. And data governance, which can define and enforce data policies for quality across the data landscape, promises to bring about “the truth everywhere”. In this context, enterprise architects are surely rethinking the old data mart versus data warehouse and EDW debate. If source data, aided by MDM and data governance, are consistent and accurate, then data marts forged out of inexpensive database appliances wouldn’t be bad, would they? And data marts have a reputation for being agile and more responsive to business needs. In other words, will MDM and data governance make EDWs irrelevant?

Theoretically, yes. But EDWs are not going away anytime soon. Like mainframes, EDWs are deeply entrenched in the IT infrastructure and nearly impossible to sunset. In addition, due to their breadth and versatility, EDWs are still the only viable solutions to many hard problems. New EDWs will continue to get built, albeit at a slower pace.

There is another way for the EDW to get more wind behind its sails: fully and completely embrace data management and data governance. This is how an EDW differentiates itself from a data mart-appliance with a narrower scope. And the EDW is in a great position to be the poster child for data governance, and in doing so, it can keep its status as the center of gravity for all things data in an enterprise.

What do you think? I welcome your comments.


4 Responses to “How to Keep the Enterprise Data Warehouse Relevant”

  1. Chris Detzel October 8, 2010 at 8:36 am #

    From an EA perspective, this article is very relevant. In my conversations with executives about Enterprise Architecture, EDW, MDM and of course Data Governance are key to the overall strategy of the enterprise. Thanks for the writeup!

    Chris

    Twitter: @cdetzel

    • Winston Chen October 12, 2010 at 8:25 am #

      Chris, thanks for your comment!

      EA’s influence has been growing and that’s a good thing. Architecture is getting more complex and companies need smart and knowledgeable people to sort things out.

  2. Jim Harris October 8, 2010 at 3:37 pm #

    Great post, Winston.

    Clients often ask me about how enterprise data warehousing (EDW), which they are at least familiar with conceptually, relates to master data management (MDM) and data governance (DG), which they are hearing more and more about these days.

    Many EDW enthusiasts wrongly dismiss MDM by saying “that is what our conformed dimensions already do” and many organizations wrongly dismiss DG by saying “that is what our enterprise-wide best practices already do.”

    I agree with your three promises that the EDW was supposed to deliver–and in many organizations, one or two of these are currently being done fairly well. I also agree that MDM and DG can help the organization fully realize all three promises of the EDW.

    MDM and DG are not going to replace the EDW–they are going to make the EDW (as well as many other things) even better.

    Best Regards,

    Jim

    • Winston Chen October 12, 2010 at 8:52 am #

      Jim, thanks for your thoughts.

      To expand on our thoughts around conformed dimensions: they are very hard to get right, because a conformed dimension has to take into account all the different perspectives on product, customer, etc. It’s very hard to describe them semantically, let alone instantiate them in a single dimension table. So most star-schema based data warehouses have several product dimensions and several customer dimensions. Which one you use depends on what question you want to answer. This is what I mean by conformed dimensions being fabled.

      Kalido’s Information Engine product, which manages data warehouses, takes on this problem by providing rich modeling capabilities for dimensional data so that you can express the conformed dimensions semantically so they work for multiple business functions. With some secret sauce, it loads and automatically maintains the dimension tables so that they’re truly conformed and usable for any query.

      If you layer on MDM to govern the master data instances, and Data Governance to govern the model, rules, and policies, we’ll have finally found the fabled conformed dimensions.

      So yes, MDM and data governance can make EDWs better.
