Tuesday, June 2, 2009

Transaction Applications Vs Data Warehouse Application

There is an everlasting question about the fundamental differences between Transactional applications and data warehouse applications. The definition of a data warehouse according to Inmon is "a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process". Now raises the question "If a transaction based application never deleted any data and stored history and is the only application used by the company can it be treated as a data warehouse?". It depends. 


I recently came across an article that outlines the differences between the two. Though many understand the differences, I felt the article gave a textbook answer to this question which I think every data warehouse professional should know.

Project Domain

Transaction/Production Business System

Data Warehouse System

Business Benefits

Tangible benefits in terms of functional capabilities, business processes that will be automated, number of headcounts and reduction etc. Typically, the process getting automated is being done manually, and there is enough visible pain at the ground level and customers.

There is a lesser proportion of initiatives where there is 'heaven will fall', if the project is not done.

The benefits can be appreciated by fewer people and much fewer at a ground level.

Usage

A business system once implemented, drives the usage as it typically automates a business process.

Data Warehouse platform has a lesser compulsion for usage. Unless there are critical operational reports required.

Measure of Usage

One can specify the measure of usage for a business system in terms of processed unit, number of users.

While number of users and the number of queries does represent the level of usage, but it no means suggest that the usage is resulting in delivery of final outcome.

Skills and Expertise Requirements- Business

Business system requires the expertise on business process knowledge

More knowledge is required horizontally and vertically. One needs a much higher domain experience as well as cross-functional knowledge for an effective business role fulfillment in Data Warehouse project. The domain expertise also includes all three levels (strategic, managerial and operational).

Business Requirements

Ability of defining the business requirements, prioritization is easier as a business system automates an existing process and/OR a severely needed business functionality.

What analysis one needs, why and what one will do post its availability are questions, which demand/challenge the management and strategic thought process. Unlike a business process, analysis for any problem can be done in hundred different ways. Therefore, business requirements tend to change throughout a Data Warehouse project.

Business users availability and engagement

Business users are more available and engaged.

Its easier to provide and confirm the requirements of a business process automation, and difficult to define the information and analysis needs. Business users are too busy doing day-to-day work to dwell upon these questions.

The demands on the Database

The queries and data access is predictable as they are driven by the mapping of type of transaction, instances etc. A typical transaction touches only certain tables and certain records. Mostly the large and all-encompassing processing happens end of the day processing.

Data-Warehouse cannot predict the kind and incidences of queries on the system. A query can access all the tables and records.

Variety of front-end applications

A business system has a pre-defined back-end and front end applications accessing the back-end Database

A data-Warehouse could be having new front-end applications being added on the ongoing basis. This includes OLAP tools, Data mining applications, business performance management applications, online user query and reporting applications.

Expectations of flexibility to enhancements

A typical business system has an ever increasing list of enhancements, However, it is expected that the enhancements will take time and system will go through well-spaced out releases.

A Data-Warehouse is expected to provide granular enhancements for most cases. It has to have its design flexible enough to be able to incorporate new dimensions, measures and system sources without unsettling the foundations.


Sunday, May 24, 2009

Is Business Intelligence used efficiently?

Almost all companies today have a data warehouse and some sort of business intelligence platform. But the fact remains that many don't know how to make the best out of their data warehouse and business Intelligence platforms. The problem is many fold.

IT teams inability to understand the Business needs: Many IT personnel supporting a data warehouse are too technology centric. They fail to understand the business needs completely. Once a data warehouse is in place, the supporting members have little or no knowledge of business requirements, only technical aspect of it. This will not enable them to provide the support the business needs. Most often business is frustrated with this gap in understanding and complain about the inefficiency of IT personnel.

Business teams inability to explain what they want: The business teams many times don't really explain what they want, clearly. More often the questions are too generic like, "I need revenue information every month". Once some revenue number is provided by IT, they ask for more detailed report. At this point, the data warehouse team that does not have a good foresight would fail to provide the information the business asks. The business needs to be very particular about what they want at the most granular level and leave no ambiguity for the data warehouse team.

The Gap between IT and Business: There is a huge gap between business and IT in many organizations. The successful companies have bridged the gap. Coincidentally, I was reading an article on TDWI "Bridging the IT/Business Culture Chasm" by Hugh J. Watson in which he mentions, the companies that successfully utilized Business Intelligence are the ones that had Business Analysts with technical abilities. He also mentions that the companies with successful decision support systems had data warehouse personnel, who at some point in their career worked on the business side and had the ability to bridge the gap.

To conclude, a company can have a successful utilization of the business intelligence only when there is an overlap in the Business and IT teams. I believe that many companies have recognized this and seeking to bridge the gap for better use of their data.

Wednesday, May 20, 2009

ETL tools in the world of data warehousing

ETL stands for Extract Transform Load. In a typical data warehouse we Extract the data from the source systems, Transform the data by applying any business rules, and Load the data into the target, hence the word ETL. 

There are many ETL tools in the market, resulting in confusion of which tool to use. Some people who are hardcore programmers believe that there is no need of an ETL tool in the development process of a data warehouse, as any programming language can be used to write ETL code. Though this may be true, there are many downsides to developing a data warehouse without an ETL tool. Many oracle shops tend to use PL/SQL for developing the code. This is feasible. However, over a period of time TCO (Total Cost of Ownership) of code based data warehouse is going to be very high in comparision to an ETL tool. Also, making changes and maintaining a code based data warehouse is going to get very difficult. Hence it is a very prudent approach to identify a ETL tool that fits the budget and then proceed. Any data warehouse developed using code based ETL is a short sighted approach, and is a disaster waiting to happen... for the people maintaining it.

If you are a SQL Server shop, then its a no-brainer to use SSIS (Sql Server Integration Services) which is very close to Informatica in terms of the approach. For other database platform users there are ETL tools like Informatica, Data Stage, ODI (formerly sunopsis), Abnitio e.t.c

Bottom line...If you are designing a data warehouse or planning to, make no mistake... consider an ETL tool. 

Consider yourself warned...

Tuesday, May 12, 2009

Bill Inmon Vs. Ralph Kimball

There is a continuing debate among Data Warehouse professionals as to which data warehousing approach to take; Inmon or Kimball. Bill Inmon is considered as the father of Data Warehousing and Kimball according to me is the genius in Data Warehousing. I might already sound biased but that's what my experience says.

When Bill Inmon was designing or was coming up with the concept of data warehousing, disk space (the single most important component of data warehousing after the data itself) was quite expensive. So any methodology that he has comeup with has to take disk space into consideration giving more importance to it than to data itself. On the other hand, Kimballs approach ignores (to an extent) the issue of disk space, 'coz the disk space got cheaper, and is greared fully towards data and its usage. This is the reason we see more data warehouse designers flocking to kimball's methodology instead of Inmons. Also, Inmons methodology is more expensive to develop and implement than Kimball methodology.

For those of you who don't know, Inmons methodology means creation of a normalized data warehouse and Kimball's methodology is creation of Dimensional data warehouse. When designing a data warehouse, the Architect has to decide the methodology. I feel that there is still a lot of confusion among the data warehousing community on which approach to take. I personally would always go for creation of a Dimensional Model. The simple reason being that its easy to develop, easy to modularize, and mostly easy to change direction of the data warehouse per business needs. Normalized method, concentrates more on the data integrity than data usability. Dimensional model, concentrates on the usability while providing data integrity. For every Normalized data warehouse there must exist a dimensional data mart to support reporting or a OLAP or MOLAP tool. So, why not create the base model as a dimensional model?

Ok, enough of praising the dimensional modeling. There are data warehouses that failed as a result of designing them as dimensional models. One has to be careful then designing the dimensional model, as there is more room to make mistakes, and more often than not, mistakes do happen. One of the most common mistakes is the use of Surrogate Keys or rather not using Surrogate Keys. There are some designers that believed that surrogate keys are not required because their source system generate a unique key always. Big mistake. This is the first step towards the failure of the dimensional model. Another issue that I have seen is when designing the bridge tables, designers are tempted to use natural keys instead of surrogate keys. This will result in eventual failure of the model over a period of time, and also gives very bad query performance.

One has to be very careful when designing a dimensional model because there are more chances of making mistakes with the dimensional model than with a normalized model. 

Bottom line, for a dimensional model to be successful, a designer has to have a detailed understanding of the underlying subject area. Else, the project is a disaster waiting to happen...

some of my favorite books on data warehousing

  1. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition) (Paperback)
  2. The Data Warehouse Lifecycle Toolkit (Paperback)
  3. The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning
  4. Data Modeler's Workbench: Tools and Techniques for Analysis and Design
  5. Data Modeling Made Simple: A Practical Guide for Business & Information Technology Professionals 

What is data warehousing?

Actions of the past have proven to be guidelines for the future...

Data Warehousing is a process by which any organization, or individual (yes individual) will have the ability to look at the "actions" performed in the past, present and have the ability to predict the near future. In my statement above when I say "action" it means different things. Action for a Financial Brokerage can be a trade, for a retail store can be a sale of merchandise, and for an individual it could mean dinning out or shopping for clothes.

For simplicity and understanding, I will write more from the perspective of an individual expenses and income.

As an individual one is filled with tens of actions on a daily basis. If we make a note of all those actions we are effectively building a data warehouse. I personally have a data warehouse of my own, that tells me where every dollar goes. 

One who Keeps track of ones income and expenses over a period of time is effectively creating a data warehouse for oneself.

This is what all the institutions that build data warehouses do. Keep track of every action via myriad of systems in a centralized location to perform analysis.

Thursday, August 9, 2007

Blogging

Hello readers,



I'm writing this blog for myself. So, if you happen to stumble upon this blog and don't understand it, too bad, you are not supposed to. i don't mean to be rude, but this is just a dumping ground for all things in my head. Sometimes it is crap, but I'm hoping to get something good out of it.



have a nice day!