Data Warehousing
Information management is a big ask, and data warehousing
makes it convenient.
Data warehousing was proclaimed as
the solution to the management information dilemma. However,
the term "data warehouse" has become one of the
most used and abused terms in the IT vocabulary. Ask a variety
of vendors and professionals what a data warehouse is and how
it should be built, and the ambiguity of the term will quickly
become apparent.
The concept of "data warehousing"
dates back at least to the mid-1980s, and possibly earlier.
In essence, it was intended to provide an architectural model
for the flow of data from operational systems to decision
support environments. It attempted to address the various
problems and high costs associated with this flow. In the
absence of such an architecture, there usually
existed an enormous amount of redundancy in the delivery of
management information.
A number of people imagine a data warehouse
to be any collection of summarised data from various sources,
structured and optimised for query access using OLAP (on-line
analytical processing) query tools. The vendors of OLAP tools
originally propagated this view. To others, a data warehouse
is virtually any database containing data from more than one
source, collected for the purpose of providing management
information. This definition is not helpful, since such databases
were a feature of decision support solutions long before
the term "data warehouse" was coined.
In larger corporations it was typical
for multiple decision support projects to operate independently,
each serving different users but often requiring much of the
same data. The process of gathering, cleaning and integrating
data from various sources, often legacy systems, was typically
replicated for each project. Moreover, legacy systems were
frequently revisited as new requirements emerged, and each
new requirement called for a subtly different view of the legacy data.
Based on analogies with real-life warehouses,
data warehouses were intended as large-scale collection/storage/staging
areas for legacy data. From here data could be distributed
to "retail stores" or "data marts" which
were tailored for access by decision support users. The data
warehouse was designed to manage the bulk supply of data from
its suppliers and to handle the organisation and storage of this
data. The "retail stores" or "data marts" could then
focus on packaging and presenting
selections of the data to end-users, often to meet specialised
needs.
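The warehouse and data mart analogy above can be illustrated with a
minimal Python sketch. Everything in it is hypothetical and chosen only
for illustration: the two source systems, the record fields, and the
function names are assumptions, not part of any real product or of the
original architecture papers. The point it tries to show is that
gathering, cleaning and integrating happen once, in the warehouse load,
while each data mart only selects and packages the data it needs.

```python
from collections import defaultdict

# Hypothetical extracts from two operational (legacy) systems.
ORDERS_SYSTEM = [
    {"customer": " acme ", "region": "north", "amount": "120.50"},
    {"customer": "Globex", "region": "south", "amount": "80.00"},
]
BILLING_SYSTEM = [
    {"customer": "ACME", "region": "North", "amount": "40.25"},
]


def clean(record):
    """Normalise one raw record; done once, during the warehouse load."""
    return {
        "customer": record["customer"].strip().upper(),
        "region": record["region"].strip().lower(),
        "amount": float(record["amount"]),
    }


def load_warehouse(*sources):
    """Gather, clean and integrate all source extracts into one store."""
    return [clean(rec) for source in sources for rec in source]


def sales_by_region_mart(warehouse):
    """Data mart for one group of users: totals per region."""
    totals = defaultdict(float)
    for row in warehouse:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def customer_mart(warehouse):
    """Data mart for another group of users: totals per customer."""
    totals = defaultdict(float)
    for row in warehouse:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


# The bulk supply is handled once; both marts draw on the same store.
warehouse = load_warehouse(ORDERS_SYSTEM, BILLING_SYSTEM)
region_totals = sales_by_region_mart(warehouse)
customer_totals = customer_mart(warehouse)
```

Without the shared warehouse step, each mart would need its own copy of
the cleaning logic, which is the kind of redundancy the architecture was
meant to remove.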
Somewhere along the way this analogy
and architectural vision were lost, often after being manipulated by suppliers
of decision support software tools. Data warehousing "gurus"
began to emerge at the end of the 1980s, often themselves associated
with such companies. The architectural vision was frequently
replaced by studies of how to design decision support databases.
Suddenly the data warehouse had become the miracle cure for
the decision support headache, and suppliers jostled for position
in the growing data warehousing marketplace.