Cost-Based Optimization of
Decision Support Queries using
Transient-Views
Subbu N. Subramanian (IBM Santa Teresa Labs)
Shivakumar Venkataraman (IBM Santa Teresa Labs)
Next-generation decision support applications, besides being capable
of processing huge amounts of data, require the ability to integrate
and reason over data from multiple, heterogeneous data sources. Often,
these data sources differ in a variety of aspects such as their data
models, the query languages they support, and their network protocols.
They are also typically spread over a wide geographical area. The
cost of processing decision support queries in such a setting is quite
high. However, processing these queries often involves redundancies
such as repeated access to the same data source and multiple executions
of similar processing sequences. Minimizing these redundancies would
significantly reduce the query processing cost. In this paper, we (1)
propose an architecture for processing complex decision support
queries involving multiple, heterogeneous data sources; (2) introduce
the notion of {\em transient-views} -- materialized views that exist
only in the context of the execution of a query -- which are useful for
minimizing the redundancies involved in the execution of these
queries; (3) develop a {\em cost-based} algorithm that takes a query
plan as input and generates an optimal ``covering plan'' by minimizing
redundancies in the original plan; (4) validate our approach by means
of an implementation of the algorithms and a detailed performance
study based on TPC-D benchmark queries on a commercial database
system; and finally, (5) compare and contrast our approach with work
in related areas, in particular, the areas of answering queries using
views and optimization using common sub-expressions. Our experiments
demonstrate the practicality and usefulness of transient-views in
significantly improving the performance of decision support queries.