SIGMOD'99 - Paper

An Adaptive Query Execution Engine for Data Integration

Zachary Ives^*	Daniela Florescu	Marc Friedman
University of Washington	INRIA	University of Washington
zives@cs.washington.edu	Daniela.Florescu@inria.fr	friedman@cs.washington.edu

Alon Levy		Daniel S. Weld
University of Washington		University of Washington
alon@cs.washington.edu		weld@cs.washington.edu

Abstract
Traditional database query execution techniques are inadequate for data integration for three reasons: the absence of quality statistics on the data (which resides in remote sources), the unpredictable and bursty data arrival rates from these sources, and the frequent overlap and redundancy among the data sources. This paper presents the fully implemented Tukwila data integration system and describes three novel adaptive-execution features that combine to yield improved performance. (1) The adaptive double-pipelined hash join reduces latency and transforms incrementally into a hybrid-hash join in the face of insufficient memory. (2) The dynamic collector operator computes robust and efficient unions over possibly overlapping data sources. (3) Interleaved planning and execution with partial optimization allows Tukwila to recover quickly from poor optimizer decisions based on inaccurate estimates. We demonstrate that the Tukwila architecture extends previous innovations in adaptive execution (such as query scrambling, mid-execution reoptimization, and choose nodes), and we present experimental evidence that our techniques result in significant speedup.
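
To make the latency claim in (1) concrete, the following is a minimal sketch of the generic double-pipelined (symmetric) hash join idea, not Tukwila's implementation. It assumes both inputs fit in memory and therefore omits the incremental conversion to a hybrid-hash join; the function name `double_pipelined_join`, the `"L"`/`"R"` side tags, and the row layout are hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, Tuple

Row = Dict[str, Any]


def double_pipelined_join(
    arrivals: Iterable[Tuple[str, Row]], key: str
) -> Iterator[Tuple[Row, Row]]:
    """Join two inputs on `key`, emitting matches as soon as they exist.

    `arrivals` yields ("L", row) or ("R", row) in whatever order the
    remote sources happen to deliver tuples (an assumed interface).
    """
    # One in-memory hash table per input (assumes memory is sufficient).
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, row in arrivals:
        other = "R" if side == "L" else "L"
        k = row[key]
        # Insert the new tuple into its own side's table ...
        tables[side][k].append(row)
        # ... then probe the opposite table and emit matches immediately.
        for match in tables[other].get(k, []):
            yield (row, match) if side == "L" else (match, row)


# Tuples from the two sources arrive interleaved and out of order.
arrivals = [
    ("L", {"id": 1, "a": "x"}),
    ("R", {"id": 1, "b": "y"}),
    ("R", {"id": 2, "b": "z"}),
    ("L", {"id": 2, "a": "w"}),
    ("L", {"id": 3, "a": "q"}),
]
results = list(double_pipelined_join(arrivals, "id"))
```

Because each arriving tuple is both inserted and probed, the first result is produced as soon as one matching pair has arrived from either side, rather than after one input has been read in full. The cost is that both inputs are held in hash tables, which is the memory pressure the adaptive variant in the abstract is meant to handle.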