COST-BASED QUERY SCRAMBLING FOR INITIAL DELAYS
Tolga Urhan
(University of Maryland)
Michael J. Franklin
(University of Maryland)
Laurent Amsaleg
(IRISA)
Remote data access from disparate sources across a wide-area network
such as the Internet is problematic due to the unpredictable nature of
the communications medium and the lack of knowledge about the load and
potential delays at remote sites. Traditional, static, query
processing approaches break down in this environment because they are
unable to adapt in response to unexpected delays. Query scrambling has
been proposed to address this problem. Scrambling modifies query
execution plans on-the-fly when delays are encountered during runtime.
In its original formulation, scrambling was based on simple
heuristics, which although providing good performance in many cases,
were also shown to be susceptible to problems resulting from bad
scrambling decisions. In this paper we address these shortcomings by
investigating ways to exploit query optimization technology to aid in
making intelligent scrambling choices. We propose three different
approaches to using query optimization for scrambling. These
approaches vary, for example, in whether they optimize for total work
or response-time, and whether they construct partial or complete
alternative plans. Using a two-phase randomized query optimizer, a
distributed query processing simulator, and a workload derived from
queries of the TPC-D benchmark, we evaluate these different approaches
and compare their ability to cope with initial delays in accessing
remote sources. The results show that cost-based scrambling can
effectively hide initial delays, but that in the absence of good
predictions of expected delay durations, there are fundamental
tradeoffs between risk aversion and effectiveness.