Digital Symposium Collection 2000

Tape-Disk Join Strategies under Disk Contention

A. Kraiss, P. Muth, and M. Gillmann

Session 16: Tertiary Storage

Abstract

Large-scale data warehousing, data mining, and scientific applications require the analysis of terabytes of facts data accumulated over long periods of time. Tape libraries are suitable devices for storing such mass data. The online analytical processing (OLAP) of this data typically leads to long-running aggregation queries that join the tape-resident facts relation with disk-resident dimension relations. During the execution of the join, the disks storing the dimension relations are typically not dedicated to the join. They are subject to reads and writes issued by concurrently running applications. In many cases, the performance of these concurrent applications should not be degraded too much by the processing of the join. In this paper, we present an accurate model for analyzing the performance of three different tape-disk join strategies in multi-query systems such as database or OLAP servers. The major contributions of this paper are (a) a detailed cost model considering tape and disk bandwidth, tape and disk latencies, available buffer sizes, CPU costs, and the selectivity of filters on tape data, (b) the consideration of disk queueing effects due to concurrent reads and writes at the disk, and (c) the consideration of two different disk scheduling strategies. Based on the analytical model, we show the superiority of a disk scheduling strategy that gives preference to the service of the concurrent disk load. Furthermore, we present a strategy for dynamically selecting the most beneficial join algorithm and its parameters at runtime. We have implemented the tape-disk join strategies in a prototype system based on detailed simulations of secondary and tertiary storage devices. Our experimental evaluations confirm that the analytical model is indeed very accurate and a suitable basis for runtime strategy decisions.
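
The runtime selection step described in the abstract can be illustrated with a minimal sketch. This is not the cost model of the paper: the two estimators below, their names, and their formulas are hypothetical placeholders that only use tape bandwidth, disk bandwidth, buffer size, and the share of disk time taken by concurrent load. The sketch assumes the concurrent load is served with preference, so the join only sees the residual disk bandwidth, and it ignores latencies, CPU costs, and filter selectivity.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class SystemState:
    """Hypothetical snapshot of the parameters a runtime decision could use."""
    tape_bandwidth_mb_s: float
    disk_bandwidth_mb_s: float
    disk_utilization: float  # share of disk time used by concurrent load, 0 <= u < 1
    buffer_mb: float
    facts_mb: float          # size of the tape-resident facts relation
    dimension_mb: float      # size of the disk-resident dimension relation


def residual_disk_bandwidth(state: SystemState) -> float:
    # Assumption: concurrent requests are served first, so the join
    # only gets the fraction of disk time they leave unused.
    return state.disk_bandwidth_mb_s * (1.0 - state.disk_utilization)


def rescan_cost(state: SystemState) -> float:
    # Placeholder estimator: buffer one chunk of facts at a time and
    # rescan the dimension relation from disk once per chunk.
    tape_time = state.facts_mb / state.tape_bandwidth_mb_s
    passes = math.ceil(state.facts_mb / state.buffer_mb)
    disk_time = passes * state.dimension_mb / residual_disk_bandwidth(state)
    return tape_time + disk_time


def cached_cost(state: SystemState) -> float:
    # Placeholder estimator: only feasible if the dimension relation
    # fits in the buffer; it is then read from disk exactly once.
    if state.dimension_mb > state.buffer_mb:
        return math.inf
    tape_time = state.facts_mb / state.tape_bandwidth_mb_s
    return tape_time + state.dimension_mb / residual_disk_bandwidth(state)


Estimator = Callable[[SystemState], float]


def choose_strategy(estimators: Dict[str, Estimator],
                    state: SystemState) -> Tuple[str, float]:
    """Return the name and estimated cost (seconds) of the cheapest strategy."""
    costs = {name: estimate(state) for name, estimate in estimators.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]


ESTIMATORS: Dict[str, Estimator] = {"rescan": rescan_cost, "cached": cached_cost}

small_buffer = SystemState(10.0, 20.0, 0.5, 50.0, 1000.0, 100.0)
large_buffer = SystemState(10.0, 20.0, 0.5, 200.0, 1000.0, 100.0)
print(choose_strategy(ESTIMATORS, small_buffer))
print(choose_strategy(ESTIMATORS, large_buffer))
```

In this toy setting the choice flips from the rescanning placeholder to the caching placeholder once the buffer is large enough to hold the dimension relation. The same selection function could be re-evaluated whenever the observed disk utilization changes.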

Copyright (C) 2000 ACM