ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Integrated Document Caching and Prefetching in Storage Hierarchies Based on Markov-Chain Predictions.

Achim Kraiss, Gerhard Weikum: Integrated Document Caching and Prefetching in Storage Hierarchies Based on Markov-Chain Predictions. VLDB J. 7(3): 141-162(1998)
@article{DBLP:journals/vldb/KraissW98,
  author    = {Achim Kraiss and
               Gerhard Weikum},
  title     = {Integrated Document Caching and Prefetching in Storage Hierarchies
               Based on Markov-Chain Predictions},
  journal   = {VLDB J.},
  volume    = {7},
  number    = {3},
  year      = {1998},
  pages     = {141-162},
  ee        = {db/journals/vldb/KraissW98.html},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

Large multimedia document archives may hold a major fraction of their data in tertiary storage libraries for cost reasons. This paper develops an integrated approach to the vertical data migration between the tertiary, secondary, and primary storage in that it reconciles speculative prefetching, to mask the high latency of the tertiary storage, with the replacement policy of the document caches at the secondary and primary storage level, and also considers the interaction of these policies with the tertiary and secondary storage request scheduling.

The integrated migration policy is based on a continuous-time Markov chain model for predicting the expected number of accesses to a document within a specified time horizon. Prefetching is initiated only if that expectation is higher than those of the documents that need to be dropped from secondary storage to free up the necessary space. In addition, the possible resource contention at the tertiary and secondary storage is taken into account by dynamically assessing the response-time benefit of prefetching a document versus the penalty that it would incur on the response time of the pending document requests.

The parameters of the continuous-time Markov chain model, the probabilities of co-accessing certain documents and the interaction times between successive accesses, are dynamically estimated and adjusted to evolving workload patterns by keeping online statistics. The integrated policy for vertical data migration has been implemented in a prototype system. The system makes profitable use of the Markov chain model also for the scheduling of volume exchanges in the tertiary storage library. Detailed simulation experiments with Web-server-like synthetic workloads indicate significant gains in terms of client response time. The experiments also show that the overhead of the statistical bookkeeping and the computations for the access predictions is affordable.

Key Words

Performance - Caching - Prefetching - Scheduling - Tertiary storage - Stochastic modeling - Markov chains

Copyright © 1998 by Springer, Berlin, Heidelberg. Permission to make digital or hard copies of the abstract is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice along with the full citation.


Online Edition (Springer)

Citation Page

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 5 Issue 2, JACM, VLDB-J, POS, ..." and ... DVD Version: Load ACM SIGMOD Anthology DVD 2" and ... BibTeX

References

[AAFZ95]
Swarup Acharya, Rafael Alonso, Michael J. Franklin, Stanley B. Zdonik: Broadcast Disks: Data Management for Asymmetric Communications Environments. SIGMOD Conference 1995: 199-210 BibTeX
[ABCO96]
Virgilio Almeida, Azer Bestavros, Mark Crovella, Adriana de Oliveira: Characterizing Reference Locality in the WWW. PDIS 1996: 92-103 BibTeX
[AFZ96]
Swarup Acharya, Michael J. Franklin, Stanley B. Zdonik: Prefetching from Broadcast Disks. ICDE 1996: 276-285 BibTeX
[AGL97]
Susanne Albers, Naveen Garg, Stefano Leonardi: Minimizing Stall Time in Single and Parallel Disk Systems. STOC 1998: 454-462 BibTeX
[All90]
...
[Be96]
Azer Bestavros: Speculative Data Dissemination and Service to Reduce Server Load, Network Traffic and Service Time in Distributed Information Systems. ICDE 1996: 180-187 BibTeX
[CFKL95a]
Pei Cao, Edward W. Felten, Anna R. Karlin, Kai Li: A Study of Integrated Prefetching and Caching Strategies. SIGMETRICS 1995: 188-197 BibTeX
[CFKL95b]
Pei Cao, Edward W. Felten, Kai Li: Implementation and Performance of Application-Controlled File Caching. OSDI 1994: 165-177 BibTeX
[CH91]
Jia-bing R. Cheng, Ali R. Hurson: On The Performance of Object-Based Buffering. PDIS 1991: 30-37 BibTeX
[CK89]
Ellis E. Chang, Randy H. Katz: Exploiting Inheritance and Structure Semantics for Effective Clustering and Buffering in an Object-Oriented DBMS. SIGMOD Conference 1989: 348-357 BibTeX
[CKV93]
Kenneth M. Curewitz, P. Krishnan, Jeffrey Scott Vitter: Practical Prefetching via Data Compression. SIGMOD Conference 1993: 257-266 BibTeX
[CR94]
Ling Tony Chen, Doron Rotem: Optimizing Storage of Objects on Mass Storage Systems with Robotic Devies. EDBT 1994: 273-286 BibTeX
[CTZ97]
Stavros Christodoulakis, Peter Triantafillou, Fenia Zioga: Principles of Optimally Placing Data in Tertiary Storage Libraries. VLDB 1997: 236-245 BibTeX
[Co88]
George P. Copeland, William Alexander, Ellen E. Boughter, Tom W. Keller: Data Placement In Bubba. SIGMOD Conference 1988: 99-108 BibTeX
[CSIM]
...
[DevSim]
...
[DWAP94]
Michael Dahlin, Randolph Y. Wang, Thomas E. Anderson, David A. Patterson: Cooperative Caching: Using Remote Client Memory to Improve File System Performance. OSDI 1994: 267-280 BibTeX
[FC91]
Daniel Alexander Ford, Stavros Christodoulakis: Optimal Placement of High-Probability Randomly Retrieved Blocks on CLV Optical Discs. ACM Trans. Inf. Syst. 9(1): 1-30(1991) BibTeX
[GK94]
Carsten Andreas Gerlhof, Alfons Kemper: Prefetch Support Relations in Object Bases. POS 1994: 115-126 BibTeX
[GKKM93]
Carsten Andreas Gerlhof, Alfons Kemper, Christoph Kilger, Guido Moerkotte: Partition-Based Clustering in Object Bases: From Theory to Practice. FODO 1993: 301-316 BibTeX
[GMW94]
Leana Golubchik, Richard R. Muntz, Richard W. Watson: Analysis of Striping Techniques in Robotic Storage Libraries. IEEE Symposium on Mass Storage Systems 1995: 225-238 BibTeX
[GP87]
Jim Gray, Gianfranco R. Putzolu: The 5 Minute Rule for Trading Memory for Disk Accesses and The 10 Byte Rule for Trading Memory for CPU Time. SIGMOD Conference 1987: 395-398 BibTeX
[HS96]
Bruce Hillyer, Abraham Silberschatz: Random I/O Scheduling in Online Tertiary Storage Systems. SIGMOD Conference 1996: 195-204 BibTeX
[JK98]
...
[Jo98]
Theodore Johnson: Coarse Indices for a Tape-Based Data Warehouse. ICDE 1998: 231-240 BibTeX
[KP97]
Geoffrey H. Kuenning, Gerald J. Popek: Automated Hoarding for Mobile Computers. SOSP 1997: 264-275 BibTeX
[KPR92]
Anna R. Karlin, Steven J. Phillips, Prabhakar Raghavan: Markov Paging (Extended Abstract). FOCS 1992: 208-217 BibTeX
[KW97]
Achim Kraiss, Gerhard Weikum: Vertical Data Migration in Large Near-Line Document Archives Based on Markov-Chain Predictions. VLDB 1997: 246-255 BibTeX
[LLW95]
Siu-Wah Lau, John C. S. Lui, P. C. Wong: A Cost-effective Near-line Storage Server for Multimedia System. ICDE 1995: 449-456 BibTeX
[MKK95]
Frank Moser, Achim Kraiss, Wolfgang Klas: L/MRP: A Buffer Management Strategy for Interactive Continuous Data Flows in a Multimedia DBMS. VLDB 1995: 275-286 BibTeX
[ML97]
Jussi Myllymaki, Miron Livny: Relational Joins for Data on Tertiary Storage. ICDE 1997: 159-168 BibTeX
[Nel95]
...
[NKT97]
Toshihiro Nemoto, Masaru Kitsuregawa, Mikio Takagi: Analysis of Cassette Migration Activities in Scalable Tape Archiver. DASFAA 1997: 461-470 BibTeX
[OOW93]
Elizabeth J. O'Neil, Patrick E. O'Neil, Gerhard Weikum: The LRU-K Page Replacement Algorithm For Database Disk Buffering. SIGMOD Conference 1993: 297-306 BibTeX
[PGG+95]
R. Hugo Patterson, Garth A. Gibson, Eka Ginting, Daniel Stodolsky, Jim Zelenka: Informed Prefetching and Caching. SOSP 1995: 79-95 BibTeX
[PZ91]
Mark Palmer, Stanley B. Zdonik: Fido: A Cache That Learns to Fetch. VLDB 1991: 255-264 BibTeX
[RW94]
Chris Ruemmler, John Wilkes: An Introduction to Disk Drive Modeling. IEEE Computer 27(3): 17-28(1994) BibTeX
[Sa95]
Sunita Sarawagi: Query Processing in Tertiary Memory Databases. VLDB 1995: 585-596 BibTeX
[Smi81]
Alan Jay Smith: Long Term File Migration: Development and Evaluation of Algorithms. Commun. ACM 24(8): 521-532(1981) BibTeX
[SSV96]
Peter Scheuermann, Junho Shim, Radek Vingralek: WATCHMAN : A Data Warehouse Intelligent Cache Manager. VLDB 1996: 51-62 BibTeX
[Sto91]
Michael Stonebraker: Managing Persistent Objects in a Multi-Level Store. SIGMOD Conference 1991: 2-11 BibTeX
[SW97]
Markus Sinnwell, Gerhard Weikum: A Cost-Model-Based Online Method for Ditributed Caching. ICDE 1997: 532-541 BibTeX
[SWZ94]
Peter Scheuermann, Gerhard Weikum, Peter Zabback: ``Disk Cooling'' in Parallel Disk Systems. IEEE Data Eng. Bull. 17(3): 29-40(1994) BibTeX
[SWZ98]
Peter Scheuermann, Gerhard Weikum, Peter Zabback: Data Partitioning and Load Balancing in Parallel Disk Systems. VLDB J. 7(1): 48-66(1998) BibTeX
[TCG96]
...
[TG84]
James Z. Teng, Robert A. Gumaer: Managing IBM Database 2 Buffers to Maximize Performance. IBM Systems Journal 23(2): 211-218(1984) BibTeX
[Tij94]
...
[TN91]
Manolis M. Tsangaris, Jeffrey F. Naughton: A Stochastic Approach for Clustering in Object Bases. SIGMOD Conference 1991: 12-21 BibTeX
[TN92]
Manolis M. Tsangaris, Jeffrey F. Naughton: On the Performance of Object Clustering Techniques. SIGMOD Conference 1992: 144-153 BibTeX
[TP97]
Peter Triantafillou, Thomas Papadakis: On-Demand Data Elevation in Hierarchical Multimedia Storage Servers. VLDB 1997: 226-235 BibTeX
[VLN97]
Shivakumar Venkataraman, Miron Livny, Jeffrey F. Naughton: Memory Management for Scalable Web Data Servers. ICDE 1997: 510-519 BibTeX
[WHMZ94]
Gerhard Weikum, Christof Hasse, Alex Moenkeberg, Peter Zabback: The COMFORT Automatic Tuning Project, Invited Project Review. Inf. Syst. 19(5): 381-432(1994) BibTeX
[Wo83]
C. K. Wong: Algorithmic Studies in Mass Storage Systems. Computer Science Press 1983
BibTeX
[WZ86]
Hartmut Wedekind, Georg Zörntlein: Prefetching in Realtime Database Applications. SIGMOD Conference 1986: 215-226 BibTeX

Referenced by

  1. Olga Kapitskaia, Raymond T. Ng, Divesh Srivastava: Evolution and Revolutions in LDAP Directory Caches. EDBT 2000: 202-216
  2. Gerhard Weikum, Arnd Christian König, Achim Kraiss, Markus Sinnwell: Towards Self-Tuning Memory Management for Data Servers. IEEE Data Eng. Bull. 22(2): 3-11(1999)
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Journal: 1992-1995 Copyright © by VLDB Endowment / 1996-... Copyright © by Springer Verlag,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sun May 17 00:31:33 2009