Clustering Techniques for Minimizing External Path Length.

Ajit A. Diwan, Sanjeeva Rane, S. Seshadri, S. Sudarshan: Clustering Techniques for Minimizing External Path Length. VLDB 1996: 342-353
@inproceedings{DBLP:conf/vldb/DiwanRSS96,
  author    = {Ajit A. Diwan and
               Sanjeeva Rane and
               S. Seshadri and
               S. Sudarshan},
  editor    = {T. M. Vijayaraman and
               Alejandro P. Buchmann and
               C. Mohan and
               Nandlal L. Sarda},
  title     = {Clustering Techniques for Minimizing External Path Length},
  booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
               Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
  publisher = {Morgan Kaufmann},
  year      = {1996},
  isbn      = {1-55860-382-4},
  pages     = {342-353},
  ee        = {db/conf/vldb/DiwanRSS96.html},
  crossref  = {DBLP:conf/vldb/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

There are a variety of main-memory access structures, such as segment trees and quad trees, whose properties, such as good worst-case behaviour, make them attractive for database applications. Unfortunately, the structures are typically 'long and skinny', whereas disk data structures must be 'short and fat' (that is, have a high fanout and low height) in order to minimize I/O.

We consider how to cluster the nodes (that is, map the nodes to disk pages) of main-memory access structures such that although a path may traverse many nodes, it traverses only a few disk pages. The number of disk pages traversed in a path is called the external path length. We address several versions of the clustering problem. We present a clustering algorithm for tree structures that generates optimal worst-case external path length mappings; we also show how to make it dynamic, to support updates. We extend the algorithm to generate mappings that minimize the average weighted external path lengths. We also show that some other clustering problems, such as finding optimal external path lengths for DAG structures and minimizing weights for optimal height mappings, are NP-complete. We present efficient heuristics for these problems.

We present a performance study (using quad-trees on actual image data as an example) which shows that our algorithms perform well. Our algorithms can also be used for clustering complex objects in object-oriented databases.
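
As an illustration of the external path length defined in the abstract, the following minimal sketch computes the worst-case number of disk pages touched on any root-to-leaf path for a given node-to-page mapping. It is not the paper's clustering algorithm; it only evaluates a mapping. The function name, the dictionary-based tree encoding, the 15-node binary tree, the page capacity of 3, and both example mappings are assumptions made for this example, and "pages traversed" is taken here to mean distinct pages on the path.

```python
from typing import Dict, Hashable, List

# Hypothetical encoding (not from the paper): a tree is a dict mapping each
# internal node to its list of children; leaves simply have no entry.
Tree = Dict[Hashable, List[Hashable]]


def worst_case_external_path_length(
    children: Tree, root: Hashable, page_of: Dict[Hashable, int]
) -> int:
    """Return the maximum number of distinct pages on any root-to-leaf path.

    Assumption: the external path length of a path is the number of
    distinct disk pages holding the nodes on that path.
    """
    worst = 0
    # Iterative depth-first search carrying the set of pages seen so far.
    stack = [(root, frozenset([page_of[root]]))]
    while stack:
        node, pages = stack.pop()
        kids = children.get(node, [])
        if not kids:  # reached a leaf: the path is complete
            worst = max(worst, len(pages))
        for kid in kids:
            stack.append((kid, pages | {page_of[kid]}))
    return worst


# Complete binary tree on nodes 1..15 (children of i are 2i and 2i + 1).
children: Tree = {i: [2 * i, 2 * i + 1] for i in range(1, 8)}

# Mapping A: fill pages of capacity 3 in level order (1,2,3 | 4,5,6 | ...).
level_order_pages = {n: (n - 1) // 3 for n in range(1, 16)}

# Mapping B: group each small subtree (a node and its two children) together.
subtree_pages = {1: 0, 2: 0, 3: 0}
for r in range(4, 8):
    for n in (r, 2 * r, 2 * r + 1):
        subtree_pages[n] = r - 3

print(worst_case_external_path_length(children, 1, level_order_pages))
print(worst_case_external_path_length(children, 1, subtree_pages))
```

Both mappings use five pages of at most three nodes each, yet grouping nodes by subtree keeps every root-to-leaf path within fewer pages than the level-order layout, which is the kind of difference a clustering algorithm aims to exploit.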

Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Printed Edition

T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India. Morgan Kaufmann 1996, ISBN 1-55860-382-4

Referenced by

  1. Kothuri Venkata Ravi Kanth, Ambuj K. Singh: Optimal Dynamic Range Searching in Non-replicating Index Structures. ICDT 1999: 257-276