ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

A Cost Model For Nearest Neighbor Search in High-Dimensional Data Space.

Stefan Berchtold, Christian Böhm, Daniel A. Keim, Hans-Peter Kriegel: A Cost Model For Nearest Neighbor Search in High-Dimensional Data Space. PODS 1997: 78-86
@inproceedings{DBLP:conf/pods/BerchtoldBKK97,
  author    = {Stefan Berchtold and
               Christian B{\"o}hm and
               Daniel A. Keim and
               Hans-Peter Kriegel},
  title     = {A Cost Model For Nearest Neighbor Search in High-Dimensional
               Data Space},
  booktitle = {Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium
               on Principles of Database Systems, May 12-14, 1997, Tucson, Arizona},
  publisher = {ACM Press},
  year      = {1997},
  isbn      = {0-89791-910-6},
  pages     = {78-86},
  ee        = {http://doi.acm.org/10.1145/263661.263671, db/conf/pods/BerchtoldBKK97.html},
  crossref  = {DBLP:conf/pods/97},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

In this paper, we present a new cost model for nearest neighbor search in high-dimensional data space. We first analyze different nearest neighbor algorithms, present a generalization of an algorithm which has been originally proposed for Quadtrees [13], and show that this algorithm is optimal. Then, we develop a cost model which - in contrast to previous models - takes boundary effects into account and therefore also works in high dimensions. The advantages of our model are in particular: Our model works for data sets with an arbitrary number of dimensions and an arbitrary number of data points, is applicable to different data distributions and index structures, and provides accurate estimates of the expected query execution time. To show the practical relevance and accuracy of our model, we perform a detailed analysis using synthetic and real data. The results of applying our model to Hilbert and X-tree indices show that it provides a good estimation of the query performance, which is considerably better than the estimates by previous models especially for high-dimensional data.

Copyright © 1997 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Load The ACM SIGMOD Anthology, CDROM Edition, Volume 1-3, PODS '82-'98. and ... Load The ACM SIGMOD Anthology, Silver Edition, DVD 1, Proceedings. and ... BibTeX

Printed Edition

Proceedings of the Sixteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, May 12-14, 1997, Tucson, Arizona. ACM Press 1997, ISBN 0-89791-910-6
Contents BibTeX

Online Edition: ACM Digital Library

[Index Terms]
[Full Text in PDF Format, 1177 KB]

References

[1]
...
[2]
...
[3]
...
[4]
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331 BibTeX
[5]
Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel: The X-tree : An Index Structure for High-Dimensional Data. VLDB 1996: 28-39 BibTeX
[6]
...
[7]
...
[8]
Caroline M. Eastman: Optimal Bucket Size for Nearest Neighbor Searching in k-d Trees. Inf. Process. Lett. 12(4): 165-167(1981) BibTeX
[9]
Christos Faloutsos, Ron Barber, Myron Flickner, Jim Hafner, Wayne Niblack, Dragutin Petkovic, William Equitz: Efficient and Effective Querying by Image Content. J. Intell. Inf. Syst. 3(3/4): 231-262(1994) BibTeX
[10]
Christos Faloutsos, Volker Gaede: Analysis of n-Dimensional Quadtrees using the Hausdorff Fractal Dimension. VLDB 1996: 40-50 BibTeX
[11]
Christos Faloutsos, Shari Roseman: Fractals for Secondary Key Retrieval. PODS 1989: 247-252 BibTeX
[12]
...
[13]
Gísli R. Hjaltason, Hanan Samet: Ranking in Spatial Databases. SSD 1995: 83-95 BibTeX
[14]
...
[15]
Karen Kukich: Techniques for Automatically Correcting Words in Text. ACM Comput. Surv. 24(4): 377-439(1992) BibTeX
[16]
H. V. Jagadish: A Retrieval Technique for Similar Shapes. SIGMOD Conference 1991: 208-217 BibTeX
[17]
Rajiv Mehrotra, James E. Gary: Feature-Based Retrieval of Similar Shapes. ICDE 1993: 108-115 BibTeX
[18]
Rajiv Mehrotra, James E. Gary: Feature-Index-Based Similar Shape Retrieval. VDB 1995: 46-65 BibTeX
[19]
Apostolos Papadopoulos, Yannis Manolopoulos: Performance of Nearest Neighbor Queries in R-Trees. ICDT 1997: 394-408 BibTeX
[20]
...
[21]
...
[22]
Nick Roussopoulos, Stephen Kelley, Frédéic Vincent: Nearest Neighbor Queries. SIGMOD Conference 1995: 71-79 BibTeX
[23]
...
[24]
...
[25]
Robert F. Sproull: Refinements to Nearest-Neighbor Searching in k-Dimensional Trees. Algorithmica 6(4): 579-589(1991) BibTeX
[26]
...
[27]
...
[28]
Andrew Chi-Chih Yao, F. Frances Yao: A General Approach to d-Dimensional Geometric Queries (Extended Abstract). STOC 1985: 163-168 BibTeX

Referenced by

  1. Stefan Berchtold, Christian Böhm, Daniel A. Keim, Florian Krebs, Hans-Peter Kriegel: On Optimizing Nearest Neighbor Queries in High-Dimensional Data Spaces. ICDT 2001: 435-449
  2. Charu C. Aggarwal, Alexander Hinneburg, Daniel A. Keim: On the Surprising Behavior of Distance Metrics in High Dimensional Spaces. ICDT 2001: 420-434
  3. Caetano Traina Jr., Agma J. M. Traina, Bernhard Seeger, Christos Faloutsos: Slim-Trees: High Performance Metric Trees Minimizing Overlap Between Nodes. EDBT 2000: 51-65
  4. Christian Böhm, Hans-Peter Kriegel: Dynamically Optimizing High-Dimensional Index Structures. EDBT 2000: 36-50
  5. Gísli R. Hjaltason, Hanan Samet: Distance Browsing in Spatial Databases. ACM Trans. Database Syst. 24(2): 265-318(1999)
  6. Tolga Bozkaya, Z. Meral Özsoyoglu: Indexing Large Metric Spaces for Similarity Search Queries. ACM Trans. Database Syst. 24(3): 361-404(1999)
  7. Kaushik Chakrabarti, Sharad Mehrotra: Efficient Concurrency Control in Multidimensional Access Methods. SIGMOD Conference 1999: 25-36
  8. Charu C. Aggarwal, Cecilia Magdalena Procopiuc, Joel L. Wolf, Philip S. Yu, Jong Soo Park: Fast Algorithms for Projected Clustering. SIGMOD Conference 1999: 61-72
  9. Kevin S. Beyer, Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft: When Is ''Nearest Neighbor'' Meaningful? ICDT 1999: 217-235
  10. Hakan Ferhatosmanoglu, Divyakant Agrawal, Amr El Abbadi: Concentric Hyperspaces and Disk Allocation for Fast Parallel Range Searching. ICDE 1999: 608-615
  11. Kaushik Chakrabarti, Sharad Mehrotra: The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces. ICDE 1999: 440-447
  12. Philip S. Yu: Data Mining and Personalization Technologies. DASFAA 1999: 6-13
  13. Gunter Saake, Andreas Heuer: Datenbanken: Implementierungstechniken. MITP-Verlag 1999, ISBN 3-8266-0513-6
    Contents
  14. Mihael Ankerst, Hans-Peter Kriegel, Thomas Seidl: A Multistep Approach for Shape Similarity Search in Image Databases. IEEE Trans. Knowl. Data Eng. 10(6): 996-1004(1998)
  15. King Lum Cheung, Ada Wai-Chee Fu: Enhanced Nearest Neighbour Search on the R-tree. SIGMOD Record 27(3): 16-21(1998)
  16. Roger Weber, Hans-Jörg Schek, Stephen Blott: A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. VLDB 1998: 194-205
  17. Erik Riedel, Garth A. Gibson, Christos Faloutsos: Active Storage for Large-Scale Data Mining and Multimedia. VLDB 1998: 62-73
  18. Mihael Ankerst, Bernhard Braunmüller, Hans-Peter Kriegel, Thomas Seidl: Improving Adaptable Similarity Query Processing by Using Approximations. VLDB 1998: 206-217
  19. Thomas Seidl, Hans-Peter Kriegel: Optimal Multi-Step k-Nearest Neighbor Search. SIGMOD Conference 1998: 154-165
  20. Apostolos Papadopoulos, Yannis Manolopoulos: Similarity Query Processing Using Disk Arrays. SIGMOD Conference 1998: 225-236
  21. Stefan Berchtold, Christian Böhm, Hans-Peter Kriegel: The Pyramid-Technique: Towards Breaking the Curse of Dimensionality. SIGMOD Conference 1998: 142-153
  22. Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. SIGMOD Conference 1998: 94-105
  23. Paolo Ciaccia, Marco Patella, Pavel Zezula: A Cost Model for Similarity Queries in Metric Spaces. PODS 1998: 59-68
  24. Pankaj K. Agarwal, Lars Arge, Jeff Erickson, Paolo Giulio Franciosa, Jeffrey Scott Vitter: Efficient Searching with Linear Constraints. PODS 1998: 169-178
  25. Andreas Henrich: The LSDh-Tree: An Access Structure for Feature Vectors. ICDE 1998: 362-369
  26. Stefan Berchtold, Bernhard Ertl, Daniel A. Keim, Hans-Peter Kriegel, Thomas Seidl: Fast Nearest Neighbor Search in High-Dimensional Space. ICDE 1998: 209-218
  27. Paul M. Aoki: Generalizing ``Search'' in Generalized Search Trees (Extended Abstract). ICDE 1998: 380-389
  28. Stefan Berchtold, Christian Böhm, Hans-Peter Kriegel: Improving the Query Performance of High-Dimensional Index Structures by Bulk-Load Operations. EDBT 1998: 216-230
  29. Thomas Seidl, Hans-Peter Kriegel: Efficient User-Adaptable Similarity Search in Large Multimedia Databases. VLDB 1997: 506-515
  30. Stefan Berchtold, Christian Böhm, Bernhard Braunmüller, Daniel A. Keim, Hans-Peter Kriegel: Fast Parallel Similarity Search in Multimedia Databases. SIGMOD Conference 1997: 1-12
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:34:16 2009