ACM SIGMOD Anthology VLDB dblp.uni-trier.de

A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces.

Roger Weber, Hans-Jörg Schek, Stephen Blott: A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. VLDB 1998: 194-205
@inproceedings{DBLP:conf/vldb/WeberSB98,
  author    = {Roger Weber and
               Hans-J{\"o}rg Schek and
               Stephen Blott},
  editor    = {Ashish Gupta and
               Oded Shmueli and
               Jennifer Widom},
  title     = {A Quantitative Analysis and Performance Study for Similarity-Search
               Methods in High-Dimensional Spaces},
  booktitle = {VLDB'98, Proceedings of 24rd International Conference on Very
               Large Data Bases, August 24-27, 1998, New York City, New York,
               USA},
  publisher = {Morgan Kaufmann},
  year      = {1998},
  isbn      = {1-55860-566-5},
  pages     = {194-205},
  ee        = {db/conf/vldb/WeberSB98.html},
  crossref  = {DBLP:conf/vldb/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

For similarity search in high-dimensional vector spaces (or 'HDVSs'), researchers have proposed a number of new methods (or adaptations of existing methods) based, in the main, on data-space partitioning. However, the performance of these methods generally degrades as dimensionality increases. Although this phenomenon - known as the 'dimensional curse'- is well known, little or no quantitative analysis of the phenomenon is available. In this paper, we provide a detailed analysis of partitioning and clustering techniques for similarity search in HDVSs. We show formally that these methods exhibit linear complexity at high dimensionality, and that existing methods are outperformed on average by a simple sequential scan if the number of dimensions exceeds around 10. Consequently, we come up with an alternative organization based on approximations to make the unavoidable sequential scan as fast as possible. We describe a simple vector approximation scheme, called VA-file, and report on an experimental evaluation of this and of two tree-based index methods (an R*-tree and an X-tree).

Copyright © 1998 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD DiSC

CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ...

ACM SIGMOD Anthology

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Ashish Gupta, Oded Shmueli, Jennifer Widom (Eds.): VLDB'98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA. Morgan Kaufmann 1998, ISBN 1-55860-566-5
Contents BibTeX

References

[1]
Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997) BibTeX
[2]
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331 BibTeX
[3]
Jon Louis Bentley, Jerome H. Friedman: Data Structures for Range Searching. ACM Comput. Surv. 11(4): 397-409(1979) BibTeX
[4]
Stefan Berchtold, Christian Böhm, Bernhard Braunmüller, Daniel A. Keim, Hans-Peter Kriegel: Fast Parallel Similarity Search in Multimedia Databases. SIGMOD Conference 1997: 1-12 BibTeX
[5]
Stefan Berchtold, Christian Böhm, Daniel A. Keim, Hans-Peter Kriegel: A Cost Model For Nearest Neighbor Search in High-Dimensional Data Space. PODS 1997: 78-86 BibTeX
[6]
Stefan Berchtold, Christian Böhm, Hans-Peter Kriegel: Improving the Query Performance of High-Dimensional Index Structures by Bulk-Load Operations. EDBT 1998: 216-230 BibTeX
[7]
Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel: The X-tree : An Index Structure for High-Dimensional Data. VLDB 1996: 28-39 BibTeX
[8]
Thomas Brinkhoff, Hans-Peter Kriegel, Ralf Schneider: Comparison of Approximations of Complex Objects Used for Approximation-based Query Processing in Spatial Database Systems. ICDE 1993: 40-49 BibTeX
[9]
Thomas Brinkhoff, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: Multi-Step Processing of Spatial Joins. SIGMOD Conference 1994: 197-208 BibTeX
[10]
Paolo Ciaccia, Marco Patella, Pavel Zezula: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997: 426-435 BibTeX
[11]
...
[12]
...
[13]
...
[14]
Christos Faloutsos: Access Methods for Text. ACM Comput. Surv. 17(1): 49-74(1985) BibTeX
[15]
...
[16]
Christos Faloutsos, Stavros Christodoulakis: Description and Performance Analysis of Signature File Methods for Office Filing. ACM Trans. Inf. Syst. 5(3): 237-257(1987) BibTeX
[17]
Christos Faloutsos, Ibrahim Kamel: Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension. PODS 1994: 4-13 BibTeX
[18]
Raphael A. Finkel, Jon Louis Bentley: Quad Trees: A Data Structure for Retrieval on Composite Keys. Acta Inf. 4: 1-9(1974) BibTeX
[19]
Myron Flickner, Harpreet S. Sawhney, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, David Steele, Peter Yanker: Query by Image and Video Content: The QBIC System. IEEE Computer 28(9): 23-32(1995) BibTeX
[20]
Jerome H. Friedman, Jon Louis Bentley, Raphael A. Finkel: An Algorithm for Finding Best Matches in Logarithmic Expected Time. ACM Trans. Math. Softw. 3(3): 209-226(1977) BibTeX
[21]
Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57 BibTeX
[22]
Joseph M. Hellerstein, Elias Koutsoupias, Christos H. Papadimitriou: On the Analysis of Indexing Schemes. PODS 1997: 249-256 BibTeX
[23]
Gísli R. Hjaltason, Hanan Samet: Ranking in Spatial Databases. SSD 1995: 83-95 BibTeX
[24]
Norio Katayama, Shin'ichi Satoh: The SR-tree: An Index Structure for High-Dimensional Nearest Neighbor Queries. SIGMOD Conference 1997: 369-380 BibTeX
[25]
King-Ip Lin, H. V. Jagadish, Christos Faloutsos: The TV-Tree: An Index Structure for High-Dimensional Data. VLDB J. 3(4): 517-542(1994) BibTeX
[26]
David B. Lomet, Betty Salzberg: The hB-Tree: A Multiattribute Indexing Method with Good Guaranteed Performance. ACM Trans. Database Syst. 15(4): 625-658(1990) BibTeX
[27]
Jürg Nievergelt, Hans Hinterberger, Kenneth C. Sevcik: The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Trans. Database Syst. 9(1): 38-71(1984) BibTeX
[28]
John T. Robinson: The K-D-B-Tree: A Search Structure For Large Multidimensional Dynamic Indexes. SIGMOD Conference 1981: 10-18 BibTeX
[29]
Hanan Samet: The Design and Analysis of Spatial Data Structures. Addison-Wesley 1990
BibTeX
[30]
Timos K. Sellis, Nick Roussopoulos, Christos Faloutsos: The R+-Tree: A Dynamic Index for Multi-Dimensional Objects. VLDB 1987: 507-518 BibTeX
[31]
Robert F. Sproull: Refinements to Nearest-Neighbor Searching in k-Dimensional Trees. Algorithmica 6(4): 579-589(1991) BibTeX
[32]
Markus A. Stricker, Markus Orengo: Similarity of Color Images. Storage and Retrieval for Image and Video Databases (SPIE) 1995: 381-392 BibTeX
[33]
Jeffrey D. Ullman: Principles of Database and Knowledge-Base Systems, Volume I. Computer Science Press 1988, ISBN 0-7167-8158-1
Contents BibTeX
[34]
...

Referenced by

  1. Stefan Berchtold, Christian Böhm, Daniel A. Keim, Florian Krebs, Hans-Peter Kriegel: On Optimizing Nearest Neighbor Queries in High-Dimensional Data Spaces. ICDT 2001: 435-449
  2. Charu C. Aggarwal, Alexander Hinneburg, Daniel A. Keim: On the Surprising Behavior of Distance Metrics in High Dimensional Spaces. ICDT 2001: 420-434
  3. Kaushik Chakrabarti, Sharad Mehrotra: Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces. VLDB 2000: 89-100
  4. Dimitrios Gunopulos, George Kollios, Vassilis J. Tsotras, Carlotta Domeniconi: Approximating Multi-Dimensional Aggregate Range Queries over Real Attributes. SIGMOD Conference 2000: 463-474
  5. Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander: LOF: Identifying Density-Based Local Outliers. SIGMOD Conference 2000: 93-104
  6. Beng Chin Ooi, Kian-Lee Tan, Cui Yu, Stéphane Bressan: Indexing the Edges - A Simple and Yet Efficient Approach to High-Dimensional Indexing. PODS 2000: 166-174
  7. Roger Weber, Klemens Böhm: Trading Quality for Time with Nearest Neighbor Search. EDBT 2000: 21-35
  8. Christian Böhm, Hans-Peter Kriegel: Dynamically Optimizing High-Dimensional Index Structures. EDBT 2000: 36-50
  9. Aristides Gionis, Piotr Indyk, Rajeev Motwani: Similarity Search in High Dimensions via Hashing. VLDB 1999: 518-529
  10. Kaushik Chakrabarti, Sharad Mehrotra: The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces. ICDE 1999: 440-447
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:46:21 2009