ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

A Cost Model for Similarity Queries in Metric Spaces.

Paolo Ciaccia, Marco Patella, Pavel Zezula: A Cost Model for Similarity Queries in Metric Spaces. PODS 1998: 59-68
@inproceedings{DBLP:conf/pods/CiacciaPZ98,
  author    = {Paolo Ciaccia and
               Marco Patella and
               Pavel Zezula},
  title     = {A Cost Model for Similarity Queries in Metric Spaces},
  booktitle = {Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium
               on Principles of Database Systems, June 1-3, 1998, Seattle, Washington},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-996-3},
  pages     = {59-68},
  ee        = {http://doi.acm.org/10.1145/275487.275495, db/conf/pods/CiacciaPZ98.html},
  crossref  = {DBLP:conf/pods/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

We consider the problem of estimating CPU (distance computations) and I/O costs for processing range and k-nearest neighbors queries over metric spaces. Unlike the specific case of vector spaces, where information on data distribution has been exploited to derive cost models for predicting the performance of multi-dimensional access methods, in a generic metric space there is no such a possibility, which makes the problem quite different and requires a novel approach. We insist that the distance distribution of objects can be profitably used to solve the problem, and consequently develop a concrete cost model for the M-tree access method. Our results rely on the assumption that the indexed dataset comes from a metric space which is "homogeneous" enough (in a probabilistic sense) to allow reliable cost estimations even if the distance distribution with respect to a specific query object is unknown. We experimentally validate the model over both real and synthetic datasets, and show how the model can be used to tune the M-tree in order to minimize a combination of CPU and I/O costs. Finally, we sketch how the same approach can be applied to derive a cost model for the vp-tree index structure.

Copyright © 1998 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Load The ACM SIGMOD Anthology, CDROM Edition, Volume 1-3, PODS '82-'98. and ... Load The ACM SIGMOD Anthology, Silver Edition, DVD 1, Proceedings. and ... BibTeX

Printed Edition

Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 1-3, 1998, Seattle, Washington. ACM Press 1998, ISBN 0-89791-996-3
Contents BibTeX

Online Edition: ACM Digital Library

[Index Terms]
[Full Text in PDF Format, 1094 KB]

References

[1]
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331 BibTeX
[2]
Alberto Belussi, Christos Faloutsos: Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension. VLDB 1995: 299-310 BibTeX
[3]
Stefan Berchtold, Christian Böhm, Daniel A. Keim, Hans-Peter Kriegel: A Cost Model For Nearest Neighbor Search in High-Dimensional Data Space. PODS 1997: 78-86 BibTeX
[4]
Tolga Bozkaya, Z. Meral Özsoyoglu: Distance-Based Indexing for High-Dimensional Metric Spaces. SIGMOD Conference 1997: 357-368 BibTeX
[5]
...
[6]
Sergey Brin: Near Neighbor Search in Large Metric Spaces. VLDB 1995: 574-584 BibTeX
[7]
...
[8]
Tzi-cker Chiueh: Content-Based Image Indexing. VLDB 1994: 582-593 BibTeX
[9]
...
[10]
Paolo Ciaccia, Marco Patella, Pavel Zezula: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997: 426-435 BibTeX
[11]
Paolo Ciaccia, Marco Patella, Pavel Zezula: Processing Complex Similarity Queries with Distance-Based Access Methods. EDBT 1998: 9-23 BibTeX
[12]
Christos Faloutsos, Ibrahim Kamel: Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension. PODS 1994: 4-13 BibTeX
[13]
Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57 BibTeX
[14]
Joseph M. Hellerstein, Jeffrey F. Naughton, Avi Pfeffer: Generalized Search Trees for Database Systems. VLDB 1995: 562-573 BibTeX
[15]
...
[16]
Ibrahim Kamel, Christos Faloutsos: On Packing R-trees. CIKM 1993: 490-499 BibTeX
[17]
...
[18]
...
[19]
Apostolos Papadopoulos, Yannis Manolopoulos: Performance of Nearest Neighbor Queries in R-Trees. ICDT 1997: 394-408 BibTeX
[20]
Yannis Theodoridis, Timos K. Sellis: A Model for the Prediction of R-tree Performance. PODS 1996: 161-171 BibTeX

Referenced by

  1. Stefan Berchtold, Christian Böhm, Daniel A. Keim, Florian Krebs, Hans-Peter Kriegel: On Optimizing Nearest Neighbor Queries in High-Dimensional Data Spaces. ICDT 2001: 435-449
  2. Roger Weber, Klemens Böhm: Trading Quality for Time with Nearest Neighbor Search. EDBT 2000: 21-35
  3. Tolga Bozkaya, Z. Meral Özsoyoglu: Indexing Large Metric Spaces for Similarity Search Queries. ACM Trans. Database Syst. 24(3): 361-404(1999)
  4. Aristides Gionis, Piotr Indyk, Rajeev Motwani: Similarity Search in High Dimensions via Hashing. VLDB 1999: 518-529
  5. Pavel Zezula, Pasquale Savino, Giuseppe Amato, Fausto Rabitti: Approximate Similarity Retrieval with M-Trees. VLDB J. 7(4): 275-293(1998)
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:34:19 2009