ACM SIGMOD Anthology VLDB dblp.uni-trier.de

M-tree: An Efficient Access Method for Similarity Search in Metric Spaces.

Paolo Ciaccia, Marco Patella, Pavel Zezula: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997: 426-435
@inproceedings{DBLP:conf/vldb/CiacciaPZ97,
  author    = {Paolo Ciaccia and
               Marco Patella and
               Pavel Zezula},
  editor    = {Matthias Jarke and
               Michael J. Carey and
               Klaus R. Dittrich and
               Frederick H. Lochovsky and
               Pericles Loucopoulos and
               Manfred A. Jeusfeld},
  title     = {M-tree: An Efficient Access Method for Similarity Search in Metric
               Spaces},
  booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
               Large Data Bases, August 25-29, 1997, Athens, Greece},
  publisher = {Morgan Kaufmann},
  year      = {1997},
  isbn      = {1-55860-470-7},
  pages     = {426-435},
  ee        = {db/conf/vldb/CiacciaPZ97.html},
  crossref  = {DBLP:conf/vldb/97},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

A new access method, called M-tree, is proposed to organize and search large data sets from a generic ``metric space", i.e. where object proximity is only defined by a distance function satisfying the positivity, symmetry, and triangle inequality postulates. We detail algorithms for insertion of objects and split management, which keep the M-tree always balanced - several heuristic split alternatives are considered and experimentally evaluated. Algorithms for similarity (range and k-nearest neighbors) queries are also described. Results from extensive experimentation with a prototype system are reported, considering as the performance criteria the number of page I/O's and the number of distance computations. The results demonstrate that the M-tree indeed extends the domain of applicability beyond the traditional vector spaces, performs reasonably well in high-dimensional data spaces, and scales well in case of growing files.

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.): VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece. Morgan Kaufmann 1997, ISBN 1-55860-470-7
Contents BibTeX

Electronic Edition

From CS Dept., University Trier (Germany)

Software

The M-tree Software

References

[AFS93]
Rakesh Agrawal, Christos Faloutsos, Arun N. Swami: Efficient Similarity Search In Sequence Databases. FODO 1993: 69-84 BibTeX
[BKK96]
Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel: The X-tree : An Index Structure for High-Dimensional Data. VLDB 1996: 28-39 BibTeX
[BKSS90]
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331 BibTeX
[BO97]
Tolga Bozkaya, Z. Meral Özsoyoglu: Distance-Based Indexing for High-Dimensional Metric Spaces. SIGMOD Conference 1997: 357-368 BibTeX
[Bri95]
Sergey Brin: Near Neighbor Search in Large Metric Spaces. VLDB 1995: 574-584 BibTeX
[Chi94]
Tzi-cker Chiueh: Content-Based Image Indexing. VLDB 1994: 582-593 BibTeX
[FEF+94]
Christos Faloutsos, Ron Barber, Myron Flickner, Jim Hafner, Wayne Niblack, Dragutin Petkovic, William Equitz: Efficient and Effective Querying by Image Content. J. Intell. Inf. Syst. 3(3/4): 231-262(1994) BibTeX
[FL95]
Christos Faloutsos, King-Ip Lin: FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets. SIGMOD Conference 1995: 163-174 BibTeX
[FRM94]
Christos Faloutsos, M. Ranganathan, Yannis Manolopoulos: Fast Subsequence Matching in Time-Series Databases. SIGMOD Conference 1994: 419-429 BibTeX
[Gut84]
Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57 BibTeX
[HNP95]
Joseph M. Hellerstein, Jeffrey F. Naughton, Avi Pfeffer: Generalized Search Trees for Database Systems. VLDB 1995: 562-573 BibTeX
[JD88]
Anil K. Jain, Richard C. Dubes: Algorithms for Clustering Data. Prentice-Hall 1988
BibTeX
[RKV95]
Nick Roussopoulos, Stephen Kelley, Frédéic Vincent: Nearest Neighbor Queries. SIGMOD Conference 1995: 71-79 BibTeX
[SRF87]
Timos K. Sellis, Nick Roussopoulos, Christos Faloutsos: The R+-Tree: A Dynamic Index for Multi-Dimensional Objects. VLDB 1987: 507-518 BibTeX
[Uhl91]
Jeffrey K. Uhlmann: Satisfying General Proximity/Similarity Queries with Metric Trees. Inf. Process. Lett. 40(4): 175-179(1991) BibTeX
[VM95]
Michael Vassilakopoulos, Yannis Manolopoulos: Dynamic Inverted Quadtree: A Structure for Pictorial Databases. Inf. Syst. 20(6): 483-500(1995) BibTeX
[WBKW96]
Erling Wold, Thom Blum, Douglas Keislar, James Wheaton: Content-Based Classification, Search, and Retrieval of Audio. IEEE MultiMedia 3: 27-36(1996) BibTeX
[ZCR96]
...

Referenced by

  1. Roger Weber, Klemens Böhm: Trading Quality for Time with Nearest Neighbor Search. EDBT 2000: 21-35
  2. Caetano Traina Jr., Agma J. M. Traina, Bernhard Seeger, Christos Faloutsos: Slim-Trees: High Performance Metric Trees Minimizing Overlap Between Nodes. EDBT 2000: 51-65
  3. Jochen Van den Bercken, Martin Schneider, Bernhard Seeger: Plug&Join: An easy-to-use Generic Algorithm for Efficiently Processing Equi and Non-Equi Joins. EDBT 2000: 495-509
  4. Gísli R. Hjaltason, Hanan Samet: Distance Browsing in Spatial Databases. ACM Trans. Database Syst. 24(2): 265-318(1999)
  5. Tolga Bozkaya, Z. Meral Özsoyoglu: Indexing Large Metric Spaces for Similarity Search Queries. ACM Trans. Database Syst. 24(3): 361-404(1999)
  6. Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander: OPTICS: Ordering Points To Identify the Clustering Structure. SIGMOD Conference 1999: 49-60
  7. Venkatesh Ganti, Raghu Ramakrishnan, Johannes Gehrke, Allison L. Powell, James C. French: Clustering Large Datasets in Arbitrary Metric Spaces. ICDE 1999: 502-511
  8. Kaushik Chakrabarti, Sharad Mehrotra: The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces. ICDE 1999: 440-447
  9. Dimitris G. Kapopoulos, Michael Hatzopoulos: The Gr_Tree: The Use of Active Regions in G-Trees. ADBIS 1999: 141-155
  10. Pavel Zezula, Pasquale Savino, Giuseppe Amato, Fausto Rabitti: Approximate Similarity Retrieval with M-Trees. VLDB J. 7(4): 275-293(1998)
  11. Michael Ortega, Yong Rui, Kaushik Chakrabarti, Kriengkrai Porkaew, Sharad Mehrotra, Thomas S. Huang: Supporting Ranked Boolean Similarity Queries in MARS. IEEE Trans. Knowl. Data Eng. 10(6): 905-925(1998)
  12. Flip Korn, Nikolaos Sidiropoulos, Christos Faloutsos, Eliot Siegel, Zenon Protopapas: Fast and Effective Retrieval of Medical Tumor Shapes. IEEE Trans. Knowl. Data Eng. 10(6): 889-904(1998)
  13. Mihael Ankerst, Hans-Peter Kriegel, Thomas Seidl: A Multistep Approach for Shape Similarity Search in Image Databases. IEEE Trans. Knowl. Data Eng. 10(6): 996-1004(1998)
  14. Roger Weber, Hans-Jörg Schek, Stephen Blott: A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. VLDB 1998: 194-205
  15. Martin Ester, Hans-Peter Kriegel, Jörg Sander, Michael Wimmer, Xiaowei Xu: Incremental Clustering for Mining in a Data Warehousing Environment. VLDB 1998: 323-333
  16. David T. Kao, R. Daniel Bergeron, Ted M. Sparr: Efficient Proximity Search in Multivariate Data. SSDBM 1998: 145-154
  17. Thomas Seidl, Hans-Peter Kriegel: Optimal Multi-Step k-Nearest Neighbor Search. SIGMOD Conference 1998: 154-165
  18. Paolo Ciaccia, Marco Patella, Pavel Zezula: A Cost Model for Similarity Queries in Metric Spaces. PODS 1998: 59-68
  19. Paolo Ciaccia, Marco Patella, Pavel Zezula: Processing Complex Similarity Queries with Distance-Based Access Methods. EDBT 1998: 9-23
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:46:17 2009