ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets.

Christos Faloutsos, King-Ip Lin: FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of Traditional and Multimedia Datasets. SIGMOD Conference 1995: 163-174
@inproceedings{DBLP:conf/sigmod/FaloutsosL95,
  author    = {Christos Faloutsos and
               King-Ip Lin},
  editor    = {Michael J. Carey and
               Donovan A. Schneider},
  title     = {FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization
               of Traditional and Multimedia Datasets},
  booktitle = {Proceedings of the 1995 ACM SIGMOD International Conference on
               Management of Data, San Jose, California, May 22-25, 1995},
  publisher = {ACM Press},
  year      = {1995},
  pages     = {163-174},
  ee        = {http://doi.acm.org/10.1145/223784.223812, db/conf/sigmod/sigmod95-12.html},
  crossref  = {DBLP:conf/sigmod/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

A very promising idea for fast searching in traditional and multimedia databases is to map objects into points in k-d space, using k feature-extraction functions, provided by a domain expert [Jagadish, SIGMOD Conf. 1991: 208-217]. Thus, we can subsequently use highly fine-tuned spatial access methods (SAMs), to answer several types of queries, including the `Query By Example' type (which translates to a range query); the `all pairs' query (which translates to a spatial join [Brinkhoff, SIGMOD Conf. 1994: 197-208]); the nearest-neighbor or best-match query, etc.

However, designing feature extraction functions can be hard. It is relatively easier for a domain expert to assess the similarity/distance of two objects. Given only the distance information though, it is not obvious how to map objects into points.

This is exactly the topic of this paper. We describe a fast algorithm to map objects into points in some k-dimensional space (k is user-defined), such that the dis-similarities are preserved. There are two benefits from this mapping: (a) efficient retrieval, in conjunction with a SAM, as discussed before and (b) visualization and data-mining: the objects can now be plotted as points in 2-d or 3-d space, revealing potential clusters, correlations among attributes and other regularities that data-mining is looking for.

We introduce an older method from pattern recognition, namely, Multi-Dimensional Scaling (MDS); although unsuitable for indexing, we use it as yardstick for our method. Then, we propose a much faster algorithm to solve the problem in hand, while in addition it allows for indexing. Experiments on real and synthetic data indeed show that the proposed algorithm is significantly faster than MDS, (being linear, as opposed to quadratic, on the database size N), while it manages to preserve distances and the overall structure of the data-set.

Copyright © 1995 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

Online Version (ACM WWW Account required): Full Text in PDF Format

CDROM Version: Load the CDROM "Volume 1 Issue 1, SIGMOD '93-'97" and ...

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Michael J. Carey, Donovan A. Schneider (Eds.): Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, San Jose, California, May 22-25, 1995. ACM Press 1995 BibTeX , SIGMOD Record 24(2), June 1995
Contents

Online Edition: ACM Digital Library

[Index Terms]
[Full Text in PDF Format, 1179 KB]

References

[1]
Rakesh Agrawal, Christos Faloutsos, Arun N. Swami: Efficient Similarity Search In Sequence Databases. FODO 1993: 69-84 BibTeX
[2]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216 BibTeX
[3]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499 BibTeX
[4]
...
[5]
Manish Arya, William F. Cody, Christos Faloutsos, Joel E. Richardson, Arthur Toya: QBISM: A Prototype 3-D Medical Image Database System. IEEE Data Eng. Bull. 16(1): 38-42(1993) BibTeX
[6]
Ricardo A. Baeza-Yates, Walter Cunto, Udi Manber, Sun Wu: Proximity Matching Using Fixed-Queries Trees. CPM 1994: 198-212 BibTeX
[7]
Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331 BibTeX
[8]
Thomas Brinkhoff, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: Multi-Step Processing of Spatial Joins. SIGMOD Conference 1994: 197-208 BibTeX
[9]
Thomas Brinkhoff, Hans-Peter Kriegel, Bernhard Seeger: Efficient Processing of Spatial Joins Using R-Trees. SIGMOD Conference 1993: 237-246 BibTeX
[10]
Walter A. Burkhard, Robert M. Keller: Some Approaches to Best-Match File Searching. Commun. ACM 16(4): 230-236(1973) BibTeX
[11]
...
[12]
...
[13]
Susan T. Dumais: Latent Semantic Indexing (LSI): TREC-3 Report. TREC 1994: 105-115 BibTeX
[14]
Christos Faloutsos, Shari Roseman: Fractals for Secondary Key Retrieval. PODS 1989: 247-252 BibTeX
[15]
...
[16]
Peter W. Foltz, Susan T. Dumais: Personalized Information Delivery: An Analysis of Information Filtering Methods. Commun. ACM 35(12): 51-60(1992) BibTeX
[17]
...
[18]
Irene Gargantini: An Effective Way to Represent Quadtrees. Commun. ACM 25(12): 905-910(1982) BibTeX
[19]
...
[20]
Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD Conference 1984: 47-57 BibTeX
[21]
...
[22]
...
[23]
H. V. Jagadish: Linear Clustering of Objects with Multiple Atributes. SIGMOD Conference 1990: 332-342 BibTeX
[24]
H. V. Jagadish: Spatial Search with Polyhedra. ICDE 1990: 311-319 BibTeX
[25]
H. V. Jagadish: A Retrieval Technique for Similar Shapes. SIGMOD Conference 1991: 208-217 BibTeX
[26]
...
[27]
Ibrahim Kamel, Christos Faloutsos: Hilbert R-tree: An Improved R-tree using Fractals. VLDB 1994: 500-509 BibTeX
[28]
...
[29]
...
[30]
Karen Kukich: Techniques for Automatically Correcting Words in Text. ACM Comput. Surv. 24(4): 377-439(1992) BibTeX
[31]
David B. Lomet, Betty Salzberg: The hB-Tree: A Multiattribute Indexing Method with Good Guaranteed Performance. ACM Trans. Database Syst. 15(4): 625-658(1990) BibTeX
[32]
Fionn Murtagh: A Survey of Recent Advances in Hierarchical Clustering Algorithms. Comput. J. 26(4): 354-359(1983) BibTeX
[33]
A. Desai Narasimhalu, Stavros Christodoulakis: Multimedia Information Systems: The Unfolding of a Reality (Guest Editors' Introduction). IEEE Computer 24(10): 6-8(1991) BibTeX
[34]
Raymond T. Ng, Jiawei Han: Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB 1994: 144-155 BibTeX
[35]
Wayne Niblack, Ron Barber, William Equitz, Myron Flickner, Eduardo H. Glasman, Dragutin Petkovic, Peter Yanker, Christos Faloutsos, Gabriel Taubin: The QBIC Project: Querying Images by Content, Using Color, Texture, and Shape. Storage and Retrieval for Image and Video Databases (SPIE) 1993: 173-187 BibTeX
[36]
Jürg Nievergelt, Hans Hinterberger, Kenneth C. Sevcik: The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Trans. Database Syst. 9(1): 38-71(1984) BibTeX
[37]
Jack A. Orenstein: Spatial Query Processing in an Object-Oriented Database System. SIGMOD Conference 1986: 326-336 BibTeX
[38]
Jack A. Orenstein: A Comparison of Spatial Query Processing Techniques for Native and Parameter Spaces. SIGMOD Conference 1990: 343-352 BibTeX
[39]
...
[40]
...
[41]
...
[42]
Nick Roussopoulos, Stephen Kelley, Frédéic Vincent: Nearest Neighbor Queries. SIGMOD Conference 1995: 71-79 BibTeX
[43]
Gerard Salton, Michael McGill: Introduction to Modern Information Retrieval. McGraw-Hill Book Company 1984, ISBN 0-07-054484-0
BibTeX
[44]
...
[45]
Timos K. Sellis, Nick Roussopoulos, Christos Faloutsos: The R+-Tree: A Dynamic Index for Multi-Dimensional Objects. VLDB 1987: 507-518 BibTeX
[46]
Marvin B. Shapiro: The Choice of Reference Points in Best-Match File Searching. Commun. ACM 20(5): 339-343(1977) BibTeX
[47]
Dennis Shasha, Jason Tsong-Li Wang: New Techniques for Best-Match Retrieval. ACM Trans. Inf. Syst. 8(2): 140-158(1990) BibTeX
[48]
...
[49]
...
[50]
...
[51]
...
[52]
...
[53]
...
[54]
...
[55]
...

Referenced by

  1. Kaushik Chakrabarti, Sharad Mehrotra: Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces. VLDB 2000: 89-100
  2. Charu C. Aggarwal, Philip S. Yu: Finding Generalized Projected Clusters In High Dimensional Spaces. SIGMOD Conference 2000: 70-81
  3. Gísli R. Hjaltason, Hanan Samet: Distance Browsing in Spatial Databases. ACM Trans. Database Syst. 24(2): 265-318(1999)
  4. Alexander Hinneburg, Daniel A. Keim: Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering. VLDB 1999: 506-517
  5. H. V. Jagadish, J. Madar, Raymond T. Ng: Semantic Compression and Pattern Extraction with Fascicles. VLDB 1999: 186-198
  6. Venkatesh Ganti, Raghu Ramakrishnan, Johannes Gehrke, Allison L. Powell, James C. French: Clustering Large Datasets in Arbitrary Metric Spaces. ICDE 1999: 502-511
  7. Beng Chin Ooi, Kian-Lee Tan, Tat-Seng Chua, Wynne Hsu: Fast Image Retrieval Using Color-Spatial Information. VLDB J. 7(2): 115-128(1998)
  8. Mihael Ankerst, Hans-Peter Kriegel, Thomas Seidl: A Multistep Approach for Shape Similarity Search in Image Databases. IEEE Trans. Knowl. Data Eng. 10(6): 996-1004(1998)
  9. King Lum Cheung, Ada Wai-Chee Fu: Enhanced Nearest Neighbour Search on the R-tree. SIGMOD Record 27(3): 16-21(1998)
  10. Yoshiharu Ishikawa, Ravishankar Subramanya, Christos Faloutsos: MindReader: Querying Databases Through Multiple Examples. VLDB 1998: 218-227
  11. Thomas Seidl, Hans-Peter Kriegel: Optimal Multi-Step k-Nearest Neighbor Search. SIGMOD Conference 1998: 154-165
  12. Byoung-Kee Yi, H. V. Jagadish, Christos Faloutsos: Efficient Retrieval of Similar Time Sequences Under Time Warping. ICDE 1998: 201-208
  13. Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997)
  14. Paolo Ciaccia, Marco Patella, Pavel Zezula: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997: 426-435
  15. Banu Özden, Rajeev Rastogi, Abraham Silberschatz: Multimedia Support for Databases. PODS 1997: 1-11
  16. Kyuseok Shim, Ramakrishnan Srikant, Rakesh Agrawal: High-Dimensional Similarity Joins. ICDE 1997: 301-311
  17. Ming-Syan Chen, Jiawei Han, Philip S. Yu: Data Mining: An Overview from a Database Perspective. IEEE Trans. Knowl. Data Eng. 8(6): 866-883(1996)
  18. Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel: The X-tree : An Index Structure for High-Dimensional Data. VLDB 1996: 28-39
  19. Hagit Shatkay, Stanley B. Zdonik: Approximate Queries and Representations for Large Data Sequences. ICDE 1996: 536-545
  20. Chung-Sheng Li, Philip S. Yu, Vittorio Castelli: HierarchyScan: A Hierarchical Similarity Search Algorithm for Databases of Long Sequences. ICDE 1996: 546-553
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:40:25 2009