Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications.

Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. SIGMOD Conference 1998: 94-105
@inproceedings{DBLP:conf/sigmod/AgrawalGGR98,
  author    = {Rakesh Agrawal and
               Johannes Gehrke and
               Dimitrios Gunopulos and
               Prabhakar Raghavan},
  editor    = {Laura M. Haas and
               Ashutosh Tiwary},
  title     = {Automatic Subspace Clustering of High Dimensional Data for Data
               Mining Applications},
  booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference
               on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-995-5},
  pages     = {94-105},
  ee        = {http://doi.acm.org/10.1145/276304.276314, db/conf/sigmod/AgrawalGGR98.html},
  crossref  = {DBLP:conf/sigmod/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Data mining applications place special requirements on clustering algorithms including: the ability to find clusters embedded in subspaces of high dimensional data, scalability, end-user comprehensibility of the results, non-presumption of any canonical data distribution, and insensitivity to the order of input records. We present CLIQUE, a clustering algorithm that satisfies each of these requirements. CLIQUE identifies dense clusters in subspaces of maximum dimensionality. It generates cluster descriptions in the form of DNF expressions that are minimized for ease of comprehension. It produces identical results irrespective of the order in which input records are presented and does not presume any specific mathematical form for data distribution. Through experiments, we show that CLIQUE efficiently finds accurate clusters in large high dimensional datasets.
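The abstract does not spell out the mechanics, so the following is only a minimal sketch of how a grid-based, bottom-up subspace search of this kind might look. It assumes each dimension is cut into `xi` equal-width intervals, that a grid unit counts as dense when the fraction of points it contains exceeds `tau`, and that a k-dimensional subspace is examined only if all of its (k-1)-dimensional projections already contain dense units. The function names (`grid_cell`, `dense_units`, `clusters`) and both parameters are hypothetical choices for illustration, not taken from this record. The step that produces minimized DNF descriptions is omitted.

```python
from collections import Counter
from itertools import combinations


def grid_cell(point, xi, lo, hi):
    """Map a point to its interval index in every dimension (xi equal-width intervals)."""
    cell = []
    for x, a, b in zip(point, lo, hi):
        width = (b - a) / xi or 1.0          # guard against a constant dimension
        cell.append(min(int((x - a) / width), xi - 1))  # clamp the maximum into the last interval
    return tuple(cell)


def dense_units(points, xi=4, tau=0.2):
    """Return {subspace (tuple of dims): set of dense units}, searched bottom-up.

    A unit is a tuple of interval indices, one per dimension of the subspace.
    Assumption: a unit is dense if it holds more than a fraction tau of all points.
    """
    n, d = len(points), len(points[0])
    lo = [min(p[j] for p in points) for j in range(d)]
    hi = [max(p[j] for p in points) for j in range(d)]
    cells = [grid_cell(p, xi, lo, hi) for p in points]

    def count_dense(subspaces):
        result = {}
        for dims in subspaces:
            counts = Counter(tuple(c[j] for j in dims) for c in cells)
            units = {u for u, k in counts.items() if k / n > tau}
            if units:
                result[dims] = units
        return result

    level = count_dense([(j,) for j in range(d)])
    found = dict(level)
    while level:
        # Candidate k-dim subspaces: unions of two (k-1)-dim subspaces whose
        # (k-1)-dim projections all contain dense units. This prunes at the
        # subspace level only, a simplification of unit-level pruning.
        cands = set()
        for a, b in combinations(sorted(level), 2):
            dims = tuple(sorted(set(a) | set(b)))
            if len(dims) == len(a) + 1 and all(
                sub in level for sub in combinations(dims, len(a))
            ):
                cands.add(dims)
        level = count_dense(sorted(cands))
        found.update(level)
    return found


def clusters(units):
    """Group the dense units of one subspace into connected components.

    Two units are connected when they differ by one interval in exactly one dimension.
    """
    remaining, out = set(units), []
    while remaining:
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            u = stack.pop()
            for i in range(len(u)):
                for step in (-1, 1):
                    v = u[:i] + (u[i] + step,) + u[i + 1:]
                    if v in remaining:
                        remaining.remove(v)
                        comp.add(v)
                        stack.append(v)
        out.append(comp)
    return out
```

Calling `dense_units(points)` on a list of equal-length numeric tuples yields the dense units per subspace, and `clusters(found[dims])` groups the units of one subspace. Because the sketch only counts points per grid unit, its result does not depend on the order of the input records, which matches the order-insensitivity the abstract describes.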

Copyright © 1998 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Printed Edition

Laura M. Haas, Ashutosh Tiwary (Eds.): SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA. ACM Press 1998, ISBN 0-89791-995-5, SIGMOD Record 27(2), June 1998


Referenced by

  1. Anthony K. H. Tung, Raymond T. Ng, Laks V. S. Lakshmanan, Jiawei Han: Constraint-based clustering in large databases. ICDT 2001: 405-419
  2. Gholamhosein Sheikholeslami, Surojit Chatterjee, Aidong Zhang: WaveCluster: A Wavelet Based Clustering Approach for Spatial Data in Very Large Databases. VLDB J. 8(3-4): 289-304 (2000)
  3. Theodore Johnson, Laks V. S. Lakshmanan, Raymond T. Ng: The 3W Model and Algebra for Unified Data Mining. VLDB 2000: 21-32
  4. Kaushik Chakrabarti, Sharad Mehrotra: Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces. VLDB 2000: 89-100
  5. Carlos Ordonez, Paul Cereghini: SQLEM: Fast Clustering in SQL using the EM Algorithm. SIGMOD Conference 2000: 559-570
  6. Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander: LOF: Identifying Density-Based Local Outliers. SIGMOD Conference 2000: 93-104
  7. Charu C. Aggarwal, Philip S. Yu: Finding Generalized Projected Clusters In High Dimensional Spaces. SIGMOD Conference 2000: 70-81
  8. Edwin M. Knorr, Raymond T. Ng: Finding Intensional Knowledge of Distance-Based Outliers. VLDB 1999: 211-222
  9. Alexander Hinneburg, Daniel A. Keim: Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering. VLDB 1999: 506-517
  10. H. V. Jagadish, J. Madar, Raymond T. Ng: Semantic Compression and Pattern Extraction with Fascicles. VLDB 1999: 186-198
  11. Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander: OPTICS: Ordering Points To Identify the Clustering Structure. SIGMOD Conference 1999: 49-60
  12. Charu C. Aggarwal, Cecilia Magdalena Procopiuc, Joel L. Wolf, Philip S. Yu, Jong Soo Park: Fast Algorithms for Projected Clustering. SIGMOD Conference 1999: 61-72
  13. Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan: A Framework for Measuring Changes in Data Characteristics. PODS 1999: 126-137
  14. Venkatesh Ganti, Raghu Ramakrishnan, Johannes Gehrke, Allison L. Powell, James C. French: Clustering Large Datasets in Arbitrary Metric Spaces. ICDE 1999: 502-511