DiSC - OPTICS: Ordering Points To Identify the Clustering Structure

Digital Symposium Collection 2000

OPTICS: Ordering Points To Identify the Clustering Structure

Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, and Jörg Sander


Abstract

Cluster analysis is a primary method for database mining. It is either used as a stand-alone tool to get insight into the distribution of a data set, e.g. to focus further analysis and data processing, or as a preprocessing step for other algorithms operating on the detected clusters. Almost all of the well-known clustering algorithms require input parameters which are hard to determine but have a significant influence on the clustering result. Furthermore, for many real data sets there does not even exist a global parameter setting for which the result of the clustering algorithm describes the intrinsic clustering structure accurately. We introduce a new algorithm for the purpose of cluster analysis which does not produce a clustering of a data set explicitly, but instead creates an augmented ordering of the database representing its density-based clustering structure. This cluster-ordering contains information which is equivalent to the density-based clusterings corresponding to a broad range of parameter settings. It is a versatile basis for both automatic and interactive cluster analysis. We show how to automatically and efficiently extract not only ‘traditional’ clustering information (e.g. representative points, arbitrarily shaped clusters), but also the intrinsic clustering structure. For medium-sized data sets, the cluster-ordering can be represented graphically, and for very large data sets, we introduce an appropriate visualization technique. Both are suitable for interactive exploration of the intrinsic clustering structure, offering additional insights into the distribution and correlation of the data.
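The abstract describes the cluster-ordering only at a high level. What follows is a minimal, non-authoritative Python sketch of the idea. It assumes the usual density-based definitions. The core-distance of a point is the distance to its min_pts-th closest neighbor within eps, and this sketch counts the point itself as one of those neighbors, which is an assumption. The reachability-distance of q from a core object p is max(core-distance(p), dist(p, q)). Points are emitted in order of smallest reachability from already-processed core objects. A flat, DBSCAN-style clustering for any eps' <= eps is then read off that single ordering, in the spirit of the "traditional" extraction the abstract mentions.

Several details are illustrative assumptions rather than the authors' implementation. The function names `optics` and `extract_dbscan` are hypothetical. Neighborhoods come from a brute-force O(n^2) scan rather than an index. "Undefined" distances are modeled as `math.inf`.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
UNDEFINED = math.inf  # sketch assumption: "undefined" distance modeled as infinity
NOISE = -1            # hypothetical label for noise points


def _neighbors(points: Sequence[Point], i: int, eps: float) -> List[int]:
    # Brute-force eps-neighborhood (includes i itself); a real system would use an index.
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def _core_distance(points: Sequence[Point], i: int, nbrs: List[int], min_pts: int) -> float:
    # Distance to the min_pts-th closest neighbor (the point itself counts as one);
    # UNDEFINED if the eps-neighborhood is too small for i to be a core object.
    if len(nbrs) < min_pts:
        return UNDEFINED
    return sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]


def optics(points: Sequence[Point], eps: float, min_pts: int):
    """Hypothetical helper: returns (ordering, reachability, core_distance)."""
    n = len(points)
    processed = [False] * n
    reach = [UNDEFINED] * n
    core = [UNDEFINED] * n
    order: List[int] = []
    for start in range(n):
        if processed[start]:
            continue
        seeds: List[Tuple[float, int]] = [(UNDEFINED, start)]  # min-heap on reachability
        while seeds:
            r, p = heapq.heappop(seeds)
            if processed[p] or r > reach[p]:
                continue  # stale entry left behind by a later decrease of reach[p]
            processed[p] = True
            order.append(p)
            nbrs = _neighbors(points, p, eps)
            core[p] = _core_distance(points, p, nbrs, min_pts)
            if core[p] == UNDEFINED:
                continue  # only core objects expand the ordering
            for q in nbrs:
                if processed[q]:
                    continue
                new_r = max(core[p], math.dist(points[p], points[q]))
                if new_r < reach[q]:
                    reach[q] = new_r
                    heapq.heappush(seeds, (new_r, q))
    return order, reach, core


def extract_dbscan(order: List[int], reach: List[float], core: List[float],
                   eps_prime: float) -> List[int]:
    """Assign flat cluster labels for a radius eps_prime <= eps by one scan of the ordering."""
    labels = [NOISE] * len(order)
    cluster_id = NOISE
    for p in order:
        if reach[p] > eps_prime:  # not reachable from earlier points at this radius
            if core[p] <= eps_prime:
                cluster_id += 1  # p is a core object at eps_prime: start a new cluster
                labels[p] = cluster_id
            else:
                labels[p] = NOISE
        else:
            labels[p] = cluster_id  # continues the current cluster
    return labels


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (50.0, 50.0)]
order, reach, core = optics(pts, eps=3.0, min_pts=3)
print(extract_dbscan(order, reach, core, eps_prime=2.0))  # [0, 0, 0, 1, 1, 1, -1]
```

Calling `extract_dbscan` again with a different eps' reuses the same ordering instead of reclustering. This reflects the abstract's point that one cluster-ordering covers a broad range of parameter settings. Labels read off this way may differ from a direct DBSCAN run for some border points.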

References


[AGG+ 98]: Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan: Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. SIGMOD Conference 1998: 94-105
[AKK 96]: ...
[BKK 96]: Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel: The X-tree: An Index Structure for High-Dimensional Data. VLDB 1996: 28-39
[BKSS 90]: Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, Bernhard Seeger: The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. SIGMOD Conference 1990: 322-331
[CPZ 97]: Paolo Ciaccia, Marco Patella, Pavel Zezula: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997: 426-435
[EKSX 96]: Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD 1996: 226-231
[EKS+ 98]: Martin Ester, Hans-Peter Kriegel, Jörg Sander, Michael Wimmer, Xiaowei Xu: Incremental Clustering for Mining in a Data Warehousing Environment. VLDB 1998: 323-333
[EKX 95]: Martin Ester, Hans-Peter Kriegel, Xiaowei Xu: Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification. SSD 1995: 67-82
[GM 85]: ...
[GRS 98]: Sudipto Guha, Rajeev Rastogi, Kyuseok Shim: CURE: An Efficient Clustering Algorithm for Large Databases. SIGMOD Conference 1998: 73-84
[HK 98]: Alexander Hinneburg, Daniel A. Keim: An Efficient Approach to Clustering in Large Multimedia Databases with Noise. KDD 1998: 58-65
[HT 93]: ...
[Hua 97]: Zhexue Huang: A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining. DMKD 1997
[JD 88]: Anil K. Jain, Richard C. Dubes: Algorithms for Clustering Data. Prentice-Hall 1988
[Kei 96a]: Daniel A. Keim: Pixel-oriented Database Visualizations. SIGMOD Record 25(4): 35-39 (1996)
[Kei 96b]: Daniel A. Keim: Databases and Visualization. SIGMOD Conference 1996: 543
[KN 96]: Edwin M. Knorr, Raymond T. Ng: Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining. TKDE 8(6): 884-897 (1996)
[KR 90]: L. Kaufman, P. J. Rousseeuw: Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley 1990
[Mac 67]: ...
[NH 94]: Raymond T. Ng, Jiawei Han: Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB 1994: 144-155
[PTVF 92]: William H. Press, Saul A. Teukolsky, William T. Vetterling, Brian P. Flannery: Numerical Recipes in C, 2nd Edition. Cambridge University Press 1992
[Ric 83]: ...
[Sch 96]: ...
[SE 97]: Erich Schikuta, Martin Erhart: The BANG-Clustering System: Grid-Based Data Analysis. IDA 1997: 513-524
[SCZ 98]: Gholamhosein Sheikholeslami, Surojit Chatterjee, Aidong Zhang: WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases. VLDB 1998: 428-439
[Sib 73]: R. Sibson: SLINK: An Optimally Efficient Algorithm for the Single-Link Cluster Method. The Computer Journal 16(1): 30-34 (1973)
[ZRL 96]: Tian Zhang, Raghu Ramakrishnan, Miron Livny: BIRCH: An Efficient Data Clustering Method for Very Large Databases. SIGMOD Conference 1996: 103-114

BIBTEX

@inproceedings{DBLP:conf/sigmod/AnkerstBKS99,
  author    = {Mihael Ankerst and
               Markus M. Breunig and
               Hans-Peter Kriegel and
               J{\"o}rg Sander},
  editor    = {Alex Delis and
               Christos Faloutsos and
               Shahram Ghandeharizadeh},
  title     = {OPTICS: Ordering Points To Identify the Clustering Structure},
  booktitle = {SIGMOD 1999, Proceedings ACM SIGMOD International Conference
               on Management of Data, June 1-3, 1999, Philadelphia, Pennsylvania,
               USA},
  publisher = {ACM Press},
  year      = {1999},
  isbn      = {1-58113-084-8},
  pages     = {49-60},
  crossref  = {DBLP:conf/sigmod/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Copyright (C) 2000 ACM