BIRCH: An Efficient Data Clustering Method for Very Large Databases.
Tian Zhang, Raghu Ramakrishnan, Miron Livny:
BIRCH: An Efficient Data Clustering Method for Very Large Databases.
SIGMOD Conference 1996: 103-114@inproceedings{DBLP:conf/sigmod/ZhangRL96,
author = {Tian Zhang and
Raghu Ramakrishnan and
Miron Livny},
editor = {H. V. Jagadish and
Inderpal Singh Mumick},
title = {BIRCH: An Efficient Data Clustering Method for Very Large Databases},
booktitle = {Proceedings of the 1996 ACM SIGMOD International Conference on
Management of Data, Montreal, Quebec, Canada, June 4-6, 1996},
publisher = {ACM Press},
year = {1996},
pages = {103-114},
ee = {http://doi.acm.org/10.1145/233269.233324, db/conf/sigmod/ZhangRL96.html},
crossref = {DBLP:conf/sigmod/96},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX
Abstract
Finding useful patterns in large datasets has attracted considerable interest
recently, and one of the most widely studied problems in this area is the
identification of clusters, or densely populated regions, in a
multi-dimensional dataset. Prior work does not adequately address the problem
of large datasets and minimization of I/O costs.
This paper presents a data clustering method named BIRCH
(Balanced Iterative Reducing and Clustering using Hierarchies),
and demonstrates that it is especially suitable for very large databases.
BIRCH incrementally and dynamically clusters incoming multi-dimensional
metric data points to try to produce the best quality clustering with the
available resources (i.e., availbale memory and time constraints).
BIRCH can typically find a good clustering with a single scan of the data,
and improve the quality further with a few additional scans. BIRCH is also
the first clustering algorithm proposed in the database area to handle "noise"
(data points that are not part of the underlying pattern) effectively.
We evaluate BIRCH's time/space efficiency, data input
order sensitivity, and clustering quality through several experiments.
We also present a performance comparisons of BIRCH versus
CLARANS, a clustering method proposed recently for large datasets,
and show that BIRCH is consistently superior.
Copyright © 1996 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
Online Version (ACM WWW Account required): Full Text in PDF Format
CDROM Version: Load the CDROM "Volume 1 Issue 1, SIGMOD '93-'97" and ...
DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...
BibTeX
Printed Edition
H. V. Jagadish, Inderpal Singh Mumick (Eds.):
Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, Quebec, Canada, June 4-6, 1996.
ACM Press 1996 BibTeX
,
SIGMOD Record 25(2),
June 1996
Contents
[Index Terms]
[Full Text in PDF Format, 1451 KB]
References
- [CKS88]
- ...
- [DH73]
- ...
- [DJ80]
- ...
- [EKX95a]
- Martin Ester, Hans-Peter Kriegel, Xiaowei Xu:
A Database Interface for Clustering in Large Spatial Databases.
KDD 1995: 94-99 BibTeX
- [EKX95b]
- Martin Ester, Hans-Peter Kriegel, Xiaowei Xu:
Knowledge Discovery in Large Spatial Databases: Focusing Techniques for Efficient Class Identification.
SSD 1995: 67-82 BibTeX
- [Fis87]
- Douglas H. Fisher:
Knowledge Acquisition via Incremental Conceptual Clustering.
Machine Learning 2(2): 139-172(1987) BibTeX
- [Fis95]
- ...
- [GG92]
- ...
- [KR90]
- ...
- [Leb87]
- ...
- [Lee81]
- ...
- [Mur83]
- Fionn Murtagh:
A Survey of Recent Advances in Hierarchical Clustering Algorithms.
Comput. J. 26(4): 354-359(1983) BibTeX
- [NH94]
- Raymond T. Ng, Jiawei Han:
Efficient and Effective Clustering Methods for Spatial Data Mining.
VLDB 1994: 144-155 BibTeX
- [Ols93]
- ...
- [ZRL95]
- ...
Referenced by
- Anthony K. H. Tung, Raymond T. Ng, Laks V. S. Lakshmanan, Jiawei Han:
Constraint-based clustering in large databases.
ICDT 2001: 405-419
- Jeff Edmonds, Jarek Gryz, Dongming Liang, Renée J. Miller:
Mining for Empty Rectangles in Large Data Sets.
ICDT 2001: 174-188
- Gholamhosein Sheikholeslami, Surojit Chatterjee, Aidong Zhang:
WaveCluster: A Wavelet Based Clustering Approach for Spatial Data in Very Large Databases.
VLDB J. 8(3-4): 289-304(2000)
- Edwin M. Knorr, Raymond T. Ng, V. Tucakov:
Distance-Based Outliers: Algorithms and Applications.
VLDB J. 8(3-4): 237-253(2000)
- David Gibson, Jon M. Kleinberg, Prabhakar Raghavan:
Clustering Categorical Data: An Approach Based on Dynamical Systems.
VLDB J. 8(3-4): 222-236(2000)
- Theodore Johnson, Laks V. S. Lakshmanan, Raymond T. Ng:
The 3W Model and Algebra for Unified Data Mining.
VLDB 2000: 21-32
- Kaushik Chakrabarti, Sharad Mehrotra:
Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces.
VLDB 2000: 89-100
- Sridhar Ramaswamy, Rajeev Rastogi, Kyuseok Shim:
Efficient Algorithms for Mining Outliers from Large Data Sets.
SIGMOD Conference 2000: 427-438
- Christopher R. Palmer, Christos Faloutsos:
Density Biased Sampling: An Improved Method for Data Mining and Clustering.
SIGMOD Conference 2000: 82-92
- Carlos Ordonez, Paul Cereghini:
SQLEM: Fast Clustering in SQL using the EM Algorithm.
SIGMOD Conference 2000: 559-570
- Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander:
LOF: Identifying Density-Based Local Outliers.
SIGMOD Conference 2000: 93-104
- Charu C. Aggarwal, Philip S. Yu:
Finding Generalized Projected Clusters In High Dimensional Spaces.
SIGMOD Conference 2000: 70-81
- Minos N. Garofalakis, Rajeev Rastogi, S. Seshadri, Kyuseok Shim:
Data Mining and the Web: Past, Present and Future.
Workshop on Web Information and Data Management 1999: 43-47
- Alexander Hinneburg, Daniel A. Keim:
Optimal Grid-Clustering: Towards Breaking the Curse of Dimensionality in High-Dimensional Clustering.
VLDB 1999: 506-517
- H. V. Jagadish, J. Madar, Raymond T. Ng:
Semantic Compression and Pattern Extraction with Fascicles.
VLDB 1999: 186-198
- Wen-Chi Hou:
A Framework for Statistical Data Mining with Summary Tables.
SSDBM 1999: 14-23
- Apostol Natsev, Rajeev Rastogi, Kyuseok Shim:
WALRUS: A Similarity Retrieval Algorithm for Image Databases.
SIGMOD Conference 1999: 395-406
- Ju-Hong Lee, Deok-Hwan Kim, Chin-Wan Chung:
Multi-dimensional Selectivity Estimation Using Compressed Histogram Information.
SIGMOD Conference 1999: 205-214
- Alin Deutsch, Mary F. Fernández, Dan Suciu:
Storing Semistructured Data with STORED.
SIGMOD Conference 1999: 431-442
- Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander:
OPTICS: Ordering Points To Identify the Clustering Structure.
SIGMOD Conference 1999: 49-60
- Charu C. Aggarwal, Cecilia Magdalena Procopiuc, Joel L. Wolf, Philip S. Yu, Jong Soo Park:
Fast Algorithms for Projected Clustering.
SIGMOD Conference 1999: 61-72
- Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan:
A Framework for Measuring Changes in Data Characteristics.
PODS 1999: 126-137
- Wei Wang, Jiong Yang, Richard R. Muntz:
STING+: An Approach to Active Spatial Data Mining.
ICDE 1999: 116-125
- Sudipto Guha, Rajeev Rastogi, Kyuseok Shim:
ROCK: A Robust Clustering Algorithm for Categorical Attributes.
ICDE 1999: 512-521
- Venkatesh Ganti, Raghu Ramakrishnan, Johannes Gehrke, Allison L. Powell, James C. French:
Clustering Large Datasets in Arbitrary Metric Spaces.
ICDE 1999: 502-511
- Philip S. Yu:
Data Mining and Personalization Technologies.
DASFAA 1999: 6-13
- Pedro Furtado, Henrique Madeira:
Summary Grids: Building Accurate Multidimensional Histograms.
DASFAA 1999: 187-194
- Tadeusz Morzy, Marek Wojciechowski, Maciej Zakrzewicz:
Pattern-Oriented Hierachical Clustering.
ADBIS 1999: 179-190
- Chye-Lin Chee, Hongjun Lu, Hong Tang, C. V. Ramamoorthy:
Adaptive Prefetching and Storage Reorganization In A Log-Structured Storage System.
IEEE Trans. Knowl. Data Eng. 10(5): 824-838(1998)
- Wendy Chang, Gholamhosein Sheikholeslami, Jia Wang, Aidong Zhang:
Data Resource Selection in Distributed Visual Information Systems.
IEEE Trans. Knowl. Data Eng. 10(6): 926-946(1998)
- Jiawei Han:
Towards On-Line Analytical Mining in Large Databases.
SIGMOD Record 27(1): 97-107(1998)
- G. D. Ramkumar, Arun N. Swami:
Clustering Data Without Distance Functions.
IEEE Data Eng. Bull. 21(1): 9-14(1998)
- Gholamhosein Sheikholeslami, Surojit Chatterjee, Aidong Zhang:
WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases.
VLDB 1998: 428-439
- Edwin M. Knorr, Raymond T. Ng:
Algorithms for Mining Distance-Based Outliers in Large Datasets.
VLDB 1998: 392-403
- David Gibson, Jon M. Kleinberg, Prabhakar Raghavan:
Clustering Categorical Data: An Approach Based on Dynamical Systems.
VLDB 1998: 311-322
- Martin Ester, Hans-Peter Kriegel, Jörg Sander, Michael Wimmer, Xiaowei Xu:
Incremental Clustering for Mining in a Data Warehousing Environment.
VLDB 1998: 323-333
- Sudipto Guha, Rajeev Rastogi, Kyuseok Shim:
CURE: An Efficient Clustering Algorithm for Large Databases.
SIGMOD Conference 1998: 73-84
- Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan:
Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications.
SIGMOD Conference 1998: 94-105
- Xiaowei Xu, Martin Ester, Hans-Peter Kriegel, Jörg Sander:
A Distribution-Based Clustering Algorithm for Mining in Large Spatial Databases.
ICDE 1998: 324-331
- Wendy Chang, Deepak Murthy, Aidong Zhang, Tanveer Fathima Syeda-Mahmood:
Global Integration of Visual Databases.
ICDE 1998: 542-549
- Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik:
The New Jersey Data Reduction Report.
IEEE Data Eng. Bull. 20(4): 3-45(1997)
- Wei Wang, Jiong Yang, Richard R. Muntz:
STING: A Statistical Information Grid Approach to Spatial Data Mining.
VLDB 1997: 186-195
- Renée J. Miller, Yuping Yang:
Association Rules over Interval Data.
SIGMOD Conference 1997: 452-461
- Miron Livny, Raghu Ramakrishnan, Kevin S. Beyer, Guangshun Chen, Donko Donjerkovic, Shilpa Lawande, Jussi Myllymaki, R. Kent Wenger:
DEVise: Integrated Querying and Visual Exploration of Large Datasets (Demo Abstract).
SIGMOD Conference 1997: 517-520
- Miron Livny, Raghu Ramakrishnan, Kevin S. Beyer, Guangshun Chen, Donko Donjerkovic, Shilpa Lawande, Jussi Myllymaki, R. Kent Wenger:
DEVise: Integrated Querying and Visualization of Large Datasets.
SIGMOD Conference 1997: 301-312
- Flip Korn, H. V. Jagadish, Christos Faloutsos:
Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences.
SIGMOD Conference 1997: 289-300
- Jiawei Han, Krzysztof Koperski, Nebojsa Stefanovic:
GeoMiner: A System Prototype for Spatial Data Mining.
SIGMOD Conference 1997: 553-556
- Ming-Syan Chen, Jiawei Han, Philip S. Yu:
Data Mining: An Overview from a Database Perspective.
IEEE Trans. Knowl. Data Eng. 8(6): 866-883(1996)
BibTeX
ACM SIGMOD Anthology - DBLP:
[Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:40:31 2009