Review - Clustering Categorical Data: An Approach Based on Dynamical Systems.
Raymond T. Ng:
Review - Clustering Categorical Data: An Approach Based on Dynamical Systems.
ACM SIGMOD Digital Review 1: (1999) BibTeX
Review
With its wide applications, concept clustering has been studied in
statistics for decades. In the past few years, with the growing
popularity of data mining, there has been a renewed interest to develop
clustering algorithms suitable for large, often disk-resident, data
sets. This has led to numerous papers on this topic. In fact, too many,
critics may even argue. Admittedly, there may be some delta papers. But there
are also studies that represent real, significant progress. This paper is a very
good example. To the old problem of clustering categorical data, it
proposes a very novel solution. Tuples, consisting of only categorical
data, are considered similar if the sets of items with which they
co-occur have much overlap, in spite of the fact that the tuples
themselves never co-occur together. The proposed approach is based on
propogation in the style of a dynamical system. The key to such a
method is the convergence properties of the propagation. And this is
precisely the area in which the paper excels. For a certain way of
propagation, called the linear S1 rule, there is a theorem giving a
sufficient condition for it to converge. For another way of
propagation, called the product rule, there are two theorems showing
that there is completeness in the sense that large clustering
structures could be detected with high probabilities. As for issues
that are hard to be addressed by analysis, such as the speed of
convergence, the speed with respect to data set sizes, etc., the paper
gives good experimental results. While the experimental results can be
further improved, the presented results have clearly shown that the
proposed solution is feasible, interesting and promising. As far as
clustering papers are concerned, this is exceptional for its clean
mathematical foundation and strong analytic results.
Copyright © 1999 by the author(s).
Review published with permission.
References
- [1]
- David Gibson, Jon M. Kleinberg, Prabhakar Raghavan:
Clustering Categorical Data: An Approach Based on Dynamical Systems.
VLDB 1998: 311-322 BibTeX
BibTeX
Digital Review - DBLP:
[Home | Search: Author, Title | Conferences | Journals]
Digital Review: Copyright © by ACM (info@acm.org),
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:57:23 2009