Digital Review dblp.uni-trier.de

Review - Clustering Categorical Data: An Approach Based on Dynamical Systems.

Raymond T. Ng: Review - Clustering Categorical Data: An Approach Based on Dynamical Systems. ACM SIGMOD Digital Review 1: (1999) BibTeX

Review

With its wide applications, concept clustering has been studied in statistics for decades. In the past few years, with the growing popularity of data mining, there has been a renewed interest to develop clustering algorithms suitable for large, often disk-resident, data sets. This has led to numerous papers on this topic. In fact, too many, critics may even argue. Admittedly, there may be some delta papers. But there are also studies that represent real, significant progress. This paper is a very good example. To the old problem of clustering categorical data, it proposes a very novel solution. Tuples, consisting of only categorical data, are considered similar if the sets of items with which they co-occur have much overlap, in spite of the fact that the tuples themselves never co-occur together. The proposed approach is based on propogation in the style of a dynamical system. The key to such a method is the convergence properties of the propagation. And this is precisely the area in which the paper excels. For a certain way of propagation, called the linear S1 rule, there is a theorem giving a sufficient condition for it to converge. For another way of propagation, called the product rule, there are two theorems showing that there is completeness in the sense that large clustering structures could be detected with high probabilities. As for issues that are hard to be addressed by analysis, such as the speed of convergence, the speed with respect to data set sizes, etc., the paper gives good experimental results. While the experimental results can be further improved, the presented results have clearly shown that the proposed solution is feasible, interesting and promising. As far as clustering papers are concerned, this is exceptional for its clean mathematical foundation and strong analytic results.

Copyright © 1999 by the author(s). Review published with permission.


References

[1]
David Gibson, Jon M. Kleinberg, Prabhakar Raghavan: Clustering Categorical Data: An Approach Based on Dynamical Systems. VLDB 1998: 311-322 BibTeX
BibTeX
Digital Review - DBLP: [Home | Search: Author, Title | Conferences | Journals]
Digital Review: Copyright © by ACM (info@acm.org),
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:57:23 2009