Concepts and Effectiveness of the Cover-Coefficient-Based Clustering Methodology for Text Databases.

Fazli Can, Esen A. Ozkarahan: Concepts and Effectiveness of the Cover-Coefficient-Based Clustering Methodology for Text Databases. ACM Trans. Database Syst. 15(4): 483-517 (1990)
  author    = {Fazli Can and
               Esen A. Ozkarahan},
  title     = {Concepts and Effectiveness of the Cover-Coefficient-Based Clustering
               Methodology for Text Databases},
  journal   = {ACM Trans. Database Syst.},
  volume    = {15},
  number    = {4},
  year      = {1990},
  pages     = {483-517},
  ee        = {, db/journals/tods/CanO90.html},
  bibsource = {DBLP,}


A new algorithm for document clustering is introduced. The base concept of the algorithm, the cover coefficient (CC) concept, provides a means of estimating the number of clusters within a document database and relates indexing and clustering analytically. The CC concept is used also to identify the cluster seeds and to form clusters with these seeds. It is shown that the complexity of the clustering process is very low. The retrieval experiments show that the information-retrieval effectiveness of the algorithm is compatible with a very demanding complete linkage clustering method that is known to have good retrieval performance. The experiments also show that the algorithm is 15.1 to 63.5 (with an average of 47.5) percent better than four other clustering algorithms in cluster-based information retrieval. The experiments have validated the indexing-clustering relationships and the complexity of the algorithm and have shown improvements in retrieval effectiveness. In the experiments, two document databases are used: TODS214 and INSPEC. The latter is a common database with 12,684 documents.
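The abstract does not spell out how the cover coefficient is computed, so the following is only a minimal sketch under stated assumptions. It assumes the usual CC formulation over a document-by-term matrix D, in which c_ij = alpha_i * sum_k(d_ik * beta_k * d_jk), alpha_i is the reciprocal of the i-th row sum, beta_k is the reciprocal of the k-th column sum, and the number of clusters is estimated as the sum of the diagonal entries c_ii. The function names and the toy matrix are hypothetical and are not taken from the paper.

```python
def cover_coefficient_matrix(D):
    """Return the m x m matrix C for a document-by-term matrix D.

    Assumed definition: c_ij = alpha_i * sum_k(d_ik * beta_k * d_jk),
    with alpha_i = 1 / (row sum i) and beta_k = 1 / (column sum k).
    """
    m = len(D)
    n = len(D[0])
    row_sums = [sum(row) for row in D]
    col_sums = [sum(D[i][k] for i in range(m)) for k in range(n)]
    if 0 in row_sums or 0 in col_sums:
        raise ValueError("every document needs a term and every term a document")
    return [
        [
            sum(D[i][k] * D[j][k] / col_sums[k] for k in range(n)) / row_sums[i]
            for j in range(m)
        ]
        for i in range(m)
    ]


def estimated_cluster_count(D):
    """Estimate the number of clusters as the sum of the diagonal of C."""
    C = cover_coefficient_matrix(D)
    return sum(C[i][i] for i in range(len(C)))


# Hypothetical toy data: documents 0-1 share terms 0-1, documents 2-3 share terms 2-3.
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(estimated_cluster_count(toy))
```

Under this assumed definition each row of C sums to 1. A set of identical documents yields an estimate of one cluster, and a set of documents with no shared terms yields one cluster per document. The toy matrix falls in between, with two disjoint groups.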

Copyright © 1990 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.



Fazli Can, Esen A. Ozkarahan: A Clustering Scheme. SIGIR 1983: 115-121
Fazli Can, Esen A. Ozkarahan: Concepts of the Cover-Coefficient-Based Clustering Methodology. SIGIR 1985: 204-211
Abdelmoula El-Hamdouchi, Peter Willett: Comparison of Hierarchic Agglomerative Clustering Methods for Document Retrieval. Comput. J. 32(3): 220-227 (1989)
Anil K. Jain, Richard C. Dubes: Algorithms for Clustering Data. Prentice-Hall 1988
Esen A. Ozkarahan, Fazli Can: An Automatic and Tunable Document Indexing System. SIGIR 1986: 234-243
Edie M. Rasmussen, Peter Willett: Non-Hierarchic Document Clustering Using the ICL Distributed Array Processor. SIGIR 1987: 132-139
Gerard Salton: Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley 1989, ISBN 0-201-12227-8
Gerard Salton, Chris Buckley: Term-Weighting Approaches in Automatic Text Retrieval. Inf. Process. Manage. 24(5): 513-523 (1988)
Gerard Salton, Michael McGill: Introduction to Modern Information Retrieval. McGraw-Hill Book Company 1984, ISBN 0-07-054484-0
Gerard Salton, A. Wong: Generation and Search of Clustered Files. ACM Trans. Database Syst. 3(4): 321-346 (1978)
C. J. van Rijsbergen: Information Retrieval. Butterworth 1979, ISBN 0-408-70929-4
Ellen M. Voorhees: The Cluster Hypothesis Revisited. SIGIR 1985: 188-196
Ellen M. Voorhees: The Efficiency of Inverted Index and Cluster Searches. SIGIR 1986: 164-174
S. Bing Yao: Approximating the Number of Accesses in Database Organizations. Commun. ACM 20(4): 260-261 (1977)