@inproceedings{DBLP:conf/sigir/CuttingPKT92,
  author    = {Douglas R. Cutting and Jan O. Pedersen and David R. Karger and John W. Tukey},
  editor    = {Nicholas J. Belkin and Peter Ingwersen and Annelise Mark Pejtersen},
  title     = {Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections},
  booktitle = {Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Copenhagen, Denmark, June 21-24, 1992},
  publisher = {ACM},
  year      = {1992},
  isbn      = {0-89791-523-2},
  pages     = {318-329},
  ee        = {db/conf/sigir/CuttingPKT92.html},
  crossref  = {DBLP:conf/sigir/92},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
Document clustering has not been well received as an information retrieval tool. Objections to its use fall into two main categories: first, that clustering is too slow for large corpora (with running time often quadratic in the number of documents); and second, that clustering does not appreciably improve retrieval.
We argue that these problems arise only when clustering is used in an attempt to improve conventional search techniques. However, looking at clustering as an information access tool in its own right obviates these objections, and provides a powerful new access paradigm. We present a document browsing technique that employs document clustering as its primary operation. We also present fast (linear time) clustering algorithms which support this interactive browsing paradigm.
Copyright © 1992 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.