Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections.

Douglas R. Cutting, Jan O. Pedersen, David R. Karger, John W. Tukey: Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections. SIGIR 1992: 318-329
  author    = {Douglas R. Cutting and
               Jan O. Pedersen and
               David R. Karger and
               John W. Tukey},
  editor    = {Nicholas J. Belkin and
               Peter Ingwersen and
               Annelise Mark Pejtersen},
  title     = {Scatter/Gather: A Cluster-based Approach to Browsing Large Document
               Collections.},
  booktitle = {Proceedings of the 15th Annual International ACM SIGIR Conference
               on Research and Development in Information Retrieval. Copenhagen,
               Denmark, June 21-24, 1992},
  publisher = {ACM},
  year      = {1992},
  isbn      = {0-89791-523-2},
  pages     = {318-329},
  ee        = {db/conf/sigir/CuttingPKT92.html},
  crossref  = {DBLP:conf/sigir/92},
  bibsource = {DBLP}


Document clustering has not been well received as an information retrieval tool. Objections to its use fall into two main categories: first, that clustering is too slow for large corpora (with running time often quadratic in the number of documents); and second, that clustering does not appreciably improve retrieval.

We argue that these problems arise only when clustering is used in an attempt to improve conventional search techniques. However, looking at clustering as an information access tool in its own right obviates these objections, and provides a powerful new access paradigm. We present a document browsing technique that employs document clustering as its primary operation. We also present fast (linear time) clustering algorithms which support this interactive browsing paradigm.
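
The browsing loop described in the abstract can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the paper's algorithms: `vectorize`, `cosine`, `scatter`, `gather`, and `summarize` are hypothetical names, documents are modeled as plain bag-of-words counts, and the clustering step is a simple seed-and-assign pass chosen only because it needs about k * n similarity comparisons instead of one per pair of documents.

```python
import math
import random
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (a deliberately crude document model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def scatter(docs, k, seed=0):
    """Split docs into at most k groups.

    Assumption: k randomly chosen documents act as cluster seeds, and each
    document joins its most similar seed. That is k * n comparisons, so the
    cost grows linearly with the number of documents for a fixed k.
    """
    rng = random.Random(seed)
    vecs = [vectorize(d) for d in docs]
    seeds = rng.sample(range(len(docs)), min(k, len(docs)))
    clusters = {s: [] for s in seeds}
    for i, v in enumerate(vecs):
        best = max(seeds, key=lambda s: cosine(v, vecs[s]))
        clusters[best].append(docs[i])
    # Drop seeds that attracted no documents (possible with duplicate texts).
    return [c for c in clusters.values() if c]


def gather(clusters, chosen):
    """Merge the clusters the user selected into one smaller collection."""
    return [doc for i in chosen for doc in clusters[i]]


def summarize(cluster, n=3):
    """Most frequent terms in a cluster, shown to the user as a rough digest."""
    counts = Counter()
    for doc in cluster:
        counts.update(vectorize(doc))
    return [term for term, _ in counts.most_common(n)]


if __name__ == "__main__":
    corpus = [
        "oil prices rise as exports fall",
        "crude oil exports and prices",
        "football team wins league title",
        "league football final tonight",
        "election results and party leaders",
        "party leaders debate election",
    ]
    groups = scatter(corpus, k=3)
    for i, group in enumerate(groups):
        print(i, summarize(group), len(group))
    # The user picks cluster 0; the selection is re-scattered into finer groups.
    narrowed = gather(groups, [0])
    print(scatter(narrowed, k=2))
```

A browsing session would alternate these two steps: scatter the current collection, show a digest of each group, gather the groups the user picks, and scatter again on the smaller collection until the groups are small enough to read directly.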

Copyright © 1992 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 2 Issue 3, SIGIR, DASFAA'97, OODBS'86" and ...
DVD Version: Load "ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

Nicholas J. Belkin, Peter Ingwersen, Annelise Mark Pejtersen (Eds.): Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Copenhagen, Denmark, June 21-24, 1992. ACM 1992, ISBN 0-89791-523-2

Online Edition: ACM Digital Library


Referenced by

  1. Andreas Paepcke, Hector Garcia-Molina, Gerard Rodríguez-Mulà, Junghoo Cho: Beyond Document Similarity: Understanding Value-Based Search and Browsing Technologies. SIGMOD Record 29(1): 80-92 (2000)
  2. Rakesh Agrawal, Roberto J. Bayardo Jr., Ramakrishnan Srikant: Athena: Mining-Based Interactive Management of Text Databases. EDBT 2000: 365-379
  3. Koji Eguchi, Hidetaka Ito, Akira Kumamoto, Yakichi Kanata: Adaptive and Incremental Query Expansion for Cluster-based Browsing. DASFAA 1999: 25-34
  4. Oren Etzioni: The World-Wide Web: Quagmire or Gold Mine? Commun. ACM 39(11): 65-68 (1996)
ACM SIGMOD Anthology: Copyright © by ACM
DBLP: Copyright © by Michael Ley, last change: Sat May 16 23:38:41 2009