
A Comparison of Classifiers and Document Representations for the Routing Problem.

Hinrich Schütze, David A. Hull, Jan O. Pedersen: A Comparison of Classifiers and Document Representations for the Routing Problem. SIGIR 1995: 229-237
@inproceedings{DBLP:conf/sigir/SchutzeHP95,
  author    = {Hinrich Sch{\"u}tze and
               David A. Hull and
               Jan O. Pedersen},
  editor    = {Edward A. Fox and
               Peter Ingwersen and
               Raya Fidel},
  title     = {A Comparison of Classifiers and Document Representations for
               the Routing Problem},
  booktitle = {SIGIR'95, Proceedings of the 18th Annual International ACM SIGIR
               Conference on Research and Development in Information Retrieval.
               Seattle, Washington, USA, July 9-13, 1995 (Special Issue of
               the SIGIR Forum)},
  publisher = {ACM Press},
  year      = {1995},
  isbn      = {0-89791-714-6},
  pages     = {229-237},
  ee        = {db/conf/sigir/SchutzeHP95.html},
  crossref  = {DBLP:conf/sigir/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

In this paper, we compare learning techniques based on statistical classification to traditional methods of relevance feedback for the document routing problem. We consider three classification techniques which have decision rules that are derived via explicit error minimization: linear discriminant analysis, logistic regression, and neural networks. We demonstrate that the classifiers perform 10-15% better than relevance feedback via Rocchio expansion for the TREC-2 and TREC-3 routing tasks.

Error minimization is difficult in high-dimensional feature spaces because the convergence process is slow and the models are prone to overfitting. We use two different strategies, latent semantic indexing and optimal term selection, to reduce the number of features. Our results indicate that features based on latent semantic indexing are more effective for techniques such as linear discriminant analysis and logistic regression, which have no way to protect against overfitting. Neural networks perform equally well with either set of features and can take advantage of the additional information available when both feature sets are used as input.
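
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of relevance feedback via Rocchio expansion, not the authors' implementation. It uses the standard textbook form, in which the expanded query is alpha times the original query plus beta times the centroid of relevant documents minus gamma times the centroid of non-relevant documents. The function names (`centroid`, `rocchio`, `score`), the sparse dictionary representation, the default weights, and the clipping of negative weights are all assumptions made for illustration; the abstract does not specify any of them.

```python
from typing import Dict, List

# A document or query is a sparse vector: term -> weight (an assumed representation).
Vector = Dict[str, float]


def centroid(docs: List[Vector]) -> Vector:
    """Return the mean term-weight vector of docs (empty list gives an empty vector)."""
    total: Vector = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    n = len(docs)
    return {term: weight / n for term, weight in total.items()} if n else {}


def rocchio(
    query: Vector,
    relevant: List[Vector],
    nonrelevant: List[Vector],
    alpha: float = 1.0,   # illustrative defaults, not values from the paper
    beta: float = 0.75,
    gamma: float = 0.15,
) -> Vector:
    """Expand a query as alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
    rel_c = centroid(relevant)
    non_c = centroid(nonrelevant)
    expanded: Vector = {}
    for term in set(query) | set(rel_c) | set(non_c):
        weight = (
            alpha * query.get(term, 0.0)
            + beta * rel_c.get(term, 0.0)
            - gamma * non_c.get(term, 0.0)
        )
        # Assumed convention: drop terms whose weight is not positive.
        if weight > 0:
            expanded[term] = weight
    return expanded


def score(query: Vector, doc: Vector) -> float:
    """Rank a new document by its inner product with the expanded query."""
    return sum(weight * doc.get(term, 0.0) for term, weight in query.items())


if __name__ == "__main__":
    q = rocchio(
        {"routing": 1.0},
        relevant=[{"routing": 1.0, "filter": 1.0}, {"routing": 1.0}],
        nonrelevant=[{"sports": 1.0}],
    )
    print(sorted(q.items()))
    print(score(q, {"routing": 1.0, "filter": 2.0}))
```

In a routing setting, the judged training documents for a topic play the role of `relevant` and `nonrelevant`, and incoming documents are ranked by `score`. The classifiers compared in the paper replace this fixed weighting scheme with decision rules fitted by explicit error minimization.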

Copyright © 1995 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.



Printed Edition

Edward A. Fox, Peter Ingwersen, Raya Fidel (Eds.): SIGIR'95, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Seattle, Washington, USA, July 9-13, 1995 (Special Issue of the SIGIR Forum). ACM Press 1995, ISBN 0-89791-714-6

Online Edition: ACM Digital Library


Referenced by

  1. Ke Wang, Senqiang Zhou, Shiang Chen Liew: Building Hierarchical Classifiers Using Class Proximity. VLDB 1999: 363-374
  2. Soumen Chakrabarti, Byron Dom, Rakesh Agrawal, Prabhakar Raghavan: Scalable Feature Selection, Classification and Signature Generation for Organizing Large Text Databases into Hierarchical Topic Taxonomies. VLDB J. 7(3): 163-178 (1998)
  3. Soumen Chakrabarti, Byron Dom, Rakesh Agrawal, Prabhakar Raghavan: Using Taxonomy, Discriminants, and Signatures for Navigating in Text Databases. VLDB 1997: 446-455
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:38:49 2009