
Noise Reduction in a Statistical Approach to Text Categorization.

Yiming Yang: Noise Reduction in a Statistical Approach to Text Categorization. SIGIR 1995: 256-263
@inproceedings{DBLP:conf/sigir/Yang95,
  author    = {Yiming Yang},
  editor    = {Edward A. Fox and
               Peter Ingwersen and
               Raya Fidel},
  title     = {Noise Reduction in a Statistical Approach to Text Categorization},
  booktitle = {SIGIR'95, Proceedings of the 18th Annual International ACM SIGIR
               Conference on Research and Development in Information Retrieval.
               Seattle, Washington, USA, July 9-13, 1995 (Special Issue of
               the SIGIR Forum)},
  publisher = {ACM Press},
  year      = {1995},
  isbn      = {0-89791-714-6},
  pages     = {256-263},
  ee        = {db/conf/sigir/Yang95.html},
  crossref  = {DBLP:conf/sigir/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

This paper studies noise reduction for computational efficiency improvements in a statistical learning method for text categorization, the Linear Least Squares Fit (LLSF) mapping. Multiple noise reduction strategies are proposed and evaluated, including: an aggressive removal of "non-informative words" from texts before training; the use of a truncated singular value decomposition to cut off noisy "latent semantic structures" during training; and the elimination of non-influential components in the LLSF solution (a word-concept association matrix) after training. Text collections in different domains were used for evaluation. Significant improvements in computational efficiency without losing categorization accuracy were evident in the testing results.
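
The three stages named in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, the toy matrices, the document-frequency cutoff used as a stand-in for "non-informative word" removal, and the magnitude-based pruning rule are all assumptions made for illustration. The sketch assumes a word-by-document matrix `A` and a category-by-document matrix `B`, and solves for a matrix `F` that minimizes the Frobenius norm of `F A - B` through a rank-`k` truncated SVD.

```python
import numpy as np

def remove_rare_words(A, min_doc_freq):
    """Before training: drop words (rows) that occur in too few documents.
    Document frequency is an assumed stand-in for the paper's criterion."""
    keep = (A > 0).sum(axis=1) >= min_doc_freq
    return A[keep], keep

def llsf_fit(A, B, k):
    """During training: least-squares F with F @ A ~= B, built from the
    k largest singular triplets of A (truncated pseudoinverse)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = max(1, min(k, int(np.sum(s > 1e-12))))  # never use zero singular values
    A_pinv = (Vt[:k].T / s[:k]) @ U[:, :k].T    # shape (n_docs, n_words)
    return B @ A_pinv                           # shape (n_cats, n_words)

def prune_small(F, keep_fraction):
    """After training: zero all but the largest-magnitude entries of F."""
    flat = np.abs(F).ravel()
    n_keep = min(flat.size, max(1, int(round(keep_fraction * flat.size))))
    kth = flat.size - n_keep
    threshold = np.partition(flat, kth)[kth]
    return np.where(np.abs(F) >= threshold, F, 0.0)

# Toy data (hypothetical): 6 words x 4 documents, 2 categories x 4 documents.
A = np.array([[2, 0, 1, 0],
              [0, 3, 0, 1],
              [1, 0, 2, 0],
              [0, 1, 0, 2],
              [1, 1, 1, 1],
              [0, 0, 0, 1]], dtype=float)
B = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)

A_small, kept = remove_rare_words(A, min_doc_freq=2)  # drops the last word
F_full = llsf_fit(A_small, B, k=4)    # full rank: reproduces B on training data
F_trunc = llsf_fit(A_small, B, k=2)   # truncated: keeps only 2 latent dimensions
F_sparse = prune_small(F_full, keep_fraction=0.5)

# Categorize a document by scoring its word vector against each category.
scores = F_full @ A_small[:, 0]
```

On this toy input the full-rank fit reproduces the training labels exactly because the four document columns are linearly independent; lowering `k` or `keep_fraction` trades that exactness for a smaller model, which is the kind of efficiency trade-off the abstract describes.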

Copyright © 1995 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.



Printed Edition

Edward A. Fox, Peter Ingwersen, Raya Fidel (Eds.): SIGIR'95, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Seattle, Washington, USA, July 9-13, 1995 (Special Issue of the SIGIR Forum). ACM Press 1995, ISBN 0-89791-714-6