ACM SIGMOD Anthology SIGIR dblp.uni-trier.de

An Evaluation of Phrasal and Clustered Representations on a Text Categorization Task.

David D. Lewis: An Evaluation of Phrasal and Clustered Representations on a Text Categorization Task. SIGIR 1992: 37-50
@inproceedings{DBLP:conf/sigir/Lewis92,
  author    = {David D. Lewis},
  editor    = {Nicholas J. Belkin and
               Peter Ingwersen and
               Annelise Mark Pejtersen},
  title     = {An Evaluation of Phrasal and Clustered Representations on a Text
               Categorization Task},
  booktitle = {Proceedings of the 15th Annual International ACM SIGIR Conference
               on Research and Development in Information Retrieval. Copenhagen,
               Denmark, June 21-24, 1992},
  publisher = {ACM},
  year      = {1992},
  isbn      = {0-89791-523-2},
  pages     = {37-50},
  ee        = {db/conf/sigir/Lewis92.html},
  crossref  = {DBLP:conf/sigir/92},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

Syntactic phrase indexing and term clustering have been widely explored as text representation techniques for text retrieval. In this paper we study the properties of phrasal and clustered indexing languages on a text categorization task, enabling us to study their properties in isolation from query interpretation issues. We show that optimal effectiveness occurs when using only a small proportion of the indexing terms available, and that effectiveness peaks at a higher feature set size and lower effectiveness level for a syntactic phrase indexing than for word-based indexing. We also present results suggesting that traditional term clustering method are unlikely to provide significantly improved text representations. An improved probabilistic text categorization method is also presented.

Copyright © 1992 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 2 Issue 3, SIGIR, DASFAA'97, OODBS'86" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Nicholas J. Belkin, Peter Ingwersen, Annelise Mark Pejtersen (Eds.): Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Copenhagen, Denmark, June 21-24, 1992. ACM 1992, ISBN 0-89791-523-2
Contents BibTeX

Online Edition: ACM Digital Library

Citation page
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:38:40 2009