An Evaluation of Phrasal and Clustered Representations on a Text Categorization Task.
David D. Lewis:
An Evaluation of Phrasal and Clustered Representations on a Text Categorization Task.
SIGIR 1992: 37-50

@inproceedings{DBLP:conf/sigir/Lewis92,
author = {David D. Lewis},
editor = {Nicholas J. Belkin and
Peter Ingwersen and
Annelise Mark Pejtersen},
title = {An Evaluation of Phrasal and Clustered Representations on a Text
Categorization Task},
booktitle = {Proceedings of the 15th Annual International ACM SIGIR Conference
on Research and Development in Information Retrieval. Copenhagen,
Denmark, June 21-24, 1992},
publisher = {ACM},
year = {1992},
isbn = {0-89791-523-2},
pages = {37--50},
ee = {db/conf/sigir/Lewis92.html},
crossref = {DBLP:conf/sigir/92},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Syntactic phrase indexing and term clustering have been widely explored as text representation
techniques for text retrieval. In this paper we study the properties of phrasal and clustered indexing
languages on a text categorization task, enabling us to study their properties in isolation from query
interpretation issues. We show that optimal effectiveness occurs when using only a small proportion
of the indexing terms available, and that effectiveness peaks at a higher feature set size and lower
effectiveness level for syntactic phrase indexing than for word-based indexing. We also present
results suggesting that traditional term clustering methods are unlikely to provide significantly
improved text representations. An improved probabilistic text categorization method is also
presented.
Copyright © 1992 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
Printed Edition
Nicholas J. Belkin, Peter Ingwersen, Annelise Mark Pejtersen (Eds.):
Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Copenhagen, Denmark, June 21-24, 1992.
ACM 1992, ISBN 0-89791-523-2
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:38:40 2009