ACM SIGMOD Anthology SIGIR dblp.uni-trier.de

On the Interrelationship of Dictionary Size and Completeness.

Hubert Hüther: On the Interrelationship of Dictionary Size and Completeness. SIGIR 1990: 313-325
@inproceedings{DBLP:conf/sigir/Huther90,
  author    = {Hubert H{\"u}ther},
  editor    = {Jean-Luc Vidick},
  title     = {On the Interrelationship of Dictionary Size and Completeness},
  booktitle = {SIGIR'90, 13th International Conference on Research and Development
               in Information Retrieval, Brussels, Belgium, 5-7 September 1990,
               Proceedings},
  publisher = {ACM},
  year      = {1990},
  isbn      = {0-89791-408-2},
  pages     = {313-325},
  ee        = {db/conf/sigir/Huther90.html},
  crossref  = {DBLP:conf/sigir/90},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

When dictionaries for specific applications or subject fields are derived from a text collection, the frequency distribution of the terms in the collection gives information about the expected completeness of the dictionary. If only a subset of the terms in the collection is to be included in the dictionary, the completeness of the dictionary can be optimized with respect to dictionary size.
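As a rough illustration of the trade-off described above, the following Python sketch treats "completeness" as the fraction of term occurrences in the collection that a dictionary covers, and builds dictionaries from the most frequent terms first. This is an assumption made for illustration only: the abstract does not state the paper's formulas, and the function names (`coverage_by_size`, `smallest_dictionary`, `unseen_mass`) are hypothetical. The last function uses the standard Good-Turing estimate (share of terms seen exactly once) as a stand-in for the expected incompleteness on new text; it is not taken from the paper.

```python
from collections import Counter


def coverage_by_size(tokens):
    """Cumulative coverage for dictionaries of size 1..V.

    The dictionary of size k is assumed to hold the k most frequent
    terms; coverage is the share of token occurrences it accounts for.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    curve = []
    for _, freq in counts.most_common():
        covered += freq
        curve.append(covered / total)
    return curve


def smallest_dictionary(tokens, target):
    """Smallest dictionary size whose coverage reaches `target` (0..1).

    Returns None if the target cannot be reached (e.g. empty input).
    """
    for size, coverage in enumerate(coverage_by_size(tokens), start=1):
        if coverage >= target:
            return size
    return None


def unseen_mass(tokens):
    """Good-Turing estimate of the probability that the next token is a
    term not yet seen: (number of terms occurring once) / (token count).
    This is a swapped-in textbook estimator, not the paper's formula.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    singletons = sum(1 for freq in counts.values() if freq == 1)
    return singletons / total


if __name__ == "__main__":
    sample = "a a a a b b b c c d".split()
    print(coverage_by_size(sample))
    print(smallest_dictionary(sample, 0.8))
    print(unseen_mass(sample))
```

Ranking by frequency is the natural greedy choice under this reading of completeness: for a fixed dictionary size, no other selection of terms covers more occurrences of the same collection.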

In this paper, formulas for the relationship between the frequency distribution of the terms in the collection and expected dictionary completeness are derived. First we consider one-dimensional dictionaries, where the (non-trivial) terms occurring in the texts are to be included in the dictionary. Then we describe the case of two-dimensional dictionaries, which are needed, for example, for automatic indexing with a controlled vocabulary; here relationships between text terms and descriptors from the prescribed vocabulary have to be stored in the dictionary. For both cases, formulas for interpolation and extrapolation with respect to different collection sizes are derived.

We give experimental results for one-dimensional dictionaries and show how the completeness can be estimated and optimized.

Copyright © 1990 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.



Printed Edition

Jean-Luc Vidick (Ed.): SIGIR'90, 13th International Conference on Research and Development in Information Retrieval, Brussels, Belgium, 5-7 September 1990, Proceedings. ACM 1990, ISBN 0-89791-408-2
