@inproceedings{DBLP:conf/sigir/Huther90,
author = {Hubert H{\"u}ther},
editor = {Jean-Luc Vidick},
title = {On the Interrelationship of Dictionary Size and Completeness},
booktitle = {SIGIR'90, 13th International Conference on Research and Development
in Information Retrieval, Brussels, Belgium, 5-7 September 1990,
Proceedings},
publisher = {ACM},
year = {1990},
isbn = {0-89791-408-2},
pages = {313--325},
ee = {db/conf/sigir/Huther90.html},
crossref = {DBLP:conf/sigir/90},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
When dictionaries for specific applications or subject fields are derived from a text collection, the frequency distribution of the terms in the collection gives information about the expected completeness of the dictionary. If only a subset of the terms in the collection is to be included in the dictionary, the completeness of the dictionary can be optimized with respect to dictionary size.
In this paper, formulas for the relationship between the frequency distribution of the terms in the collection and expected dictionary completeness are derived. First, we consider one-dimensional dictionaries, where the (non-trivial) terms occurring in the texts are to be included in the dictionary. Then we describe the case of two-dimensional dictionaries, which are needed, for example, for automatic indexing with a controlled vocabulary; here, relationships between text terms and descriptors from the prescribed vocabulary have to be stored in the dictionary. For both cases, formulas for interpolation and extrapolation with respect to different collection sizes are derived.
We give experimental results for one-dimensional dictionaries and show how the completeness can be estimated and optimized.
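The abstract does not reproduce the paper's formulas, so the following is only a minimal sketch of the general idea for the one-dimensional case, not the author's method. It assumes that completeness can be approximated by token coverage (the share of term occurrences in the collection that fall on dictionary terms) and that the dictionary is built from the most frequent terms. The function names `coverage_by_size`, `smallest_dictionary`, and `unseen_mass_estimate` are hypothetical, and the last one uses the standard Good-Turing estimate as a stand-in for the paper's extrapolation formulas.

```python
from collections import Counter


def coverage_by_size(tokens):
    """Cumulative token coverage for dictionaries of size 1..V.

    Entry k-1 is the fraction of all term occurrences covered by a
    dictionary holding the k most frequent terms (assumed definition
    of completeness, not taken from the paper).
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    coverage = []
    covered = 0
    for _, freq in counts.most_common():
        covered += freq
        coverage.append(covered / total)
    return coverage


def smallest_dictionary(tokens, target):
    """Smallest number of top-frequency terms whose coverage reaches target.

    Returns None if the target cannot be reached (e.g. empty input).
    """
    for size, cov in enumerate(coverage_by_size(tokens), start=1):
        if cov >= target:
            return size
    return None


def unseen_mass_estimate(tokens):
    """Good-Turing estimate of the probability that the next token is a
    term not yet seen in the collection: (terms seen once) / (all tokens).
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 1.0  # nothing observed yet, so every term is unseen
    singletons = sum(1 for freq in counts.values() if freq == 1)
    return singletons / total


if __name__ == "__main__":
    sample = "a a a a b b b c c d".split()
    print(coverage_by_size(sample))
    print(smallest_dictionary(sample, 0.8))
    print(unseen_mass_estimate(sample))
```

Under these assumptions, trading dictionary size against completeness amounts to reading off the cumulative coverage curve, while the Good-Turing value gives a rough indication of how incomplete even a full dictionary of the current collection would be on new text.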
Copyright © 1990 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.