An Interpretation of Index Term Weighting Schemes Based on Document Components.

K. L. Kwok: An Interpretation of Index Term Weighting Schemes Based on Document Components. SIGIR 1986: 275-283
@inproceedings{DBLP:conf/sigir/Kwok86,
  author    = {K. L. Kwok},
  title     = {An Interpretation of Index Term Weighting Schemes Based on Document
               Components},
  booktitle = {SIGIR'86, Proceedings of the 9th Annual International ACM SIGIR
               Conference on Research and Development in Information Retrieval,
               Pisa, Italy, September 8-10, 1986},
  publisher = {ACM},
  year      = {1986},
  pages     = {275-283},
  ee        = {db/conf/sigir/Kwok86.html},
  crossref  = {DBLP:conf/sigir/86},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

A theory of indexing is presented, based on viewing a document as constituted of components. A component may be any run of text that can be (a) judged as to its relevance and (b) considered independent within the document. By looking at the constituent components of a document in relation to the universe of all components in the collection, we have been able to apply Bayes' decision theory to derive the index term representation for the document and to attach an initial probabilistic weight to each term, based on a Principle of Document Self-Recovery. It turns out that different choices of document component, such as a word or a whole abstract, lead to different term weighting schemes that have been introduced before on probabilistic grounds: specifically, Edmundson and Wyllys' term significance formula and Sparck Jones' inverse document frequency, the latter later modified by Croft and Harper into the 'combination match' formula. Thus, a unified interpretation of various probabilistic term weighting schemes appears possible.
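
For readers unfamiliar with two of the weighting schemes named in the abstract, the following is a minimal sketch of their commonly cited textbook forms. It is not taken from the paper and does not reproduce its derivation. The function names and the parameter names `num_docs` (collection size N), `doc_freq` (number of documents containing the term, n), and `c` (the constant C in the combination match weight) are assumptions chosen for illustration.

```python
import math


def idf(num_docs: int, doc_freq: int) -> float:
    """Inverse document frequency in the common log(N / n) form.

    Assumption: this is the textbook form, not necessarily the exact
    expression used in the paper.
    """
    if not 0 < doc_freq <= num_docs:
        raise ValueError("doc_freq must be between 1 and num_docs")
    return math.log(num_docs / doc_freq)


def combination_match_weight(num_docs: int, doc_freq: int, c: float = 0.0) -> float:
    """Combination match term weight in the form C + log((N - n) / n).

    Assumption: c is a tunable constant; its default of 0.0 is only
    illustrative.
    """
    if not 0 < doc_freq < num_docs:
        raise ValueError("doc_freq must be between 1 and num_docs - 1")
    return c + math.log((num_docs - doc_freq) / doc_freq)


if __name__ == "__main__":
    # A rare term (10 of 1000 documents) gets a higher weight than a
    # common one (500 of 1000 documents) under both schemes.
    for n in (10, 500):
        print(n, idf(1000, n), combination_match_weight(1000, n))
```

With `c` set to zero, the two weights are nearly equal for rare terms and diverge as a term becomes more common, since the second form falls to zero when a term occurs in half of the collection.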

Copyright © 1986 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.



Printed Edition

SIGIR'86, Proceedings of the 9th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy, September 8-10, 1986. ACM 1986
