
Applying Probabilistic Term Weighting to OCR Text in the Case of a Large Alphabetic Library Catalogue.

Elke Mittendorf, Peter Schäuble, Paraic Sheridan: Applying Probabilistic Term Weighting to OCR Text in the Case of a Large Alphabetic Library Catalogue. SIGIR 1995: 328-335
@inproceedings{DBLP:conf/sigir/MittendorfSS95,
  author    = {Elke Mittendorf and
               Peter Sch{\"a}uble and
               Paraic Sheridan},
  editor    = {Edward A. Fox and
               Peter Ingwersen and
               Raya Fidel},
  title     = {Applying Probabilistic Term Weighting to OCR Text in the Case
               of a Large Alphabetic Library Catalogue},
  booktitle = {SIGIR'95, Proceedings of the 18th Annual International ACM SIGIR
               Conference on Research and Development in Information Retrieval.
               Seattle, Washington, USA, July 9-13, 1995 (Special Issue of
               the SIGIR Forum)},
  publisher = {ACM Press},
  year      = {1995},
  isbn      = {0-89791-714-6},
  pages     = {328-335},
  ee        = {db/conf/sigir/MittendorfSS95.html},
  crossref  = {DBLP:conf/sigir/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

We report on a probabilistic weighting approach to indexing the scanned images of very short documents. This fully automatic process copes with short and very noisy texts (67% word accuracy) derived from the images by Optical Character Recognition (OCR). The probabilistic term weighting approach is based on a theoretical proof explaining how the retrieval effectiveness is affected by recognition errors. We have evaluated our probabilistic weighting approach on a sample of index cards from an alphabetic library catalogue where, on the average, a card contains only 23 terms. We have demonstrated over 30% improvement in retrieval effectiveness over a conventional weighted retrieval method where the recognition errors are not taken into account. We also show how we can take advantage of the ordering information of the alphabetic library catalogue.

Copyright © 1995 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.



Printed Edition

Edward A. Fox, Peter Ingwersen, Raya Fidel (Eds.): SIGIR'95, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Seattle, Washington, USA, July 9-13, 1995 (Special Issue of the SIGIR Forum). ACM Press 1995, ISBN 0-89791-714-6

Online Edition: ACM Digital Library
