Applying Probabilistic Term Weighting to OCR Text in the Case of a Large Alphabetic Library Catalogue.
Elke Mittendorf, Peter Schäuble, Paraic Sheridan:
SIGIR 1995: 328-335
@inproceedings{DBLP:conf/sigir/MittendorfSS95,
  author    = {Elke Mittendorf and
               Peter Sch{\"a}uble and
               Paraic Sheridan},
  editor    = {Edward A. Fox and
               Peter Ingwersen and
               Raya Fidel},
  title     = {Applying Probabilistic Term Weighting to OCR Text in the Case
               of a Large Alphabetic Library Catalogue},
  booktitle = {SIGIR'95, Proceedings of the 18th Annual International ACM SIGIR
               Conference on Research and Development in Information Retrieval.
               Seattle, Washington, USA, July 9-13, 1995 (Special Issue of
               the SIGIR Forum)},
  publisher = {ACM Press},
  year      = {1995},
  isbn      = {0-89791-714-6},
  pages     = {328-335},
  ee        = {db/conf/sigir/MittendorfSS95.html},
  crossref  = {DBLP:conf/sigir/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
We report on a probabilistic weighting approach to indexing the scanned images of very short documents.
This fully automatic process copes with short and very noisy texts (67% word accuracy) derived from the images by Optical Character Recognition (OCR).
The probabilistic term weighting approach is based on a theoretical proof explaining how the retrieval effectiveness is affected by recognition errors.
We have evaluated our probabilistic weighting approach on a sample of index cards from an alphabetic library catalogue where, on the average, a card contains only 23 terms.
We have demonstrated over 30% improvement in retrieval effectiveness over a conventional weighted retrieval method where the recognition errors are not taken into account.
We also show how we can take advantage of the ordering information of the alphabetic library catalogue.
Copyright © 1995 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.
Printed Edition
Edward A. Fox, Peter Ingwersen, Raya Fidel (Eds.):
SIGIR'95, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Seattle, Washington, USA, July 9-13, 1995 (Special Issue of the SIGIR Forum).
ACM Press 1995, ISBN 0-89791-714-6
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:38:50 2009