
Results of Applying Probabilistic IR to OCR Text.

Kazem Taghva, Julie Borsack, Allen Condit: Results of Applying Probabilistic IR to OCR Text. SIGIR 1994: 202-211
@inproceedings{DBLP:conf/sigir/TaghvaBC94,
  author    = {Kazem Taghva and
               Julie Borsack and
               Allen Condit},
  editor    = {W. Bruce Croft and
               C. J. van Rijsbergen},
  title     = {Results of Applying Probabilistic IR to OCR Text},
  booktitle = {Proceedings of the 17th Annual International ACM-SIGIR Conference
               on Research and Development in Information Retrieval. Dublin,
               Ireland, 3-6 July 1994 (Special Issue of the SIGIR Forum)},
  publisher = {ACM/Springer},
  year      = {1994},
  isbn      = {3-540-19889-X},
  pages     = {202-211},
  ee        = {db/conf/sigir/TaghvaBC94.html},
  crossref  = {DBLP:conf/sigir/94},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Character accuracy of optically recognized text is considered a basic measure for evaluating OCR devices. In the broader sense, another fundamental measure of an OCR's goodness is whether its generated text is usable for retrieving information. In this study, we evaluate retrieval effectiveness from OCR text databases using a probabilistic IR system. We compare these retrieval results to their manually corrected equivalent. We show there is no statistical difference in precision and recall using graded accuracy levels from three OCR devices. However, characteristics of the OCR data have side effects that could cause unstable results with this IR model. In particular, we found individual queries can be greatly affected. Knowing the qualities of OCR text, we compensate for them by applying an automatic post-processing system that improves effectiveness.

Copyright © 1994 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.



Printed Edition

W. Bruce Croft, C. J. van Rijsbergen (Eds.): Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Dublin, Ireland, 3-6 July 1994 (Special Issue of the SIGIR Forum). ACM/Springer 1994, ISBN 3-540-19889-X
