
Pivoted Document Length Normalization.

Amit Singhal, Chris Buckley, Mandar Mitra: Pivoted Document Length Normalization. SIGIR 1996: 21-29
@inproceedings{DBLP:conf/sigir/SinghalBM96,
  author    = {Amit Singhal and
               Chris Buckley and
               Mandar Mitra},
  editor    = {Hans-Peter Frei and
               Donna Harman and
               Peter Sch{\"a}uble and
               Ross Wilkinson},
  title     = {Pivoted Document Length Normalization},
  booktitle = {Proceedings of the 19th Annual International ACM SIGIR Conference
               on Research and Development in Information Retrieval, SIGIR'96,
               August 18-22, 1996, Zurich, Switzerland (Special Issue of the
               SIGIR Forum)},
  publisher = {ACM},
  year      = {1996},
  isbn      = {0-89791-792-8},
  pages     = {21-29},
  ee        = {db/conf/sigir/SinghalBM96.html},
  crossref  = {DBLP:conf/sigir/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Automatic information retrieval systems have to deal with documents of varying lengths in a text collection. Document length normalization is used to fairly retrieve documents of all lengths. In this study, we observe that a normalization scheme that retrieves documents of all lengths with similar chances as their likelihood of relevance will outperform another scheme which retrieves documents with chances very different from their likelihood of relevance. We show that the retrieval probabilities for a particular normalization method deviate systematically from the relevance probabilities across different collections. We present pivoted normalization, a technique that can be used to modify any normalization function thereby reducing the gap between the relevance and the retrieval probabilities. Training pivoted normalization on one collection, we can successfully use it on other (new) text collections, yielding a robust, collection-independent normalization technique. We use the idea of pivoting with the well-known cosine normalization function. We point out some shortcomings of the cosine function and present two new normalization functions: pivoted unique normalization and pivoted byte size normalization.
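The abstract does not state the formula. As a rough illustration only, pivoting is commonly written as a linear "tilt" of an existing normalization factor around a pivot value: pivoted = (1 - slope) * pivot + slope * old. The sketch below assumes that form. The function names, the slope of 0.25, and the sample values are hypothetical and are not taken from this record.

```python
from typing import List


def pivoted_norm(old_norm: float, pivot: float, slope: float) -> float:
    """Tilt an existing normalization factor around a pivot value.

    Assumed form: (1 - slope) * pivot + slope * old_norm.
    With slope < 1, documents whose old factor is below the pivot get a
    larger factor (retrieved less readily), and documents above the pivot
    get a smaller factor (retrieved more readily).
    """
    return (1.0 - slope) * pivot + slope * old_norm


def normalize_weights(weights: List[float], old_norm: float,
                      pivot: float, slope: float) -> List[float]:
    """Divide raw term weights by the pivoted normalization factor."""
    factor = pivoted_norm(old_norm, pivot, slope)
    return [w / factor for w in weights]


if __name__ == "__main__":
    # Hypothetical old normalization factors (e.g. number of unique terms).
    old_factors = {"short_doc": 5.0, "long_doc": 20.0}
    pivot = sum(old_factors.values()) / len(old_factors)  # average old factor
    slope = 0.25  # illustrative value; in practice it would be tuned

    for name, old in old_factors.items():
        print(name, old, pivoted_norm(old, pivot, slope))
```

A document whose old factor equals the pivot is left unchanged, which is why the pivot is often taken to be the collection average of the old factor, leaving only the slope to be trained.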

Copyright © 1996 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.



Printed Edition

Hans-Peter Frei, Donna Harman, Peter Schäuble, Ross Wilkinson (Eds.): Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'96, August 18-22, 1996, Zurich, Switzerland (Special Issue of the SIGIR Forum). ACM 1996, ISBN 0-89791-792-8

Online Edition: ACM Digital Library


Referenced by

  1. Weiyi Meng, King-Lup Liu, Clement T. Yu, Wensheng Wu, Naphtali Rishe: Estimating the Usefulness of Search Engines. ICDE 1999: 146-153
  2. Jeffrey A. Goldman, Douglas Stott Parker Jr., Wesley W. Chu: Knowledge Discovery in an Earthquake Text Database: Correlation between Significant Earthquakes and the Time of Day. SSDBM 1997: 12-21
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:38:50 2009