Pivoted Document Length Normalization.
Amit Singhal, Chris Buckley, Mandar Mitra:
Pivoted Document Length Normalization.
SIGIR 1996: 21-29
@inproceedings{DBLP:conf/sigir/SinghalBM96,
  author    = {Amit Singhal and
               Chris Buckley and
               Mandar Mitra},
  editor    = {Hans-Peter Frei and
               Donna Harman and
               Peter Sch{\"a}uble and
               Ross Wilkinson},
  title     = {Pivoted Document Length Normalization},
  booktitle = {Proceedings of the 19th Annual International ACM SIGIR Conference
               on Research and Development in Information Retrieval, SIGIR'96,
               August 18-22, 1996, Zurich, Switzerland (Special Issue of the
               SIGIR Forum)},
  publisher = {ACM},
  year      = {1996},
  isbn      = {0-89791-792-8},
  pages     = {21--29},
  ee        = {db/conf/sigir/SinghalBM96.html},
  crossref  = {DBLP:conf/sigir/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Automatic information retrieval systems have to deal with documents of varying lengths in a text collection.
Document length normalization is used to fairly retrieve documents of all lengths.
In this study, we observe that a normalization scheme that retrieves documents of all lengths with similar chances as their likelihood of relevance will outperform another scheme which retrieves documents with chances very different from their likelihood of relevance.
We show that the retrieval probabilities for a particular normalization method deviate systematically from the relevance probabilities across different collections.
We present pivoted normalization, a technique that can be used to modify any normalization function thereby reducing the gap between the relevance and the retrieval probabilities.
Training pivoted normalization on one collection,
we can successfully use it on other (new) text collections, yielding a robust, collection independent normalization technique.
We use the idea of pivoting with the well known cosine normalization function.
We point out some shortcomings of the cosine function and present two new normalization functions:
pivoted unique normalization and pivoted byte size normalization.
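The abstract describes pivoting only in words. As a rough illustration, the paper's body gives the pivoted factor as a linear "tilt" of an existing normalization factor around a pivot point; that formula is taken from the paper rather than from the abstract above. In the minimal sketch below, the function name `pivoted_norm` and the numeric values for `pivot` and `slope` are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def pivoted_norm(old_norm: float, pivot: float, slope: float) -> float:
    """Return (1 - slope) * pivot + slope * old_norm.

    old_norm: the original normalization factor of a document (e.g. its cosine norm).
    pivot:    the point where old and new factors agree (illustrative value below).
    slope:    tilt of the new normalization line; values below 1 flatten it.
    """
    return (1.0 - slope) * pivot + slope * old_norm


# Illustrative values only (not from the paper): pivot = 8.0, slope = 0.25.
pivot, slope = 8.0, 0.25

at_pivot = pivoted_norm(8.0, pivot, slope)    # factor is unchanged at the pivot
long_doc = pivoted_norm(16.0, pivot, slope)   # factor shrinks for a long document
short_doc = pivoted_norm(4.0, pivot, slope)   # factor grows for a short document

print(at_pivot, long_doc, short_doc)
```

With a slope below 1, documents whose original factor exceeds the pivot are divided by a smaller number than before (raising their scores), and documents below the pivot by a larger one. This is the sense in which pivoting narrows the gap between retrieval and relevance probabilities.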
Copyright © 1996 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.
Printed Edition
Hans-Peter Frei, Donna Harman, Peter Schäuble, Ross Wilkinson (Eds.):
Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR'96, August 18-22, 1996, Zurich, Switzerland (Special Issue of the SIGIR Forum).
ACM 1996, ISBN 0-89791-792-8
Referenced by
- Weiyi Meng, King-Lup Liu, Clement T. Yu, Wensheng Wu, Naphtali Rishe:
Estimating the Usefulness of Search Engines.
ICDE 1999: 146-153
- Jeffrey A. Goldman, Douglas Stott Parker Jr., Wesley W. Chu:
Knowledge Discovery in an Earthquake Text Database: Correlation between Significant Earthquakes and the Time of Day.
SSDBM 1997: 12-21
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:38:50 2009