N-Poisson - Document Modelling.
Eugene L. Margulis:
N-Poisson - Document Modelling.
SIGIR 1992: 177-189
@inproceedings{DBLP:conf/sigir/Margulis92,
author = {Eugene L. Margulis},
editor = {Nicholas J. Belkin and
Peter Ingwersen and
Annelise Mark Pejtersen},
title = {N-Poisson - Document Modelling},
booktitle = {Proceedings of the 15th Annual International ACM SIGIR Conference
on Research and Development in Information Retrieval. Copenhagen,
Denmark, June 21-24, 1992},
publisher = {ACM},
year = {1992},
isbn = {0-89791-523-2},
pages = {177-189},
ee = {db/conf/sigir/Margulis92.html},
crossref = {DBLP:conf/sigir/92},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
This paper is a report of a study investigating the validity of the Multiple Poisson (nP) model of word
distribution in document collections. An nP distribution is a mixture of n Poisson distributions with
different means. We describe a practical algorithm for determining if a certain word is distributed
according to an nP distribution and computing the distribution parameters. The algorithm was applied
to every word in four different document collections. It was found that over 70% of frequently
occurring words and terms indeed behave according to the nP distributions. The results indicate that
the proportion of nP words depends on the collection size, document length and the frequency of the
individual words. Most of the nP words recognised are distributed according to the mixture of
relatively few single Poisson distributions (two, three or four). There is an indication that the number
of single Poisson components in the mixture depends
on the collection frequency of words.
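
The definition in the abstract (a mixture of n Poisson distributions with different means) can be illustrated with a minimal sketch. This is not the paper's algorithm for testing or fitting nP words; it only evaluates the mixture probability for given parameters. The names `poisson_pmf`, `np_pmf`, `weights`, and `means` are hypothetical, and the parameter values in the usage note are invented for illustration.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k occurrences under a single Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def np_pmf(k, weights, means):
    """Probability of k occurrences under a mixture of n Poisson distributions.

    weights: mixing proportions, one per component (assumed to sum to 1)
    means:   Poisson mean of each component (assumed to be distinct)
    """
    if len(weights) != len(means):
        raise ValueError("weights and means must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Weighted sum of the single-Poisson probabilities.
    return sum(w * poisson_pmf(k, m) for w, m in zip(weights, means))
```

For example, `np_pmf(2, [0.8, 0.2], [0.1, 3.0])` gives the probability that a word occurs twice in a document under an assumed 2P model in which 80% of documents use the word at a low rate and 20% at a high rate.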
Copyright © 1992 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
Printed Edition
Nicholas J. Belkin, Peter Ingwersen, Annelise Mark Pejtersen (Eds.):
Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Copenhagen, Denmark, June 21-24, 1992.
ACM 1992, ISBN 0-89791-523-2
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:38:40 2009