N-Poisson - Document Modelling.

Eugene L. Margulis: N-Poisson - Document Modelling. SIGIR 1992: 177-189
@inproceedings{DBLP:conf/sigir/Margulis92,
  author    = {Eugene L. Margulis},
  editor    = {Nicholas J. Belkin and
               Peter Ingwersen and
               Annelise Mark Pejtersen},
  title     = {N-Poisson - Document Modelling},
  booktitle = {Proceedings of the 15th Annual International ACM SIGIR Conference
               on Research and Development in Information Retrieval. Copenhagen,
               Denmark, June 21-24, 1992},
  publisher = {ACM},
  year      = {1992},
  isbn      = {0-89791-523-2},
  pages     = {177-189},
  ee        = {db/conf/sigir/Margulis92.html},
  crossref  = {DBLP:conf/sigir/92},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

This paper is a report of a study investigating the validity of the Multiple Poisson (nP) model of word distribution in document collections. An nP distribution is a mixture of n Poisson distributions with different means. We describe a practical algorithm for determining if a certain word is distributed according to an nP distribution and computing the distribution parameters. The algorithm was applied to every word in four different document collections. It was found that over 70% of frequently occurring words and terms indeed behave according to the nP distributions. The results indicate that the proportion of nP words depends on the collection size, document length and the frequency of the individual words. Most of the nP words recognised are distributed according to the mixture of relatively few single Poisson distributions (two, three or four). There is an indication that the number of single Poisson components in the mixture depends on the collection frequency of words.
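
The nP definition in the abstract (a mixture of n Poisson distributions with different means) can be illustrated with a short sketch. The abstract does not say how the paper estimates the parameters, so the code below substitutes a generic expectation-maximisation (EM) fit; it is not the paper's algorithm. The function names, the starting values and the iteration count are all assumptions made for illustration. Input is a list of per-document occurrence counts for one word.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a single Poisson distribution with mean lam > 0."""
    # Computed in log space so that k! does not overflow for large counts.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def mixture_pmf(k, weights, means):
    """P(X = k) under an nP model: a weighted sum of Poisson components."""
    return sum(w * poisson_pmf(k, m) for w, m in zip(weights, means))


def fit_poisson_mixture(counts, n, iters=200):
    """Estimate weights and means of an n-component Poisson mixture by EM.

    Assumption: EM stands in here for the estimation procedure of the
    paper, which the abstract does not describe.
    """
    lo, hi = min(counts), max(counts)
    width = (hi - lo + 1) / n
    # Hypothetical initialisation: spread the means over the observed range.
    means = [lo + (j + 0.5) * width for j in range(n)]
    weights = [1.0 / n] * n

    for _ in range(iters):
        # E-step: responsibility of each component for each observed count.
        resp = []
        for k in counts:
            joint = [w * poisson_pmf(k, m) for w, m in zip(weights, means)]
            total = sum(joint)
            if total > 0:
                resp.append([p / total for p in joint])
            else:
                # Numerical underflow: fall back to equal responsibilities.
                resp.append([1.0 / n] * n)

        # M-step: re-estimate each weight and mean from the responsibilities.
        for j in range(n):
            mass = sum(r[j] for r in resp)
            weights[j] = mass / len(counts)
            if mass > 0:
                weighted_sum = sum(r[j] * k for r, k in zip(resp, counts))
                means[j] = max(weighted_sum / mass, 1e-9)

    return weights, means
```

Calling fit_poisson_mixture(counts, 2) returns a pair (weights, means) for a 2P model. Deciding whether a word is actually nP-distributed would additionally need a goodness-of-fit check between the observed counts and mixture_pmf, which this sketch leaves out.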

Copyright © 1992 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Printed Edition

Nicholas J. Belkin, Peter Ingwersen, Annelise Mark Pejtersen (Eds.): Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Copenhagen, Denmark, June 21-24, 1992. ACM 1992, ISBN 0-89791-523-2

Online Edition: ACM Digital Library

ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:38:40 2009