Compression of Indexes with Full Positional Information in Very Large Text Databases.

Gordon Linoff, Craig Stanfill: Compression of Indexes with Full Positional Information in Very Large Text Databases. SIGIR 1993: 88-95
@inproceedings{DBLP:conf/sigir/LinoffS93,
  author    = {Gordon Linoff and
               Craig Stanfill},
  editor    = {Robert Korfhage and
               Edie M. Rasmussen and
               Peter Willett},
  title     = {Compression of Indexes with Full Positional Information in Very
               Large Text Databases},
  booktitle = {Proceedings of the 16th Annual International ACM-SIGIR Conference
               on Research and Development in Information Retrieval. Pittsburgh,
               PA, USA, June 27 - July 1, 1993},
  publisher = {ACM},
  year      = {1993},
  isbn      = {0-89791-605-0},
  pages     = {88-95},
  ee        = {db/conf/sigir/LinoffS93.html},
  crossref  = {DBLP:conf/sigir/93},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

This paper describes a combination of compression methods which may be used to reduce the size of inverted indexes for very large text databases. These methods are Prefix Omission, Run-Length Encoding, and a novel family of numeric representations called n-s coding. Using these compression methods on two different text sources (the King James Version of the Bible and a sample of Wall Street Journal Stories), the compressed index occupies less than 40% of the size of the original text, even when both stopwords and numbers are included in the index. The decreased time required for I/O can almost fully compensate for the time needed to uncompress the postings. This research is part of an effort to handle very large text databases on the CM-5, a massively parallel MIMD supercomputer.
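The abstract names its methods without defining them, so the following is only a generic sketch of two of them, not the authors' actual encoding. It assumes postings are sorted tuples such as (document, sentence, word), a hypothetical layout chosen for illustration. Prefix omission is shown as dropping the leading fields a posting shares with its predecessor, and run-length encoding as collapsing repeated values into (value, count) pairs. The function names are illustrative, and n-s coding is left out because the abstract does not specify it.

```python
from itertools import groupby


def prefix_omit(postings):
    """Encode each tuple as (shared_prefix_length, remaining_fields),
    measured against the previous tuple in the sorted list."""
    encoded = []
    prev = ()
    for posting in postings:
        k = 0
        while k < len(prev) and k < len(posting) and prev[k] == posting[k]:
            k += 1
        encoded.append((k, posting[k:]))
        prev = posting
    return encoded


def prefix_restore(encoded):
    """Invert prefix_omit by re-attaching the omitted leading fields."""
    postings = []
    prev = ()
    for k, rest in encoded:
        posting = prev[:k] + rest
        postings.append(posting)
        prev = posting
    return postings


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical postings for one term: (document, sentence, word)
postings = [(1, 3, 7), (1, 3, 9), (1, 4, 2), (2, 1, 5)]
encoded = prefix_omit(postings)
assert prefix_restore(encoded) == postings
```

In this sketch the second posting is stored as a prefix length of 2 plus the single field that changed, which is where the space saving comes from when many postings share a document and sentence. The prefix lengths themselves tend to repeat, which is one place a run-length scheme could apply.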

Copyright © 1993 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Printed Edition

Robert Korfhage, Edie M. Rasmussen, Peter Willett (Eds.): Proceedings of the 16th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Pittsburgh, PA, USA, June 27 - July 1, 1993. ACM 1993, ISBN 0-89791-605-0

Referenced by

  1. Justin Zobel, Alistair Moffat, Kotagiri Ramamohanarao: Inverted Files Versus Signature Files for Text Indexing. ACM Trans. Database Syst. 23(4): 453-490(1998)
  2. Brian Lowe, Justin Zobel, Ron Sacks-Davis: A Formal Model for Databases of Structured Text. DASFAA 1995: 449-456