Compression of Indexes with Full Positional Information in Very Large Text Databases.
Gordon Linoff, Craig Stanfill
SIGIR 1993: 88-95

@inproceedings{DBLP:conf/sigir/LinoffS93,
author = {Gordon Linoff and
Craig Stanfill},
editor = {Robert Korfhage and
Edie M. Rasmussen and
Peter Willett},
title = {Compression of Indexes with Full Positional Information in Very
Large Text Databases},
booktitle = {Proceedings of the 16th Annual International ACM-SIGIR Conference
on Research and Development in Information Retrieval. Pittsburgh,
PA, USA, June 27 - July 1, 1993},
publisher = {ACM},
year = {1993},
isbn = {0-89791-605-0},
pages = {88-95},
ee = {db/conf/sigir/LinoffS93.html},
crossref = {DBLP:conf/sigir/93},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
This paper describes a combination of compression methods which may be used to reduce the size
of inverted indexes for very large text databases. These methods are Prefix Omission, Run-Length
Encoding, and a novel family of numeric representations called n-s coding. Using these compression
methods on two different text sources (the King James Version of the Bible and a sample of Wall
Street Journal Stories), the compressed index occupies less than 40% of the size of the original text,
even when both stopwords and numbers are included in the index. The decreased time required for
I/O can almost fully compensate for the time needed to uncompress the postings. This research is
part of an effort to handle very large text databases on the CM-5, a massively parallel MIMD
supercomputer.
Copyright © 1993 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
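
The abstract names Prefix Omission as one of the compression methods but does not spell out the encoding. As a rough illustration of the general idea only, the sketch below applies front coding to a sorted word list: each word is stored as the number of leading characters it shares with the previous word, plus the remaining suffix. The function names `front_encode` and `front_decode` and the sample word list are hypothetical, and this is not claimed to be the scheme used in the paper.

```python
from typing import List, Tuple


def front_encode(sorted_terms: List[str]) -> List[Tuple[int, str]]:
    """Store each term as (length of prefix shared with previous term, suffix)."""
    encoded = []
    previous = ""
    for term in sorted_terms:
        shared = 0
        limit = min(len(term), len(previous))
        # Count leading characters that match the previous term.
        while shared < limit and term[shared] == previous[shared]:
            shared += 1
        encoded.append((shared, term[shared:]))
        previous = term
    return encoded


def front_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild the original terms from (shared prefix length, suffix) pairs."""
    terms = []
    previous = ""
    for shared, suffix in encoded:
        term = previous[:shared] + suffix
        terms.append(term)
        previous = term
    return terms


# Hypothetical sample vocabulary, already sorted.
terms = ["bible", "biblical", "bid", "king", "kingdom"]
encoded = front_encode(terms)
print(encoded)
print(front_decode(encoded) == terms)
```

The saving comes from the sort order: neighbouring entries in a sorted index tend to share long prefixes, so only short suffixes need to be stored.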
Printed Edition
Robert Korfhage, Edie M. Rasmussen, Peter Willett (Eds.):
Proceedings of the 16th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Pittsburgh, PA, USA, June 27 - July 1, 1993.
ACM 1993, ISBN 0-89791-605-0
Referenced by
- Justin Zobel, Alistair Moffat, Kotagiri Ramamohanarao:
Inverted Files Versus Signature Files for Text Indexing.
ACM Trans. Database Syst. 23(4): 453-490 (1998)
- Brian Lowe, Justin Zobel, Ron Sacks-Davis:
A Formal Model for Databases of Structured Text.
DASFAA 1995: 449-456
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:38:42 2009