ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Searching Large Lexicons for Partially Specified Terms using Compressed Inverted Files.

Justin Zobel, Alistair Moffat, Ron Sacks-Davis: Searching Large Lexicons for Partially Specified Terms using Compressed Inverted Files. VLDB 1993: 290-301
@inproceedings{DBLP:conf/vldb/ZobelMS93,
  author    = {Justin Zobel and
               Alistair Moffat and
               Ron Sacks-Davis},
  editor    = {Rakesh Agrawal and
               Se{\'a}n Baker and
               David A. Bell},
  title     = {Searching Large Lexicons for Partially Specified Terms using
               Compressed Inverted Files},
  booktitle = {19th International Conference on Very Large Data Bases, August
               24-27, 1993, Dublin, Ireland, Proceedings},
  publisher = {Morgan Kaufmann},
  year      = {1993},
  isbn      = {1-55860-152-X},
  pages     = {290-301},
  ee        = {db/conf/vldb/ZobelMS93.html},
  crossref  = {DBLP:conf/vldb/93},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

There are many advantages to be gained by storing the lexicon of a full text database in main memory. In this paper we describe how to use a compressed inverted file index to searchsuch a lexicon for entries that match a pattern or partially specified term. This method provides an effective compromise between speed and space, running orders of magnitude faster than brute force search, but requiring less memory than other pattern - matching data structures; indeed, in some cases requiring less memory than would be consumed by a single pointer to each string. The pattern search method is based on text indexing techniques and is a successful adaptation of inverted files to main memory databases.

Copyright © 1993 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Rakesh Agrawal, Seán Baker, David A. Bell (Eds.): 19th International Conference on Very Large Data Bases, August 24-27, 1993, Dublin, Ireland, Proceedings. Morgan Kaufmann 1993, ISBN 1-55860-152-X
Contents BibTeX

References

[1]
Alfred V. Aho, Margaret J. Corasick: Efficient String Matching: An Aid to Bibliographic Search. Commun. ACM 18(6): 333-340(1975) BibTeX
[2]
Abraham Bookstein, Shmuel T. Klein: Compression of a Set of Correlated Bitmaps. SIGIR 1991: 63-71 BibTeX
[3]
Abraham Bookstein, Shmuel T. Klein, Donald A. Ziff: A Systematic Approach to Compressing a Full-Text Retrieval System. Inf. Process. Manage. 28(6): 795-(1992) BibTeX
[4]
...
[5]
Gordon V. Cormack, R. Nigel Horspool, M. Kaiserswerth: Practical Perfect Hashing. Comput. J. 28(1): 54-58(1985) BibTeX
[6]
...
[7]
Edward A. Fox, Lenwood S. Heath, Qi Fan Chen, Amjad M. Daoud: Practical Minimal Perfect Hash Functions for Large Databases. Commun. ACM 35(1): 105-121(1992) BibTeX
[8]
...
[9]
...
[10]
...
[11]
...
[12]
...
[13]
Edward M. McCreight: A Space-Economical Suffix Tree Construction Algorithm. J. ACM 23(2): 262-272(1976) BibTeX
[14]
Alistair Moffat: Economical Inversion of Large Text Files. Computing Systems 5(2): 125-139(1992) BibTeX
[15]
...
[16]
Alistair Moffat, Justin Zobel: Parameterised Compression for Sparse Bitmaps. SIGIR 1992: 274-285 BibTeX
[17]
Donald R. Morrison: PATRICIA - Practical Algorithm To Retrieve Information Coded in Alphanumeric. J. ACM 15(4): 514-534(1968) BibTeX
[18]
...
[19]
O. Owolabi, D. R. McGregor: Fast Approximate String Matching. Softw., Pract. Exper. 18(4): 387-393(1988) BibTeX
[20]
...
[21]
Jukka Teuhola: A Compression Method for Clustered Bit-Vectors. Inf. Process. Lett. 7(6): 308-311(1978) BibTeX
[22]
Esko Ukkonen: Approximate String Matching with q-grams and Maximal Matches. Theor. Comput. Sci. 92(1): 191-211(1992) BibTeX
[23]
...
[24]
Justin Zobel, Alistair Moffat, Ron Sacks-Davis: An Efficient Indexing Technique for Full Text Databases. VLDB 1992: 352-362 BibTeX

Referenced by

  1. Roy Goldman, Narayanan Shivakumar, Suresh Venkatasubramanian, Hector Garcia-Molina: Proximity Search in Databases. VLDB 1998: 26-37
  2. Hugh E. Williams, Justin Zobel: Indexing Nucleotide Databases for Fast Query Evaluation. EDBT 1996: 275-288
  3. Maxim Martynov, Boris Novikov: An Indexing Algorithm for Text Retrieval. ADBIS 1996: 171-175
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:45:56 2009