
Query Processing and Inverted Indices in Shared-Nothing Document Information Retrieval Systems.

Anthony Tomasic, Hector Garcia-Molina: Query Processing and Inverted Indices in Shared-Nothing Document Information Retrieval Systems. VLDB J. 2(3): 243-275(1993)
@article{DBLP:journals/vldb/TomasicG93,
  author    = {Anthony Tomasic and
               Hector Garcia-Molina},
  title     = {Query Processing and Inverted Indices in Shared-Nothing Document
               Information Retrieval Systems},
  journal   = {VLDB J.},
  volume    = {2},
  number    = {3},
  year      = {1993},
  pages     = {243-275},
  ee        = {db/journals/vldb/TomasicG93.html},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

The performance of distributed text document retrieval systems is strongly influenced by the organization of inverted text. The article compares the performance impact on query processing of various physical organizations for inverted lists. We present a new probabilistic model of the database and queries. Simulation experiments determine those variables that most strongly influence response time and throughput. This leads to a set of design trade-offs over a wide range of hardware configurations and new parallel query processing strategies.

Copyright © 1993 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Key Words

Performance, file organization, query processing, inverted file, inverted index, striping, shared-nothing, full text information retrieval.


References

[Aalbersberg & Sijstermans 1991]
IJsbrand Jan Aalbersberg, Frans Sijstermans: High-Quality and High-Performance Full-Text Document Retrieval: The Parallel InfoGuide System. PDIS 1991: 142-150
[Burkowski 1990]
Forbes J. Burkowski: Retrieval Performance of a Distributed Text Database Utilizing a Parallel Processor Document Server. DPDS 1990: 71-79
[Chapman & DeFazio 1990]
...
[Chervenak 1990]
...
[DeFazio 1992]
...
[DeFazio & Hull 1991]
...
[Emrath 1983]
...
[Faloutsos 1985]
Christos Faloutsos: Access Methods for Text. ACM Comput. Surv. 17(1): 49-74(1985)
[Fedorowicz 1987]
Jane Fedorowicz: Database Performance Evaluation in an Indexed File Environment. ACM Trans. Database Syst. 12(1): 85-110(1987)
[Frieder & Siegelmann 1991]
Ophir Frieder, Hava T. Siegelmann: On the Allocation of Documents in Multiprocessor Information Retrieval Systems. SIGIR 1991: 230-239
[Harman & Candela 1990]
Donna Harman, Gerald Candela: Retrieving Records from a Gigabyte of Text on a Mini-Computer Using Statistical Ranking. JASIS 41(8): 581-589(1990)
[Hollaar 1992]
...
[Jeong & Omiecinski 1992]
...
[Lin 1991]
Zheng Lin: CAT: An Execution Model for Concurrent Full Text Search. PDIS 1991: 151-158
[Livny 1990]
...
[Matsliach & Shmueli 1991]
Gabriel Matsliach, Oded Shmueli: An Efficient Method for Distributing Search Structures. PDIS 1991: 159-166
[Patterson et al. 1988]
David A. Patterson, Garth A. Gibson, Randy H. Katz: A Case for Redundant Arrays of Inexpensive Disks (RAID). SIGMOD Conference 1988: 109-116
[Rabitti & Zizka 1984]
...
[Salton & McGill 1983]
Gerard Salton, Michael McGill: Introduction to Modern Information Retrieval. McGraw-Hill Book Company 1984, ISBN 0-07-054484-0
[Stanfill 1990]
Craig Stanfill: Partitioned Posting Files: A Parallel Inverted File Structure for Information Retrieval. SIGIR 1990: 413-428
[Tomasic & Garcia-Molina 1993a]
Anthony Tomasic, Hector Garcia-Molina: Caching and Database Scaling in Distributed Shared-Nothing Information Retrieval Systems. SIGMOD Conference 1993: 129-138
[Tomasic & Garcia-Molina 1993b]
Anthony Tomasic, Hector Garcia-Molina: Performance of Inverted Indices in Distributed Text Document Retrieval Systems. PDIS 1993: 8-17
[Trivedi 1982]
...
[Voorhees 1986]
Ellen M. Voorhees: The Efficiency of Inverted Index and Cluster Searches. SIGIR 1986: 164-174
[Weiss 1990]
...
[Wolfram 1991]
...
[Zipf 1949]
George Kingsley Zipf: Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology. Addison-Wesley 1949
[Zobel et al. 1992]
Justin Zobel, Alistair Moffat, Ron Sacks-Davis: An Efficient Indexing Technique for Full Text Databases. VLDB 1992: 352-362

Referenced by

  1. Gerhard Weikum: Tutorial on Parallel Database Systems. ICDT 1995: 33-37
  2. Anthony Tomasic, Hector Garcia-Molina: Issues in Parallel Information Retrieval. IEEE Data Eng. Bull. 17(3): 41-49(1994)