ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Access Methods for Text.

Christos Faloutsos: Access Methods for Text. ACM Comput. Surv. 17(1): 49-74(1985)
@article{DBLP:journals/csur/Faloutsos85,
  author    = {Christos Faloutsos},
  title     = {Access Methods for Text},
  journal   = {ACM Comput. Surv.},
  volume    = {17},
  number    = {1},
  year      = {1985},
  pages     = {49-74},
  ee        = {db/journals/csur/Faloutsos85.html},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

This paper compares text retrieval methods intended for office systems. The operational requirements of the office environment are discussed, and retrieval methods from database systems and from information retrieval systems are examined. We classify these methods and examine the most interesting representatives of each class. Attempts to speed up retrieval with special purpose hardware are also presented, and issues such as approximate string matching and compression are discussed. A qualitative comparison of the examined methods is presented. The signature file method is discussed in more detail.

Copyright © 1985 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 4 Issue 1, Books, VLDB-j, TODS, ..." and ... DVD Version: Load ACM SIGMOD Anthology DVD 2" and ... BibTeX

Online Edition: ACM Digital Library

Citation Page

References

[Aho and Corasick 1975]
Alfred V. Aho, Margaret J. Corasick: Efficient String Matching: An Aid to Bibliographic Search. Commun. ACM 18(6): 333-340(1975) BibTeX
[Aho and Ullman 1979]
Alfred V. Aho, Jeffrey D. Ullman: Optimal Partial-Match Retrieval When Fields Are Independently Specified. ACM Trans. Database Syst. 4(2): 168-179(1979) BibTeX
[Ahuja and Roberts 1980]
Sudhir Ahuja, Charles S. Roberts: An Associative/Parallel Processor for Partial Match Retrieval Using Superimposed Codes. ISCA 1980: 218-227 BibTeX
[Angell et al. 1983]
Richard C. Angell, George E. Freund, Peter Willett: Automatic Spelling Correction Using a Trigram Similarity Measure. Inf. Process. Manage. 19(4): 255-261(1983) BibTeX
[Barton et al. 1974]
Ian J. Barton, Susan E. Creasey, Michael F. Lynch, Michael J. Snell: An Information-Theoretic Approach to Text Searching in Direct Access Systems. Commun. ACM 17(6): 345-350(1974) BibTeX
[Batcher 1968]
Kenneth E. Batcher: Sorting Networks and Their Applications. AFIPS Spring Joint Computing Conference 1968: 307-314 BibTeX
[Bayer and McCreight 1972]
Rudolf Bayer, Edward M. McCreight: Organization and Maintenance of Large Ordered Indices. Acta Inf. 1: 173-189(1972) BibTeX
[Bentley 1975]
Jon Louis Bentley: Multidimensional Binary Search Trees Used for Associative Searching. Commun. ACM 18(9): 509-517(1975) BibTeX
[Bird et al. 1977]
...
[Bourne 1963]
...
[Bourne 1977]
Charles P. Bourne: Frequency and impact of spelling errors in bibliographic data bases. Inf. Process. Manage. 13(1): 1-12(1977) BibTeX
[Boyer and Moore 1977]
Robert S. Boyer, J. Strother Moore: A Fast String Searching Algorithm. Commun. ACM 20(10): 762-772(1977) BibTeX
[Brauen 1971]
...
[Burkowski 1982]
Forbes J. Burkowski: A Hardware Hashing Scheme in the Design of a Multiterm String Comparator. IEEE Trans. Computers 31(9): 825-834(1982) BibTeX
[Christodoulakis 1983]
Stavros Christodoulakis: Access Files for Batching Queries in Large Information Systems. ICOD 1983: 127-150 BibTeX
[Christodoulakis 1984]
...
[Christodoulakis and Faloutsos 1984]
Stavros Christodoulakis, Christos Faloutsos: Design Considerations for a Message File Server. IEEE Trans. Software Eng. 10(2): 201-210(1984) BibTeX
[Comer 1979]
Douglas Comer: The Ubiquitous B-Tree. ACM Comput. Surv. 11(2): 121-137(1979) BibTeX
[Cooper 1970]
...
[Croft 1980]
W. Bruce Croft: A model of cluster searching bases on classification. Inf. Syst. 5(3): 189-195(1980) BibTeX
[Damerau 1964]
...
[Dattola 1979]
...
[DeLaBriandais 1959]
...
[Dewey 1950]
...
[Duda and Hart 1973]
...
[Fagin et al. 1979]
Ronald Fagin, Jürg Nievergelt, Nicholas Pippenger, H. Raymond Strong: Extendible Hashing - A Fast Access Method for Dynamic Files. ACM Trans. Database Syst. 4(3): 315-344(1979) BibTeX
[Faloutsos 1985a]
Christos Faloutsos, Stavros Christodoulakis: Design of a Signature File Method that Accounts for Non-Uniform Occurrence and Query Frequencies. VLDB 1985: 165-170 BibTeX
[Faloutsos 1985b]
Christos Faloutsos: Signature files: Design and Performance Comparison of Some Signature Extraction Methods. SIGMOD Conference 1985: 63-82 BibTeX
[Faloutsos and Christodoulakis 1984]
Christos Faloutsos, Stavros Christodoulakis: Signature Files: An Access Method for Documents and Its Analytical Performance Evaluation. ACM Trans. Inf. Syst. 2(4): 267-288(1984) BibTeX
[Files and Huskey 1969]
...
[Fox 1984]
...
[Fredkin 1960]
...
[Friedman etal. 1971]
...
[Fujitani 1984]
Larry Fujitani: Laser Optical Disk: The Coming Revolution in On-Line Storage. Commun. ACM 27(6): 546-554(1984) BibTeX
[Gallager and van Voorhis 1975]
...
[Golomb 1966]
...
[Gravina 1978]
Chris M. Gravina: National Westminster Bank Mass Storage Archiving. IBM Systems Journal 17(4): 344-358(1978) BibTeX
[Gustafson 1971]
...
[Hall and Dowling 1980]
Patrick A. V. Hall, Geoff R. Dowling: Approximate String Matching. ACM Comput. Surv. 12(4): 381-402(1980) BibTeX
[Harrison 1971]
Malcolm C. Harrison: Implementation of the Substring Test by Hashing. Commun. ACM 14(12): 777-779(1971) BibTeX
[Haskin 1981]
...
[Haskin and Hollaar 1983]
Roger L. Haskin, Lee A. Hollaar: Operational Characteristics of a Hardware-Based Pattern Matcher. ACM Trans. Database Syst. 8(1): 15-40(1983) BibTeX
[Haskin and Lorie 1982]
Roger L. Haskin, Raymond A. Lorie: On Extending the Functions of a Relational Database System. SIGMOD Conference 1982: 207-212 BibTeX
[Hollaar 1978]
Lee A. Hollaar: Specialized Merge Processor Networks for Combined Sorted Lists. ACM Trans. Database Syst. 3(3): 272-284(1978) BibTeX
[Hollaar 1979]
...
[Hollaar et al. 1983]
...
[Hopcroft and Ullman 1979]
John E. Hopcroft, Jeffrey D. Ullman: Introduction to Automata Theory, Languages and Computation. Addison-Wesley 1979, ISBN 0-201-02988-X
BibTeX
[IBM 1979]
...
[Johnson 1983]
...
[Kautz and Singleton 1964]
...
[Knott 1971]
Gary D. Knott: Expandable Open Addressing Hash Table Storage and Retrieval. SIGFIDET Workshop 1971: 187-206 BibTeX
[Knott 1975]
Gary D. Knott: Hashing Functions. Comput. J. 18(3): 265-278(1975) BibTeX
[Knuth 1973]
Donald E. Knuth: The Art of Computer Programming, Volume III: Sorting and Searching. Addison-Wesley 1973, ISBN 0-201-03803-X
BibTeX
[Knuth et al. 1977]
Donald E. Knuth, James H. Morris Jr., Vaughan R. Pratt: Fast Pattern Matching in Strings. SIAM J. Comput. 6(2): 323-350(1977) BibTeX
[Larson 1978]
Per-Åke Larson: Dynamic Hashing. BIT 18(2): 184-201(1978) BibTeX
[Larson 1983]
Per-Åke Larson: A Method for Speeding Up Text Retrieval. Databases for Business and Office Applications 1983: 117-123 BibTeX
[Lesk 1978]
...
[Lloyd 1980]
John W. Lloyd: Optimal Partial-Match Retrieval. BIT 20(4): 406-413(1980) BibTeX
[Lloyd and Ramamohanarao 1982]
John W. Lloyd, Kotagiri Ramamohanarao: Partial Match Retrieval for Dynamic Files. BIT 22(2): 150-168(1982) BibTeX
[Lowerance and Wagner 1975]
Roy Lowrance, Robert A. Wagner: An Extension of the String-to-String Correction Problem. J. ACM 22(2): 177-183(1975) BibTeX
[Martin 1979]
...
[McIlroy 1982]
...
[McLeod 1981]
Ian A. Macleod: A data base management system for document retrieval applications. Inf. Syst. 6(2): 131-137(1981) BibTeX
[Mooers 1949]
...
[Nievergelt et al. 1984]
Jürg Nievergelt, Hans Hinterberger, Kenneth C. Sevcik: The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Trans. Database Syst. 9(1): 38-71(1984) BibTeX
[Orosz and Tackacs 1956]
...
[Peterson 1980]
James L. Peterson: Computer Programs for Detecting and Correcting Spelling Errors. Commun. ACM 23(12): 676-687(1980) BibTeX
[Pfaltz et al. 1980]
John L. Pfaltz, William J. Berman, Edgar M. Cagley: Partial-Match Retrieval Using Indexed Descriptor Files. Commun. ACM 23(9): 522-528(1980) BibTeX
[Rabitti and Zizka 1984]
...
[Rivest 1976]
Ronald L. Rivest: Partial-Match Retrieval Algorithms. SIAM J. Comput. 5(1): 19-50(1976) BibTeX
[Roberts 1979]
...
[Robinson 1981]
John T. Robinson: The K-D-B-Tree: A Search Structure For Large Multidimensional Dynamic Indexes. SIGMOD Conference 1981: 10-18 BibTeX
[Rocchio 1971]
...
[Rothnie and Lozano 1974]
James B. Rothnie Jr., Tomas Lozano: Attribute Based File Organization in a Paged Memory Environment. Commun. ACM 17(2): 63-69(1974) BibTeX
[Salton 1971a]
...
[Salton 1971b]
...
[Salton 1972]
...
[Salton 1973]
Gerard Salton: Recent Studies in Automatic Text Analysis and Document Retrieval. J. ACM 20(2): 258-278(1973) BibTeX
[Salton 1975]
...
[Salton 1980]
...
[Salton and McGill 1983]
Gerard Salton, Michael McGill: Introduction to Modern Information Retrieval. McGraw-Hill Book Company 1984, ISBN 0-07-054484-0
BibTeX
[Salton and Wong 1978]
Gerard Salton, A. Wong: Generation and Search of Clustered Files. ACM Trans. Database Syst. 3(4): 321-346(1978) BibTeX
[Salton et al. 1983]
Gerard Salton, Edward A. Fox, Harry Wu: Extended Boolean Information Retrieval. Commun. ACM 26(11): 1022-1036(1983) BibTeX
[Schuegraph and Heaps 1976]
Ernst J. Schuegraf, H. S. Heaps: Query processing in a retrospective document retrieval system that uses word fragments as language elements. Inf. Process. Manage. 12(4): 283-292(1976) BibTeX
[Severance 1974]
Dennis G. Severance: Identifier Search Mechanisms: A Survey and Generalized Model. ACM Comput. Surv. 6(3): 175-194(1974) BibTeX
[Severance 1983]
Dennis G. Severance: A practitioner's guide to data base compression - Tutorial. Inf. Syst. 8(1): 51-62(1983) BibTeX
[Sparck-Jones 1972]
...
[Stellhorn 1977]
William H. Stellhorn: An Inverted File Processor for Information Retrieval. IEEE Trans. Computers 26(12): 1258-1267(1977) BibTeX
[Stiassny 1960]
...
[Stonebraker et al.1983]
Michael Stonebraker, Heidi Stettner, Nadene Lynn, Joseph Kalash, Antonin Guttman: Document Processing in a Relational Database System. ACM Trans. Inf. Syst. 1(2): 143-158(1983) BibTeX
[Tsichritzis and Christodoulakis 1983]
Dennis Tsichritzis, Stavros Christodoulakis: Message Files. ACM Trans. Inf. Syst. 1(1): 88-98(1983) BibTeX
[Tsichritzis et al. 1983]
Dennis Tsichritzis, Stavros Christodoulakis, P. Economopoulos, Christos Faloutsos, A. Lee, D. Lee, J. Vandenbroeck, Carson C. Woo: A Multimedia Office Filing System. VLDB 1983: 2-7 BibTeX
[Van Rijsbergen 1971]
C. J. van Rijsbergen: An Algorithm for Information Structuring and Retrieval. Comput. J. 14(4): 407-412(1971) BibTeX
[Van Rijsbergen 1979]
C. J. van Rijsbergen: Information Retrieval. Butterworth 1979, ISBN 0-408-70929-4
BibTeX
[Welch 1984]
Terry A. Welch: A Technique for High-Performance Data Compression. IEEE Computer 17(6): 8-19(1984) BibTeX
[Wing 1979]
...
[Yu and Luk 1977]
Clement T. Yu, W. S. Luk: Analysis of Effectiveness of Retrieval in Clustered Files. J. ACM 24(4): 607-622(1977) BibTeX
[Yu et al. 1982]
Clement T. Yu, K. Lam, Gerard Salton: Term Weighting in Information Retrieval Using the Term Precision Model. J. ACM 29(1): 152-170(1982) BibTeX
[Zahn 1971]
...
[Ziv and Lempel 1977]
Jacob Ziv, Abraham Lempel: A Universal Algorithm for Sequential Data Compression. IEEE Transactions on Information Theory 23(3): 337-343(1977) BibTeX

Referenced by

  1. Himanshu Gupta, Divesh Srivastava: The Data Warehouse of Newsgroups. ICDT 1999: 471-488
  2. Justin Zobel, Alistair Moffat, Kotagiri Ramamohanarao: Inverted Files Versus Signature Files for Text Indexing. ACM Trans. Database Syst. 23(4): 453-490(1998)
  3. Roger Weber, Hans-Jörg Schek, Stephen Blott: A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. VLDB 1998: 194-205
  4. Björn Þór Jónsson, Michael J. Franklin, Divesh Srivastava: Interaction of Query Evaluation and Buffer Management for Information Retrieval. SIGMOD Conference 1998: 118-129
  5. Alistair Moffat, Justin Zobel, Neil Sharman: Text Compression for Dynamic Document Databases. IEEE Trans. Knowl. Data Eng. 9(2): 302-313(1997)
  6. Dimitrios Dervos, P. Linardis, Yannis Manolopoulos: S-Index: a Hybrid Structure for Text Retrieval. ADBIS 1997: 204-209
  7. Dimitrios Dervos, P. Linardis, Yannis Manolopoulos: Perfect Encoding: a Signature Method for Text Retrieval. ADBIS 1996: 176-182
  8. Ron Sacks-Davis, Alan J. Kent, Kotagiri Ramamohanarao, James A. Thom, Justin Zobel: Atlas: A Nested Relational Database System for Text Applications. IEEE Trans. Knowl. Data Eng. 7(3): 454-470(1995)
  9. Dik Lun Lee, Young Man Kim, Gaurav Patel: Efficient Signature File Methods for Text Retrieval. IEEE Trans. Knowl. Data Eng. 7(3): 423-435(1995)
  10. Surajit Chaudhuri, Umeshwar Dayal, Tak W. Yan: Join Queries with External Text Sources: Execution and Optimization Techniques. SIGMOD Conference 1995: 410-422
  11. Anthony Tomasic, Hector Garcia-Molina: Issues in Parallel Information Retrieval. IEEE Data Eng. Bull. 17(3): 41-49(1994)
  12. Eric W. Brown, James P. Callan, W. Bruce Croft: Fast Incremental Indexing for Full-Text Information Retrieval. VLDB 1994: 192-202
  13. Christos Faloutsos, M. Ranganathan, Yannis Manolopoulos: Fast Subsequence Matching in Time-Series Databases. SIGMOD Conference 1994: 419-429
  14. Hwan-Seung Yong, Sukho Lee, Hyoung-Joo Kim: Applying Signatures for Forward Traversal Query Processing in Object-Oriented Databases. ICDE 1994: 518-525
  15. Tak W. Yan, Hector Garcia-Molina: Index Structures for Information Filtering Under the Vector Space Model. ICDE 1994: 337-347
  16. Eric W. Brown, James P. Callan, W. Bruce Croft, J. Eliot B. Moss: Supporting Full-Text Information Retrieval with a Persistent Object Store. EDBT 1994: 365-378
  17. Anthony Tomasic, Hector Garcia-Molina: Query Processing and Inverted Indices in Shared-Nothing Document Information Retrieval Systems. VLDB J. 2(3): 243-275(1993)
  18. Goetz Graefe: Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 25(2): 73-170(1993)
  19. Anthony Tomasic, Hector Garcia-Molina: Caching and Database Scaling in Distributed Shard-Nothing Information Retrieval Systems. SIGMOD Conference 1993: 129-138
  20. Yoshiharu Ishikawa, Hiroyuki Kitagawa, Nobuo Ohbo: Evaluation of Signature Files as Set Access Facilities in OODBs. SIGMOD Conference 1993: 247-256
  21. Justin Zobel, Alistair Moffat, Ron Sacks-Davis: An Efficient Indexing Technique for Full Text Databases. VLDB 1992: 352-362
  22. Vincent G. Winters: Parallelism For High Performance Query Processing. EDBT 1992: 344-356
  23. Christos Faloutsos, H. V. Jagadish: Hybrid Index Organizations for Text Databases. EDBT 1992: 310-327
  24. Justin Zobel, James A. Thom, Ron Sacks-Davis: Efficiency of Nested Relational Document Database Systems. VLDB 1991: 91-102
  25. Carl S. Hartzman, Carolyn R. Watters: A Relational Approach to Querying Streams. IEEE Trans. Knowl. Data Eng. 2(4): 401-409(1990)
  26. Roger King, Ali Morfeq: Bayan: An Arabic Text Database Management System. SIGMOD Conference 1990: 12-23
  27. Dik Lun Lee, Chun-Wu Roger Leng: A Partitioned Signature File Structure for Multiattribute and Text Retrieval. ICDE 1990: 389-397
  28. Bin Jiang: A Suitable Algorithm for Computing Partial Transitive Closures in Databases. ICDE 1990: 264-271
  29. David Hung-Chang Du, Subbarao Ghanta, Kurt Maly, Suzanne M. Sharrock: An Efficient File Structure for Document Retrieval in the Automated Office Environment. IEEE Trans. Knowl. Data Eng. 1(2): 258-273(1989)
  30. Jae-Woo Chang, Yoon-Joon Lee: Multikey Access Scheme Based on Term Discrimination and Signature Clustering. DASFAA 1989: 211-218
  31. Alan J. Kent, Ron Sacks-Davis, Kotagiri Ramamohanarao: A Superimposed Coding Scheme Based on Multiple Block Descriptor Files for Indexing Very Large Data Bases. VLDB 1988: 351-359
  32. Christos Faloutsos, Raphael Chan: Fast Text Access Methods for Optical and Large Magnetic Disks: Designs and Performance Comparison. VLDB 1988: 280-293
  33. Maria Teresa Pazienza: Using a Semantic Knowledge Base to Support a Natural Language Interface to a Text Database. ER 1988: 455-472
  34. Ron Sacks-Davis, Alan J. Kent, Kotagiri Ramamohanarao: Multikey Access Methods Based on Superimposed Coding Techniques. ACM Trans. Database Syst. 12(4): 655-696(1987)
  35. Christos Faloutsos: Signature files: Design and Performance Comparison of Some Signature Extraction Methods. SIGMOD Conference 1985: 63-82
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:54:44 2009