ACM SIGMOD Anthology TODS dblp.uni-trier.de

Optimal Signature Extraction and Information Loss.

Christos Faloutsos, Stavros Christodoulakis: Optimal Signature Extraction and Information Loss. ACM Trans. Database Syst. 12(3): 395-428(1987)
@article{DBLP:journals/tods/FaloutsosC87,
  author    = {Christos Faloutsos and
               Stavros Christodoulakis},
  title     = {Optimal Signature Extraction and Information Loss},
  journal   = {ACM Trans. Database Syst.},
  volume    = {12},
  number    = {3},
  year      = {1987},
  pages     = {395-428},
  ee        = {http://doi.acm.org/10.1145/27629.214285, db/journals/tods/FaloutsosC87.html},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

Signature files seem to be a promising access method for text and attributes. According to this method, the documents (or records) are stored sequentially in one file ("text file"), while abstractions of the documents ("signatures") are stored sequentially in another file ("signature file"). In order to resolve a query, the signature file is scanned first, and many nonqualifying documents are immediately rejected. We develop a framework that includes primary key hashing, multiattribute hashing, and signature files. Our effort is to find the optimal signature extraction method.

The main contribution of this paper is that we present optimal and efficient suboptimal algorithms for assigning words to signatures in several environments. Another contribution is that we use information theory, and study the relationship of the false drop probability Fd and the information that is lost during signature extraction. We give tight lower bounds on the achievable Fd and show that a simple relationship holds between the two quantities in the case of optimal signature extraction with uniform occurrence and query frequencies. We examine hashing as a method to map words to signatures (instead of the optimal way), and show that the same relationship holds between Fd and loss, indicating that an invariant may exist between these two quantities for every signature extraction method.

Copyright © 1987 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Joint ACM SIGMOD / IEEE Computer Society Anthology

CDROM Version: Load the CDROM "Volume 3 Issue 1, TODS 1976-1990" and ... DVD Version: Load ACM SIGMOD Anthology DVD 2" and ... BibTeX

References

[1]
Alfred V. Aho, Jeffrey D. Ullman: Optimal Partial-Match Retrieval When Fields Are Independently Specified. ACM Trans. Database Syst. 4(2): 168-179(1979) BibTeX
[2]
Larry Carter, Mark N. Wegman: Universal Classes of Hash Functions. J. Comput. Syst. Sci. 18(2): 143-154(1979) BibTeX
[3]
C. C. Chang: The Study of an Ordered Minimal Perfect Hashing Scheme. Commun. ACM 27(4): 384-387(1984) BibTeX
[4]
Stavros Christodoulakis, Christos Faloutsos: Design Considerations for a Message File Server. IEEE Trans. Software Eng. 10(2): 201-210(1984) BibTeX
[5]
Christos Faloutsos: Signature files: Design and Performance Comparison of Some Signature Extraction Methods. SIGMOD Conference 1985: 63-82 BibTeX
[6]
Christos Faloutsos, Stavros Christodoulakis: Signature Files: An Access Method for Documents and Its Analytical Performance Evaluation. ACM Trans. Inf. Syst. 2(4): 267-288(1984) BibTeX
[7]
...
[8]
...
[9]
...
[10]
...
[11]
...
[12]
...
[13]
...
[14]
Gerhard Jaeschke: Reciprocal Hashing: A Method for Generating Minimal Perfect Hashing Functions. Commun. ACM 24(12): 829-833(1981) BibTeX
[15]
...
[16]
John W. Lloyd: Optimal Partial-Match Retrieval. BIT 20(4): 406-413(1980) BibTeX
[17]
John W. Lloyd, Kotagiri Ramamohanarao: Partial Match Retrieval for Dynamic Files. BIT 22(2): 150-168(1982) BibTeX
[18]
...
[19]
Albert W. Marshall, Ingram Olkin: Inequalities: Theory of Majorization and Its Application. Academic Press 1979, ISBN 0-12-473750-1
BibTeX
[20]
...
[21]
...
[22]
...
[23]
John L. Pfaltz, William J. Berman, Edgar M. Cagley: Partial-Match Retrieval Using Indexed Descriptor Files. Commun. ACM 23(9): 522-528(1980) BibTeX
[24]
...
[25]
Ronald L. Rivest: Partial-Match Retrieval Algorithms. SIAM J. Comput. 5(1): 19-50(1976) BibTeX
[26]
...
[27]
James B. Rothnie Jr., Tomas Lozano: Attribute Based File Organization in a Paged Memory Environment. Commun. ACM 17(2): 63-69(1974) BibTeX
[28]
Ron Sacks-Davis, Kotagiri Ramamohanarao: A two level superimposed coding scheme for partial match retrieval. Inf. Syst. 8(4): 273-289(1983) BibTeX
[29]
Dennis G. Severance, Guy M. Lohman: Differential Files: Their Application to the Maintenance of Large Databases. ACM Trans. Database Syst. 1(3): 256-267(1976) BibTeX
[30]
Renzo Sprugnoli: Perfect Hashing Functions: A Single Probe Retrieving Method for Static Sets. Commun. ACM 20(11): 841-850(1977) BibTeX
[31]
Dennis Tsichritzis, Stavros Christodoulakis: Message Files. ACM Trans. Inf. Syst. 1(1): 88-98(1983) BibTeX

Referenced by

  1. Paolo Ciaccia, Paolo Tiberio, Pavel Zezula: Declustering of Key-Based Partitioned Signature Files. ACM Trans. Database Syst. 21(3): 295-338(1996)
  2. Dik Lun Lee, Young Man Kim, Gaurav Patel: Efficient Signature File Methods for Text Retrieval. IEEE Trans. Knowl. Data Eng. 7(3): 423-435(1995)
  3. Sam Yuan Sung, H. Zhang: Signature False Drops due to Combinatorial Error. DASFAA 1995: 124-130
  4. Chun-Wu Roger Leng, Dik Lun Lee: Optimal Weight Assignment for Signature Generation. ACM Trans. Database Syst. 17(2): 346-373(1992)
  5. Dik Lun Lee, Chun-Wu Leng: A Partitioned Signature File Structure for Multiattribute and Text Retrieval. ICDE 1990: 389-397
  6. Walter W. Chang, Hans-Jörg Schek: A Signature Access Method for the Starburst Database System. VLDB 1989: 145-153
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
TODS, ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Tue Jun 24 18:39:02 2008