Optimal Signature Extraction and Information Loss.
Christos Faloutsos, Stavros Christodoulakis:
Optimal Signature Extraction and Information Loss.
ACM Trans. Database Syst. 12(3): 395-428(1987)@article{DBLP:journals/tods/FaloutsosC87,
author = {Christos Faloutsos and
Stavros Christodoulakis},
title = {Optimal Signature Extraction and Information Loss},
journal = {ACM Trans. Database Syst.},
volume = {12},
number = {3},
year = {1987},
pages = {395-428},
ee = {http://doi.acm.org/10.1145/27629.214285, db/journals/tods/FaloutsosC87.html},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX
Abstract
Signature files seem to be a promising access method for text
and attributes. According to this method, the documents (or
records) are stored sequentially in one file ("text file"),
while abstractions of the documents ("signatures") are stored
sequentially in another file ("signature file"). In order to
resolve a query, the signature file is scanned first, and many
nonqualifying documents are immediately rejected. We develop
a framework that includes primary key hashing, multiattribute
hashing, and signature files. Our effort is to find the
optimal signature extraction method.
The main contribution of this paper is that we present optimal
and efficient suboptimal algorithms for assigning words to
signatures in several environments. Another contribution is
that we use information theory, and study the relationship of
the false drop probability Fd and the
information that is lost during signature extraction. We give
tight lower bounds on the achievable Fd and
show that a simple relationship holds between the two
quantities in the case of optimal signature extraction with
uniform occurrence and query frequencies. We examine hashing
as a method to map words to signatures (instead of the optimal
way), and show that the same relationship holds between
Fd and loss, indicating that an invariant
may exist between these two quantities for every signature
extraction method.
Copyright © 1987 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
CDROM Version: Load the CDROM "Volume 3 Issue 1, TODS 1976-1990" and ...
DVD Version: Load ACM SIGMOD Anthology DVD 2" and ...
BibTeX
References
- [1]
- Alfred V. Aho, Jeffrey D. Ullman:
Optimal Partial-Match Retrieval When Fields Are Independently Specified.
ACM Trans. Database Syst. 4(2): 168-179(1979) BibTeX
- [2]
- Larry Carter, Mark N. Wegman:
Universal Classes of Hash Functions.
J. Comput. Syst. Sci. 18(2): 143-154(1979) BibTeX
- [3]
- C. C. Chang:
The Study of an Ordered Minimal Perfect Hashing Scheme.
Commun. ACM 27(4): 384-387(1984) BibTeX
- [4]
- Stavros Christodoulakis, Christos Faloutsos:
Design Considerations for a Message File Server.
IEEE Trans. Software Eng. 10(2): 201-210(1984) BibTeX
- [5]
- Christos Faloutsos:
Signature files: Design and Performance Comparison of Some Signature Extraction Methods.
SIGMOD Conference 1985: 63-82 BibTeX
- [6]
- Christos Faloutsos, Stavros Christodoulakis:
Signature Files: An Access Method for Documents and Its Analytical Performance Evaluation.
ACM Trans. Inf. Syst. 2(4): 267-288(1984) BibTeX
- [7]
- ...
- [8]
- ...
- [9]
- ...
- [10]
- ...
- [11]
- ...
- [12]
- ...
- [13]
- ...
- [14]
- Gerhard Jaeschke:
Reciprocal Hashing: A Method for Generating Minimal Perfect Hashing Functions.
Commun. ACM 24(12): 829-833(1981) BibTeX
- [15]
- ...
- [16]
- John W. Lloyd:
Optimal Partial-Match Retrieval.
BIT 20(4): 406-413(1980) BibTeX
- [17]
- John W. Lloyd, Kotagiri Ramamohanarao:
Partial Match Retrieval for Dynamic Files.
BIT 22(2): 150-168(1982) BibTeX
- [18]
- ...
- [19]
- Albert W. Marshall, Ingram Olkin:
Inequalities: Theory of Majorization and Its Application.
Academic Press 1979, ISBN 0-12-473750-1
BibTeX
- [20]
- ...
- [21]
- ...
- [22]
- ...
- [23]
- John L. Pfaltz, William J. Berman, Edgar M. Cagley:
Partial-Match Retrieval Using Indexed Descriptor Files.
Commun. ACM 23(9): 522-528(1980) BibTeX
- [24]
- ...
- [25]
- Ronald L. Rivest:
Partial-Match Retrieval Algorithms.
SIAM J. Comput. 5(1): 19-50(1976) BibTeX
- [26]
- ...
- [27]
- James B. Rothnie Jr., Tomas Lozano:
Attribute Based File Organization in a Paged Memory Environment.
Commun. ACM 17(2): 63-69(1974) BibTeX
- [28]
- Ron Sacks-Davis, Kotagiri Ramamohanarao:
A two level superimposed coding scheme for partial match retrieval.
Inf. Syst. 8(4): 273-289(1983) BibTeX
- [29]
- Dennis G. Severance, Guy M. Lohman:
Differential Files: Their Application to the Maintenance of Large Databases.
ACM Trans. Database Syst. 1(3): 256-267(1976) BibTeX
- [30]
- Renzo Sprugnoli:
Perfect Hashing Functions: A Single Probe Retrieving Method for Static Sets.
Commun. ACM 20(11): 841-850(1977) BibTeX
- [31]
- Dennis Tsichritzis, Stavros Christodoulakis:
Message Files.
ACM Trans. Inf. Syst. 1(1): 88-98(1983) BibTeX
Referenced by
- Paolo Ciaccia, Paolo Tiberio, Pavel Zezula:
Declustering of Key-Based Partitioned Signature Files.
ACM Trans. Database Syst. 21(3): 295-338(1996)
- Dik Lun Lee, Young Man Kim, Gaurav Patel:
Efficient Signature File Methods for Text Retrieval.
IEEE Trans. Knowl. Data Eng. 7(3): 423-435(1995)
- Sam Yuan Sung, H. Zhang:
Signature False Drops due to Combinatorial Error.
DASFAA 1995: 124-130
- Chun-Wu Roger Leng, Dik Lun Lee:
Optimal Weight Assignment for Signature Generation.
ACM Trans. Database Syst. 17(2): 346-373(1992)
- Dik Lun Lee, Chun-Wu Leng:
A Partitioned Signature File Structure for Multiattribute and Text Retrieval.
ICDE 1990: 389-397
- Walter W. Chang, Hans-Jörg Schek:
A Signature Access Method for the Starburst Database System.
VLDB 1989: 145-153
BibTeX
ACM SIGMOD Anthology - DBLP:
[Home | Search: Author, Title | Conferences | Journals]
TODS, ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Tue Jun 24 18:39:02 2008