ACM SIGMOD Anthology TODS dblp.uni-trier.de

Optimal Weight Assignment for Signature Generation.

Chun-Wu Roger Leng, Dik Lun Lee: Optimal Weight Assignment for Signature Generation. ACM Trans. Database Syst. 17(2): 346-373(1992)
@article{DBLP:journals/tods/LengL92,
  author    = {Chun-Wu Roger Leng and
               Dik Lun Lee},
  title     = {Optimal Weight Assignment for Signature Generation},
  journal   = {ACM Trans. Database Syst.},
  volume    = {17},
  number    = {2},
  year      = {1992},
  pages     = {346-373},
  ee        = {http://doi.acm.org/10.1145/128903.128907, db/journals/tods/LengL92.html},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

Previous work on superimposed coding has been characterized by two aspects. First, it is generally assumed that signatures are generated from logical text blocks of the same size; that is, each block contains the same number of unique terms after stopword and duplicate removal. We call this approach the fixed-size block (FSB) method, since each text block has the same size, as measured by the number of unique terms contained in it. Second, with only a few exceptions [6,7,8,9,17], most previous work has assumed that each term in the text contributes the same number of ones to the signature (i.e., the weight of the term signatures is fixed). The main objective of this paper is to derive an optimal weight assignment that assigns weights to document terms according to their occurrence and query frequencies in order to minimize the false-drop probability. The optimal scheme can account for both uniform and nonuniform occurence and query frequencies, and the signature generation method is still based on hashing rather than on table lookup. Furthermore, a new way of generating signatures, the fixed-weight block (FWB) method, is introduced. FWB controls the weight of every signature to a constant, whereas in FSB, only the expected signature weight is constant. We have shown that FWB has a lower false-drop probability than that of the FSB method, but its storage overhead is slightly higher. Other advantages of FWB are that the optimal weight assignment can be obtained analytically without making unrealistic assumptions and that the formula for computing the term signature weights is simple and efficient.

Copyright © 1992 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Joint ACM SIGMOD / IEEE Computer Society Anthology

CDROM Version: Load the CDROM "Volume 3 Issue 2, TODS 1991-1995, TKDE 1989-1992" and ... DVD Version: Load ACM SIGMOD Anthology DVD 2" and ... BibTeX

Online Edition: ACM Digital Library

[Abstract, Index Terms and Review]
[Full Text in PDF Format, 1485 KB]

References

[1]
...
[2]
Walter W. Chang, Hans-Jörg Schek: A Signature Access Method for the Starburst Database System. VLDB 1989: 145-153 BibTeX
[3]
Uwe Deppisch: S-Tree: A Dynamic Balanced Signature Index for Office Retrieval. SIGIR 1986: 77-87 BibTeX
[4]
Christos Faloutsos: Signature-Based Text Retrieval Methods: A Survey. IEEE Data Eng. Bull. 13(1): 25-32(1990) BibTeX
[5]
Christos Faloutsos, Stavros Christodoulakis: Signature Files: An Access Method for Documents and Its Analytical Performance Evaluation. ACM Trans. Inf. Syst. 2(4): 267-288(1984) BibTeX
[6]
Christos Faloutsos, Stavros Christodoulakis: Design of a Signature File Method that Accounts for Non-Uniform Occurrence and Query Frequencies. VLDB 1985: 165-170 BibTeX
[7]
Christos Faloutsos, Stavros Christodoulakis: Description and Performance Analysis of Signature File Methods for Office Filing. ACM Trans. Inf. Syst. 5(3): 237-257(1987) BibTeX
[8]
Christos Faloutsos, Stavros Christodoulakis: Optimal Signature Extraction and Information Loss. ACM Trans. Database Syst. 12(3): 395-428(1987) BibTeX
[9]
...
[10]
...
[11]
Dik Lun Lee: A Word-Parallel, Bit-Serial Signature Processor for Superimposed Coding. ICDE 1986: 352-359 BibTeX
[12]
Dik Lun Lee, Chun-Wu Leng: Partitioned Signature Files: Design Issues and Performance Evaluation. ACM Trans. Inf. Syst. 7(2): 158-180(1989) BibTeX
[13]
Dik Lun Lee, Chun-Wu Leng: A Partitioned Signature File Structure for Multiattribute and Text Retrieval. ICDE 1990: 389-397 BibTeX
[14]
John L. Pfaltz, William J. Berman, Edgar M. Cagley: Partial-Match Retrieval Using Indexed Descriptor Files. Commun. ACM 23(9): 522-528(1980) BibTeX
[15]
Charles S. Roberts: Partial-Match Via the Method of Superimposed Codes. Proceedings of the IEEE 67(12): 1624-1642(1979) BibTeX
[16]
Ron Sacks-Davis, Kotagiri Ramamohanarao: A two level superimposed coding scheme for partial match retrieval. Inf. Syst. 8(4): 273-289(1983) BibTeX
[17]
Ron Sacks-Davis, Alan J. Kent, Kotagiri Ramamohanarao: Multikey Access Methods Based on Superimposed Coding Techniques. ACM Trans. Database Syst. 12(4): 655-696(1987) BibTeX
[18]
Gerard Salton, Michael McGill: Introduction to Modern Information Retrieval. McGraw-Hill Book Company 1984, ISBN 0-07-054484-0
BibTeX
[19]
Craig Stanfill, Brewster Kahle: Parallel Free-Text Search on the Connection Machine System. Commun. ACM 29(12): 1229-1239(1986) BibTeX
[20]
...
[21]
Dennis Tsichritzis, Stavros Christodoulakis: Message Files. ACM Trans. Inf. Syst. 1(1): 88-98(1983) BibTeX

Referenced by

  1. Dik Lun Lee, Young Man Kim, Gaurav Patel: Efficient Signature File Methods for Text Retrieval. IEEE Trans. Knowl. Data Eng. 7(3): 423-435(1995)
  2. Pavel Zezula, Paolo Ciaccia, Paolo Tiberio: Hamming Filters: A Dynamic Signature File Organization for Parallel Stores. VLDB 1993: 314-327
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
TODS, ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Tue Jun 24 18:39:12 2008