ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Space/Time Trade-offs in Hash Coding with Allowable Errors.

Burton H. Bloom: Space/Time Trade-offs in Hash Coding with Allowable Errors. Commun. ACM 13(7): 422-426(1970)
@article{DBLP:journals/cacm/Bloom70,
  author    = {Burton H. Bloom},
  title     = {Space/Time Trade-offs in Hash Coding with Allowable Errors},
  journal   = {Commun. ACM},
  volume    = {13},
  number    = {7},
  year      = {1970},
  pages     = {422-426},
  ee        = {db/journals/cacm/Bloom70.html},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

In this paper trade-offs among certain computational factors in hash coding are analyzed. The paradigm problem considered is that of testing a series of messages one-by-one for membership in a given set of messages. Two new hash-coding methods are examined and compared with a particular conventional hash-coding method. The computational factors considered are the size of the hash area (space), the time required to identify a message as a nonmember of the given set (reject time), and an allowable error frequency.

The new methods are intended to reduce the amount of space required to contain the hash-coded information from that associated with conventional methods. The reduction in space is accomplished by exploiting the possibility that a small fraction of errors of commission may be tolerable in some applications, in particular, applications in which a large amount of data is involved and a core resident hash area is consequently not feasible using conventional methods.

In such applications, it is envisaged that overall performance could be improved by using a smaller core resident hash area in conjunction with the new methods and, when necessary, by using some secondary and perhaps time-consuming test to "catch" the small fraction of errors associated with new methods. An example is discussed which illustrates possible areas of application for the new methods.

Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.

Copyright © 1970 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 4 Issue 1, Books, VLDB-j, TODS, ..." and ... DVD Version: Load ACM SIGMOD Anthology DVD 2" and ... BibTeX

References

[1]
...
[2]
...
[3]
...

Referenced by

  1. Alfons Kemper, Donald Kossmann, Christian Wiesner: Generalised Hash Teams for Join and Group-by. VLDB 1999: 30-41
  2. Charles Bontempo, George Zagelow: The IBM Data Warehouse Architecture. Commun. ACM 41(9): 38-48(1998)
  3. Theodore Johnson: Coarse Indices for a Tape-Based Data Warehouse. ICDE 1998: 231-240
  4. Praveen Seshadri, Joseph M. Hellerstein, Hamid Pirahesh, T. Y. Cliff Leung, Raghu Ramakrishnan, Divesh Srivastava, Peter J. Stuckey, S. Sudarshan: Cost-Based Optimization for Magic: Algebra and Implementation. SIGMOD Conference 1996: 435-446
  5. Zhe Li, Kenneth A. Ross: PERF Join: An Alternative to Two-way Semijoin and Bloomjoin. CIKM 1995: 137-144
  6. Goetz Graefe: Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 25(2): 73-170(1993)
  7. Priti Mishra, Margaret H. Eich: Join Processing in Relational Databases. ACM Comput. Surv. 24(1): 63-113(1992)
  8. Arie Segev, Jooseok Park: Updating Distributed Materialized Views. IEEE Trans. Knowl. Data Eng. 1(2): 173-184(1989)
  9. Arie Segev, Himawan Gunadhi: Event-Join Optimization in Temporal Relational Databases. VLDB 1989: 205-215
  10. Alan J. Kent, Ron Sacks-Davis, Kotagiri Ramamohanarao: A Superimposed Coding Scheme Based on Multiple Block Descriptor Files for Indexing Very Large Data Bases. VLDB 1988: 351-359
  11. Ming-Chien Shan: Optimal Plan Search in a Rule-Based Query Optimizer. EDBT 1988: 92-112
  12. Ron Sacks-Davis, Alan J. Kent, Kotagiri Ramamohanarao: Multikey Access Methods Based on Superimposed Coding Techniques. ACM Trans. Database Syst. 12(4): 655-696(1987)
  13. Eric N. Hanson: A Performance Analysis of View Materialization Strategies. SIGMOD Conference 1987: 440-453
  14. Lothar F. Mackert, Guy M. Lohman: R* Optimizer Validation and Performance Evaluation for Distributed Queries. VLDB 1986: 149-159
  15. Lothar F. Mackert, Guy M. Lohman: R* Optimizer Validation and Performance Evaluation for Local Queries. SIGMOD Conference 1986: 84-95
  16. Giovanni Maria Sacco: Distributed Query Evaluation in Local Area Networks. ICDE 1984: 510-516
  17. Gaston H. Gonnet: Unstructured Data Bases or Very Efficient Text Searching. PODS 1983: 117-124
  18. Houtan Aghili, Dennis G. Severance: A Practical Guide to the Design of Differential Files for Recovery of On-Line Databases. ACM Trans. Database Syst. 7(4): 540-565(1982)
  19. Dennis G. Severance, Guy M. Lohman: Differential Files: Their Application to the Maintenance of Large Databases. ACM Trans. Database Syst. 1(3): 256-267(1976)
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
CACM, ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:51:45 2009