LH* - A Scalable, Distributed Data Structure.

Witold Litwin, Marie-Anne Neimat, Donovan A. Schneider: LH* - A Scalable, Distributed Data Structure. ACM Trans. Database Syst. 21(4): 480-525(1996)
  author    = {Witold Litwin and
               Marie-Anne Neimat and
               Donovan A. Schneider},
  title     = {LH* - A Scalable, Distributed Data Structure},
  journal   = {ACM Trans. Database Syst.},
  volume    = {21},
  number    = {4},
  year      = {1996},
  pages     = {480-525},
  ee        = {, db/journals/tods/LitwinN96.html},
  bibsource = {DBLP,}


We present a scalable distributed data structure called LH*. LH* generalizes Linear Hashing (LH) to distributed RAM and disk files. An LH* file can be created from records with primary keys, or objects with OIDs, provided by any number of distributed and autonomous clients. It does not require a central directory, and grows gracefully, through splits of one bucket at a time, to virtually any number of servers. The number of messages per random insertion is one in general, and three in the worst case, regardless of the file size. The number of messages per key search is two in general, and four in the worst case. The file supports parallel operations, e.g., hash joins and scans. Performing a parallel operation on a file of M buckets costs at most 2M + 1 messages, and between 1 and O(log2 Mrounds of messages.

We first describle the basic LH* scheme where a coordinator site manages abucket splits, and splits a bucket every time a collision occurs. We show that the average load factor of an LH* file is 65%-70% regardless of file size, and bucket capacity. We then enhance the scheme with load control, performed at no additional message cost. The average load factor then increases to 80-95%. These values are about that of LH, but the load factor for LH* varies more.

We nest define LH* schemes without a coordinator. We show that insert and search costs are the same as for the basic scheme. The splitting cost decreases on the average, but becomes more variable, as cascading splits are needed to prevent file overload. Next, we briefly describe two variants of splitting policy, using parallel splits and presplitting that should enhance performance for high-performance applications.

All together, we show that LH* files can efficiently scale to files that are orders of magnitude larger in size than single-site files. LH* files that reside in main memory may also be much faster than single-site disk files. Finally, LH* files can be more efficient than any distributed file with a centralized directory, or a static parallel or distributed hash file.

Copyright © 1996 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 4 Issue 1, Books, VLDB-j, TODS, ..." and ... DVD Version: Load ACM SIGMOD Anthology DVD 2" and ... BibTeX

Online Edition: ACM Digital Library

[Abstract, Index Terms and Review]
[Full Text in PDF Format, 762 KB]


[Abeysundara and Kamal 1991]
Bandula W. Abeysundara, Ahmed E. Kamal: High-Speed Local Area Networks and Their Performance: A Survey. ACM Comput. Surv. 23(2): 221-264(1991) BibTeX
[Amin et al. 1994]
Minesh B. Amin, Donovan A. Schneider, V. Singh: An Adaptive, Load Balancing Parallel Join Algorithm. COMAD 1994: 0- BibTeX
[Devine 1993]
Robert Devine: Design and Implementation of DDH: A Distributed Dynamic Hashing Algorithm. FODO 1993: 101-114 BibTeX
[DeWitt and Gray 1992]
David J. DeWitt, Jim Gray: Parallel Database Systems: The Future of High Performance Database Systems. Commun. ACM 35(6): 85-98(1992) BibTeX
[DeWitt et al. 1986]
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High Performance Dataflow Database Machine. VLDB 1986: 228-237 BibTeX
[Enbody and Du 1988]
Richard J. Enbody, H. C. Du: Dynamic Hashing Schemes. ACM Comput. Surv. 20(2): 85-113(1988) BibTeX
[Fagin et al. 1979]
Ronald Fagin, Jürg Nievergelt, Nicholas Pippenger, H. Raymond Strong: Extendible Hashing - A Fast Access Method for Dynamic Files. ACM Trans. Database Syst. 4(3): 315-344(1979) BibTeX
[Gallant 1992]
[Knuth 1973]
Donald E. Knuth: The Art of Computer Programming, Volume I: Fundamental Algorithms, 2nd Edition. Addison-Wesley 1973
[Kitsuregawa et al. 1984]
[Kröll and Widmayer 1994]
Brigitte Kröll, Peter Widmayer: Distributing a Search Tree Among a Growing Number of Processors. SIGMOD Conference 1994: 265-276 BibTeX
[Larson 1978]
Per-Åke Larson: Dynamic Hashing. BIT 18(2): 184-201(1978) BibTeX
[Larson 1980]
Per-Åke Larson: Linear Hashing with Partial Expansions. VLDB 1980: 224-232 BibTeX
[Larson 1988]
Per-Åke Larson: Dynamic Hash Tables. Commun. ACM 31(4): 446-457(1988) BibTeX
[Litwin 1980]
Witold Litwin: Linear Hashing: A New Tool for File and Table Addressing. VLDB 1980: 212-223 BibTeX
[Litwin et al. 1993]
Witold Litwin, Marie-Anne Neimat, Donovan A. Schneider: LH* - Linear Hashing for Distributed Files. SIGMOD Conference 1993: 327-336 BibTeX
[Litwin et al. 1994]
Witold Litwin, Marie-Anne Neimat, Donovan A. Schneider: RP*: A Family of Order Preserving Scalable Distributed Data Structures. VLDB 1994: 342-353 BibTeX
[Levy and Silberschatz 1990]
Eliezer Levy, Abraham Silberschatz: Distributed File Systems: Concepts and Examples. ACM Comput. Surv. 22(4): 321-374(1990) BibTeX
[Nance 1992]
[Ramamohanarao and Sacks-Davis 1984]
Kotagiri Ramamohanarao, Ron Sacks-Davis: Recursive Linear Hashing. ACM Trans. Database Syst. 9(3): 369-391(1984) BibTeX
[Salzberg 1988]
Betty Salzberg: File Structures: An Analytic Approach. Prentice-Hall 1988, ISBN 0-13-314550-6
[Samet 1989]
Hanan Samet: The Design and Analysis of Spatial Data Structures. Addison-Wesley 1990
[Schwetman 1990]
[Severance et al. 1990]
Charles Severance, Sakti Pramanik, P. Wolberg: Distributed Linear Hashing and Parallel Projection in Main Memory Databases. VLDB 1990: 674-682 BibTeX
[Stonebraker 1986]
Michael Stonebraker: The Case for Shared Nothing. IEEE Database Eng. Bull. 9(1): 4-9(1986) BibTeX
[Tanenbaum 1995]
[Teradata Corp. 1988]
[Vaskevitch 1994]
David Vaskevitch: Database in Crisis and Transition: A Technical Agenda for the Year 2001. SIGMOD Conference 1994: 484-489 BibTeX
[Vingralek et al. 1994]
Radek Vingralek, Yuri Breitbart, Gerhard Weikum: Distributed File Organization with Scalable Cost/Performance. SIGMOD Conference 1994: 253-264 BibTeX

Referenced by

  1. Witold Litwin, Thomas J. E. Schwarz: LH*RS: A High-Availability Scalable Distributed Data Structure using Reed Solomon Codes. SIGMOD Conference 2000: 237-248
  2. Jonas S. Karlsson, Witold Litwin, Tore Risch: LH*LH: A scalable High Performance Data Structure for Switched Multicomputers. EDBT 1996: 573-591
  3. Witold Litwin, Marie-Anne Neimat, Donovan A. Schneider: RP*: A Family of Order Preserving Scalable Distributed Data Structures. VLDB 1994: 342-353
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
TODS, ACM SIGMOD Anthology: Copyright © by ACM (, Corrections:
DBLP: Copyright © by Michael Ley (, last change: Tue Jun 24 18:39:20 2008