Digital Symposium Collection 2000  

 
 
 
 
 
 

 





















Multi-dimensional Selectivity Estimation Using Compressed Histogram Information

Ju-Hong Lee, Deok-Hwan Kim, and Chin-Wan Chung

  View Paper (PDF)  

Return to Histograms

Abstract
The database query optimizer requires the estimation of the query selectivity to find the most efficient access plan. For queries referencing multiple attributes from the same relation, we need a multi-dimensional selectivity estimation technique when the attributes are dependent each other because the selectivity is determined by the joint data distribution of the attributes. Additionally, for multimedia databases, there are intrinsic requirements for the multi-dimensional selectivity estimation because feature vectors are stored in multi-dimensional indexing trees. In the 1-dimensional case, a histogram is practically the most preferable. In the multi-dimensional case, however, a histogram is not adequate because of high storage overhead and high error rates. In this paper, we propose a novel approach for the multi-dimensional selectivity estimation. Compressed information from a large number of small-sized histogram buckets is maintained using the discrete cosine transform. This enables low error rates and low storage overheads even in high dimensions. In addition, this approach has the advantage of supporting dynamic data updates by eliminating the overhead for periodical reconstructions of the compressed information. Extensive experimental results show advantages of the proposed approach.


References

Note: References link to DBLP on the Web.

[AFS93]
...
[BBK98]
Stefan Berchtold , Christian Böhm , Hans-Peter Kriegel : The Pyramid-Tree: Breaking the Curse of Dimensionality. SIGMOD Conference 1998 : 142-153
[BKK96]
Stefan Berchtold , Daniel A. Keim , Hans-Peter Kriegel : The X-tree : An Index Structure for High-Dimensional Data. VLDB 1996 : 28-39
[BF95]
Alberto Belussi , Christos Faloutsos : Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension. VLDB 1995 : 299-310
[CR94]
Chungmin Melvin Chen , Nick Roussopoulos : Adaptive Selectivity Estimation Using Query Feedback. SIGMOD Conference 1994 : 161-172
[CSZS97]
...
[CG96]
Surajit Chaudhuri , Luis Gravano : Optimizing Queries over Multimedia Repositories. SIGMOD Conf. 1996 : 91-102
[Chri83]
Stavros Christodoulakis : Estimating record selectivities. IS 8(2) : 105-115(1983)
[EKSWX98]
Martin Ester , Hans-Peter Kriegel , Jörg Sander , Michael Wimmer , Xiaowei Xu : Incremental Clustering for Mining in a Data Warehousing Environment. VLDB 1998 : 323-333
[EKSX96]
Martin Ester , Hans-Peter Kriegel , Jörg Sander , Xiaowei Xu : A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. KDD 1996 : 226-231
[Fa96]
Ronald Fagin : Combining Fuzzy Information from Multiple Systems. PODS 1996 : 216-226
[Fa98]
Ronald Fagin : Fuzzy Queries in Multimedia Database Systems. PODS 1998 : 1-10
[GRS98]
Sudipto Guha , Rajeev Rastogi , Kyuseok Shim : CURE: An Efficient Clustering Algorithm for Large Databases. SIGMOD Conference 1998 : 73-84
[HNSS95]
Peter J. Haas , Jeffrey F. Naughton , S. Seshadri , Lynne Stokes : Sampling-Based Estimation of the Number of Distinct Values of an Attribute. VLDB 1995 : 311-322
[Io93]
Yannis E. Ioannidis : Universality of Serial Histograms. VLDB 1993 : 256-267
[IP95]
Yannis E. Ioannidis , Viswanath Poosala : Balancing Histogram Optimality and Practicality for Query Result Size Estimation. SIGMOD Conference 1995 : 233-244
[JKMPSS98]
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel : Optimal Histograms with Quality Guarantees. VLDB 1998 : 275-286
[Lim90]
...
[MCS98]
Michael V. Mannino , Paicheng Chu , Thomas Sager : Statistical Profile Estimation in Database Systems. Computing Surveys 20(3) : 191-221(1988)
[Nh94]
Raymond T. Ng , Jiawei Han : Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB 1994 : 144-155
[PSTW93]
Bernd-Uwe Pagel , Hans-Werner Six , Heinrich Toben , Peter Widmayer : Towards an Analysis of Range Query Performance in Spatial Data Structures. PODS 1993 : 214-221
[PIHS96]
Viswanath Poosala , Yannis E. Ioannidis , Peter J. Haas , Eugene J. Shekita : Improved Histograms for Selectivity Estimation of Range Predicates. SIGMOD Conf. 1996 : 294-305
[PI97]
Viswanath Poosala , Yannis E. Ioannidis : Selectivity Estimation Without the Attribute Value Independence Assumption. VLDB 1997 : 486-495
[RY90]
...
[SCZ98]
...
[SLRD93]
Wei Sun , Yibei Ling , Naphtali Rishe , Yi Deng : An Instant and Accurate Estimation Method for Joins and Selection in a Retrieval-Intensive Environment. SIGMOD Conference 1993 : 79-88
[WKW94]
Kyu-Young Whang , Sang-Wook Kim , Gio Wiederhold : Dynamic Maintenance of Data Distribution for Selectivity Estimation. VLDB Journal 3(1) : 29-51(1994)
[ZRL96]
Tian Zhang , Raghu Ramakrishnan , Miron Livny : BIRCH: An Efficient Data Clustering Method for Very Large Databases. SIGMOD Conf. 1996 : 103-114

BIBTEX

@inproceedings{DBLP:conf/sigmod/LeeKC99,
  author    = {Ju-Hong Lee and
                Deok-Hwan Kim and
                Chin-Wan Chung},
   editor    = {Alex Delis and
                Christos Faloutsos and
                Shahram Ghandeharizadeh},
   title     = {Multi-dimensional Selectivity Estimation Using Compressed Histogram
                Information},
   booktitle = {SIGMOD 1999, Proceedings ACM SIGMOD International Conference
                on Management of Data, June 1-3, 1999, Philadephia, Pennsylvania,
                USA},
   publisher = {ACM Press},
   year      = {1999},
   isbn      = {1-58113-084-8},
   pages     = {205-214},
   crossref  = {DBLP:conf/sigmod/99},
   bibsource = {DBLP, http://dblp.uni-trier.de} } },


























Copyright(C) 2000 ACM