Multi-Dimensional Substring Selectivity Estimation.
H. V. Jagadish, Olga Kapitskaia, Raymond T. Ng, Divesh Srivastava:
Multi-Dimensional Substring Selectivity Estimation.
VLDB 1999: 387-398@inproceedings{DBLP:conf/vldb/JagadishKNS99,
author = {H. V. Jagadish and
Olga Kapitskaia and
Raymond T. Ng and
Divesh Srivastava},
editor = {Malcolm P. Atkinson and
Maria E. Orlowska and
Patrick Valduriez and
Stanley B. Zdonik and
Michael L. Brodie},
title = {Multi-Dimensional Substring Selectivity Estimation},
booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
UK},
publisher = {Morgan Kaufmann},
year = {1999},
isbn = {1-55860-615-7},
pages = {387-398},
ee = {db/conf/vldb/JagadishKNS99.html},
crossref = {DBLP:conf/vldb/99},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX
Abstract
With the explosion of the Internet, LDAP directories and XML,
there is an ever greater need to evaluate queries involving
(sub)string matching. In many cases, matches need to be on
multiple attributes/dimensions, with correlations between
dimensions. Effective query optimization in this context
requires good selectivity estimates.
In this paper, we use multi-dimensional count-suffix trees as
the basic framework for substring selectivity estimation.
Given the enormous size of these trees for large databases,
we develop a space and time efficient probabilistic algorithm
to construct multi-dimensional pruned count-suffix trees
directly. We then present two techniques to obtain good
estimates for a given multi-dimensional substring matching query,
using a pruned count-suffix tree. The first one, called GNO
(for Greedy Non-Overlap), generalizes the greedy parsing suggested
by Krishnan et al. [9] for one-dimensional substring selectivity
estimation. The second one, called MO (for Maximal Overlap),
uses all maximal multi-dimensional substrings of the query for
estimation; these multi-dimensional substrings help to capture the
correlation that may exists between strings in the multiple
dimensions. We demonstrate experimentally, using real data sets,
the MO is substantially superior to GNO in the quality
of the estimate.
Copyright © 1999 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Online Paper
DVD Version: Load ACM SIGMOD Anthology DVD 1" and ...
BibTeX
Printed Edition
Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, Michael L. Brodie (Eds.):
VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK.
Morgan Kaufmann 1999, ISBN 1-55860-615-7
Contents BibTeX
References
- [1]
- Raffaele Giancarlo:
A Generalization of the Suffix Tree to Square Matrices, with Applications.
SIAM J. Comput. 24(3): 520-562(1995) BibTeX
- [2]
- Raffaele Giancarlo, Roberto Grossi:
On the Construction of Classes of Suffix Trees for Square Matrices: Algorithms and Applications.
Inf. Comput. 130(2): 151-182(1996) BibTeX
- [3]
- Phillip B. Gibbons, Yossi Matias:
New Sampling-Based Summary Statistics for Improving Approximate Query Answers.
SIGMOD Conference 1998: 331-342 BibTeX
- [4]
- ...
- [5]
- Yannis E. Ioannidis:
Universality of Serial Histograms.
VLDB 1993: 256-267 BibTeX
- [6]
- Yannis E. Ioannidis, Viswanath Poosala:
Balancing Histogram Optimality and Practicality for Query Result Size Estimation.
SIGMOD Conference 1995: 233-244 BibTeX
- [7]
- H. V. Jagadish, Nick Koudas, S. Muthukrishnan, Viswanath Poosala, Kenneth C. Sevcik, Torsten Suel:
Optimal Histograms with Quality Guarantees.
VLDB 1998: 275-286 BibTeX
- [8]
- H. V. Jagadish, Raymond T. Ng, Divesh Srivastava:
Substring Selectivity Estimation.
PODS 1999: 249-260 BibTeX
- [9]
- P. Krishnan, Jeffrey Scott Vitter, Balakrishna R. Iyer:
Estimating Alphanumeric Selectivity in the Presence of Wildcards.
SIGMOD Conference 1996: 282-293 BibTeX
- [10]
- Richard J. Lipton, Jeffrey F. Naughton:
Query Size Estimation by Adaptive Sampling.
PODS 1990: 40-46 BibTeX
- [11]
- Edward M. McCreight:
A Space-Economical Suffix Tree Construction Algorithm.
J. ACM 23(2): 262-272(1976) BibTeX
- [12]
- M. Muralikrishna, David J. DeWitt:
Equi-Depth Histograms For Estimating Selectivity Factors For Multi-Dimensional Queries.
SIGMOD Conference 1988: 28-36 BibTeX
- [13]
- Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita:
Improved Histograms for Selectivity Estimation of Range Predicates.
SIGMOD Conference 1996: 294-305 BibTeX
- [14]
- Viswanath Poosala, Yannis E. Ioannidis:
Selectivity Estimation Without the Attribute Value Independence Assumption.
VLDB 1997: 486-495 BibTeX
- [15]
- Patricia G. Selinger, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, Thomas G. Price:
Access Path Selection in a Relational Database Management System.
SIGMOD Conference 1979: 23-34 BibTeX
- [16]
- ...
- [17]
- Min Wang, Jeffrey Scott Vitter, Balakrishna R. Iyer:
Selectivity Estimation in the Presence of Alphanumeric Correlations.
ICDE 1997: 169-180 BibTeX
- [18]
- Peter Weiner:
Linear Pattern Matching Algorithms.
FOCS 1973: 1-11 BibTeX
Referenced by
- H. V. Jagadish, Nick Koudas, Divesh Srivastava:
On Effective Multi-Dimensional Indexing for Strings.
SIGMOD Conference 2000: 403-414
- Zhiyuan Chen, Flip Korn, Nick Koudas, S. Muthukrishnan:
Selectivity Estimation for Boolean Queries.
PODS 2000: 216-225
- Olga Kapitskaia, Raymond T. Ng, Divesh Srivastava:
Evolution and Revolutions in LDAP Directory Caches.
EDBT 2000: 202-216
BibTeX
ACM SIGMOD Anthology - DBLP:
[Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:46:28 2009