ACM SIGMOD Anthology VLDB dblp.uni-trier.de

Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules.

Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules. VLDB 1996: 146-155
@inproceedings{DBLP:conf/vldb/FukudaMMT96,
  author    = {Takeshi Fukuda and
               Yasuhiko Morimoto and
               Shinichi Morishita and
               Takeshi Tokuyama},
  editor    = {T. M. Vijayaraman and
               Alejandro P. Buchmann and
               C. Mohan and
               Nandlal L. Sarda},
  title     = {Constructing Efficient Decision Trees by Using Optimized Numeric
               Association Rules},
  booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
               Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
  publisher = {Morgan Kaufmann},
  year      = {1996},
  isbn      = {1-55860-382-4},
  pages     = {146-155},
  ee        = {db/conf/vldb/FukudaMMT96.html},
  crossref  = {DBLP:conf/vldb/96},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

We propose an extension of an entropy-based heuristic of Quinlan [Q93] for constructing a decision tree from a large database with many numeric attributes. Quinlan pointed out that his original method (as well as other existing methods) may be inefficient if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of numeric attributes with strong correlation, we compute a two-dimensional association rule with respect to these attributes and the objective attribute of the decision tree. In particular, we consider a family R of grid-regions in the plane associated with the pair of attributes. For R in R, the data can be split into two classes: data inside R and data outside R.

We compute the region Ropt in R that minimizes the entropy of the splitting, and add the splitting associated with Ropt (for each pair of strongly correlated attributes) to the set of candidate tests in Quinlan's entropy-based heuristic.

We give efficient algorithms for cases in which R is (1) x-monotone connected regions, (2) based-monotone regions, (3) rectangles, and (4) rectilinear convex regions. The algorithm for the first case has been implemented as a subsystem of SONAR(System for Optimized Numeric Association Rules) developed by the authors. Tests show that our approach can create small-sized decision trees.

Copyright © 1996 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 1 Issue 5, VLDB '89-'97" and ... DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.): VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India. Morgan Kaufmann 1996, ISBN 1-55860-382-4
Contents BibTeX

Electronic Edition

References

[ACKT96]
Tetsuo Asano, Danny Z. Chen, Naoki Katoh, Takeshi Tokuyama: Polynomial-Time Solutions to Image Segmentation. SODA 1996: 104-113 BibTeX
[AGL+92]
Rakesh Agrawal, Sakti P. Ghosh, Tomasz Imielinski, Balakrishna R. Iyer, Arun N. Swami: An Interval Classifier for Database Mining Applications. VLDB 1992: 560-573 BibTeX
[ALS93]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Database Mining: A Performance Perspective. IEEE Trans. Knowl. Data Eng. 5(6): 914-925(1993) BibTeX
[AT94]
...
[BFOS84]
...
[CG86]
Bernard Chazelle, Leonidas J. Guibas: Fractional Cascading: I. A Data Structuring Technique. Algorithmica 1(2): 133-162(1986) BibTeX
[DE93]
David P. Dobkin, David Eppstein: Computing the Discrepancy. Symposium on Computational Geometry 1993: 47-52 BibTeX
[DEY86]
David P. Dobkin, Herbert Edelsbrunner, Chee-Keng Yap: Probing Convex Polytopes. STOC 1986: 424-432 BibTeX
[FMMT96a]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Mining Optimized Association Rules for Numeric Attributes. PODS 1996: 182-191 BibTeX
[FMMT96b]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Data Mining Using Two-Dimensional Optimized Accociation Rules: Scheme, Algorithms, and Visualization. SIGMOD Conference 1996: 13-23 BibTeX
[FMMY96c]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: SONAR: System for Optimized Numeric AssociationRules. SIGMOD Conference 1996: 553 BibTeX
[GJ79]
M. R. Garey, David S. Johnson: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman 1979, ISBN 0-7167-1044-7
BibTeX
[HR76]
Laurent Hyafil, Ronald L. Rivest: Constructing Optimal Binary Decision Trees is NP-Complete. Inf. Process. Lett. 5(1): 15-17(1976) BibTeX
[MAR96]
Manish Mehta, Rakesh Agrawal, Jorma Rissanen: SLIQ: A Fast Scalable Classifier for Data Mining. EDBT 1996: 18-32 BibTeX
[MST84]
...
[PR90]
...
[Q86]
J. Ross Quinlan: Induction of Decision Trees. Machine Learning 1(1): 81-106(1986) BibTeX
[Q93]
J. Ross Quinlan: C4.5: Programs for Machine Learning. Morgan Kaufmann 1993, ISBN 1-55860-238-0
BibTeX
[QR89]
J. Ross Quinlan, Ronald L. Rivest: Inferring Decision Trees Using the Minimum Description Length Principle. Inf. Comput. 80(3): 227-248(1989) BibTeX

Referenced by

  1. Shinichi Morishita, Jun Sese: Traversing Itemset Lattice with Statistical Metric Pruning. PODS 2000: 226-236
  2. Johannes Gehrke, Venkatesh Ganti, Raghu Ramakrishnan, Wei-Yin Loh: BOAT-Optimistic Decision Tree Construction. SIGMOD Conference 1999: 169-180
  3. Rajeev Rastogi, Kyuseok Shim: PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning. VLDB 1998: 404-415
  4. Yasuhiko Morimoto, Takeshi Fukuda, Hirofumi Matsuzawa, Takeshi Tokuyama, Kunikazu Yoda: Algorithms for Mining Association Rules for Binary Segmentations of Huge Categorical Databases. VLDB 1998: 380-391
  5. Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti: RainForest - A Framework for Fast Decision Tree Construction of Large Datasets. VLDB 1998: 416-427
  6. Takeshi Fukuda, Hirofumi Matsuzawa: Parallel Processing of Multiple Aggregate Queries on Shared-Nothing Multiprocessors. EDBT 1998: 278-292
  7. Yasuhiko Morimoto, Hiromu Ishii, Shinichi Morishita: Efficient Construction of Regression Trees with Range and Region Splitting. VLDB 1997: 166-175
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:46:10 2009