Efficient Construction of Regression Trees with Range and Region Splitting.

Yasuhiko Morimoto, Hiromu Ishii, Shinichi Morishita: Efficient Construction of Regression Trees with Range and Region Splitting. VLDB 1997: 166-175
@inproceedings{DBLP:conf/vldb/MorimotoIM97,
  author    = {Yasuhiko Morimoto and
               Hiromu Ishii and
               Shinichi Morishita},
  editor    = {Matthias Jarke and
               Michael J. Carey and
               Klaus R. Dittrich and
               Frederick H. Lochovsky and
               Pericles Loucopoulos and
               Manfred A. Jeusfeld},
  title     = {Efficient Construction of Regression Trees with Range and Region
               Splitting},
  booktitle = {VLDB'97, Proceedings of 23rd International Conference on Very
               Large Data Bases, August 25-29, 1997, Athens, Greece},
  publisher = {Morgan Kaufmann},
  year      = {1997},
  isbn      = {1-55860-470-7},
  pages     = {166-175},
  ee        = {db/conf/vldb/MorimotoIM97.html},
  crossref  = {DBLP:conf/vldb/97},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

We propose an efficient way of constructing regression trees in order to predict the objective numeric attribute values of given tuples. A regression tree is a rooted binary tree such that each internal node contains a test, which can be expressed as an RDB query, for splitting tuples into two disjoint classes and passing data in each class down to the left or right subtree. The mean of the objective attribute values at the leaf is used as the predicted value of the tuple.
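As a minimal sketch of the structure described above (not the authors' implementation), a regression tree can be modeled as nodes that either hold a test or a leaf mean. The names `Node`, `leaf`, `predict`, and the attribute `age` are hypothetical, and a Python predicate stands in for the RDB query.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Node:
    # Internal node: `test` routes a tuple left (True) or right (False).
    test: Optional[Callable[[dict], bool]] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf: mean of the objective attribute over the tuples that reached it.
    mean: Optional[float] = None


def leaf(values: Sequence[float]) -> Node:
    """Build a leaf whose prediction is the mean of the given objective values."""
    return Node(mean=sum(values) / len(values))


def predict(node: Node, row: dict) -> float:
    """Follow the tests from the root down to a leaf and return the leaf mean."""
    while node.test is not None:
        node = node.left if node.test(row) else node.right
    return node.mean


# Hypothetical one-level tree: split on an assumed numeric attribute "age".
tree = Node(
    test=lambda r: r["age"] < 40,
    left=leaf([10.0, 14.0]),
    right=leaf([30.0, 50.0]),
)

print(predict(tree, {"age": 25}))  # mean of the left leaf
print(predict(tree, {"age": 60}))  # mean of the right leaf
```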

To test a numeric attribute, traditional approaches use guillotine-cut splitting, which classifies data into those below a given value and the rest. Instead, we consider a family ℛ of grid-regions in the plane associated with two given numeric attributes. We propose to use a test that splits data into those that lie inside a region R in ℛ and those that lie outside.
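The two kinds of test can be contrasted with a short sketch. It assumes, for illustration only, that each numeric attribute is bucketed by sorted edge values and that a grid-region is stored as a set of (column, row) cells. The function names and bucket edges are hypothetical, and the abstract does not say how regions are represented.

```python
from bisect import bisect_right


def guillotine_test(value, threshold):
    """Traditional test: is the single attribute value below the threshold?"""
    return value < threshold


def cell_of(x, y, x_edges, y_edges):
    """Map a point to its (column, row) grid cell using sorted bucket edges."""
    return bisect_right(x_edges, x), bisect_right(y_edges, y)


def region_test(x, y, region, x_edges, y_edges):
    """Region test: does the point fall in a grid cell that belongs to the region?"""
    return cell_of(x, y, x_edges, y_edges) in region


# Assumed bucket edges: 4 columns for the first attribute, 3 rows for the second.
x_edges = [10, 20, 30]
y_edges = [100, 200]

# An assumed, non-rectangular region made of three grid cells.
region = {(1, 1), (2, 1), (2, 2)}

print(guillotine_test(15, 20))
print(region_test(15, 150, region, x_edges, y_edges))
print(region_test(5, 150, region, x_edges, y_edges))
```

A guillotine cut looks at one attribute at a time, while the region test depends on both attributes jointly.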

The contributions of this paper are as follows. We present an efficient algorithm for computing the region R in the family of grid-regions that minimizes the mean squared error after the introduction of the test with R. Experiments confirmed that region splitting yields regression trees with a smaller mean squared error. Our approach can also generate smaller regression trees.
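To make the optimization criterion concrete, the sketch below scores a split by the mean squared error around the inside and outside means, then searches for the best region by brute force. This is not the paper's efficient algorithm: it is an exhaustive search, and it assumes the family contains only axis-parallel rectangles of grid cells, which the abstract does not state. All names are hypothetical.

```python
def sse(values):
    """Sum of squared deviations of the values from their own mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def mse_after_split(inside, outside):
    """Mean squared error when each side is predicted by its own mean."""
    n = len(inside) + len(outside)
    return (sse(inside) + sse(outside)) / n


def best_rectangle(points, nx, ny):
    """Exhaustively search rectangles (c1, c2, r1, r2) on an nx-by-ny grid.

    `points` is a list of (column, row, objective_value) triples.
    Returns (error, rectangle) for the lowest-error split, or None.
    """
    best = None
    for c1 in range(nx):
        for c2 in range(c1, nx):
            for r1 in range(ny):
                for r2 in range(r1, ny):
                    inside, outside = [], []
                    for c, r, t in points:
                        hit = c1 <= c <= c2 and r1 <= r <= r2
                        (inside if hit else outside).append(t)
                    if not inside or not outside:
                        continue  # not a real split
                    err = mse_after_split(inside, outside)
                    if best is None or err < best[0]:
                        best = (err, (c1, c2, r1, r2))
    return best


# Toy data: a 3x3 grid where only the centre cell has a high objective value.
points = [(c, r, 10.0 if (c, r) == (1, 1) else 0.0)
          for c in range(3) for r in range(3)]

err, rect = best_rectangle(points, 3, 3)
print(err, rect)
```

On this toy grid the centre cell is isolated with zero error, which no single threshold on either attribute alone can achieve.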

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Printed Edition

Matthias Jarke, Michael J. Carey, Klaus R. Dittrich, Frederick H. Lochovsky, Pericles Loucopoulos, Manfred A. Jeusfeld (Eds.): VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases, August 25-29, 1997, Athens, Greece. Morgan Kaufmann 1997, ISBN 1-55860-470-7

References

[ACKT96]
Tetsuo Asano, Danny Z. Chen, Naoki Katoh, Takeshi Tokuyama: Polynomial-Time Solutions to Image Segmentation. SODA 1996: 104-113
[AIS93]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216
[AS94]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
[BFOS84]
Leo Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone: Classification and Regression Trees. Wadsworth 1984, ISBN 0-534-98053-8
[FMMT96a]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Mining Optimized Association Rules for Numeric Attributes. PODS 1996: 182-191
[FMMT96b]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Data Mining Using Two-Dimensional Optimized Association Rules: Scheme, Algorithms, and Visualization. SIGMOD Conference 1996: 13-23
[FMMT96c]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules. VLDB 1996: 146-155
[HF95]
Jiawei Han, Yongjian Fu: Discovery of Multiple-Level Association Rules from Large Databases. VLDB 1995: 420-431
[MAR96]
Manish Mehta, Rakesh Agrawal, Jorma Rissanen: SLIQ: A Fast Scalable Classifier for Data Mining. EDBT 1996: 18-32
[PCY95]
Jong Soo Park, Ming-Syan Chen, Philip S. Yu: An Effective Hash Based Algorithm for Mining Association Rules. SIGMOD Conference 1995: 175-186
[PS91]
Gregory Piatetsky-Shapiro, William J. Frawley (Eds.): Knowledge Discovery in Databases. AAAI/MIT Press 1991, ISBN 0-262-62080-4
[PSF91]
Gregory Piatetsky-Shapiro: Discovery, Analysis, and Presentation of Strong Rules. Knowledge Discovery in Databases 1991: 229-248
[Qui86]
J. Ross Quinlan: Induction of Decision Trees. Machine Learning 1(1): 81-106 (1986)
[Qui93]
J. Ross Quinlan: C4.5: Programs for Machine Learning. Morgan Kaufmann 1993, ISBN 1-55860-238-0
[SA96]
Ramakrishnan Srikant, Rakesh Agrawal: Mining Quantitative Association Rules in Large Relational Tables. SIGMOD Conference 1996: 1-12
[YFM+97]
Kunikazu Yoda, Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Computing Optimized Rectilinear Regions for Association Rules. KDD 1997: 96-103

Referenced by

  1. Shinichi Morishita, Jun Sese: Traversing Itemset Lattice with Statistical Metric Pruning. PODS 2000: 226-236
  2. Takeshi Fukuda, Hirofumi Matsuzawa: Parallel Processing of Multiple Aggregate Queries on Shared-Nothing Multiprocessors. EDBT 1998: 278-292
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:46:15 2009