Efficient Construction of Regression Trees with Range and Region Splitting.

Yasuhiko Morimoto, Hiromu Ishii, Shinichi Morishita: Efficient Construction of Regression Trees with Range and Region Splitting. VLDB 1997: 166-175
We propose an efficient way of constructing regression trees in order to predict the objective numeric attribute values of given tuples. A regression tree is a rooted binary tree such that each internal node contains a test, which can be expressed as an RDB query, for splitting tuples into two disjoint classes and passing data in each class down to the left or right subtree. The mean of the objective attribute values at the leaf is used as the predicted value of the tuple.
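The tree structure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Node` class, the `leaf` and `predict` helpers, and the `age`/`salary` attributes are all hypothetical names chosen for the example, and the test is modelled as a plain Python predicate rather than an RDB query.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional


@dataclass
class Node:
    # Internal node: `test` is set and both children exist.
    # Leaf node: `test` is None and `value` holds the prediction.
    test: Optional[Callable[[dict], bool]] = None
    left: Optional["Node"] = None    # tuples that satisfy the test
    right: Optional["Node"] = None   # tuples that do not
    value: Optional[float] = None


def leaf(rows, target):
    """Build a leaf whose prediction is the mean target value of its tuples."""
    return Node(value=mean(r[target] for r in rows))


def predict(node, row):
    """Follow the tests from the root down to a leaf and return its mean."""
    while node.test is not None:
        node = node.left if node.test(row) else node.right
    return node.value


# Hypothetical data: predict "salary" from "age" with a single split.
rows = [
    {"age": 25, "salary": 30.0},
    {"age": 30, "salary": 34.0},
    {"age": 50, "salary": 60.0},
    {"age": 55, "salary": 64.0},
]
is_young = lambda r: r["age"] < 40
tree = Node(
    test=is_young,
    left=leaf([r for r in rows if is_young(r)], "salary"),
    right=leaf([r for r in rows if not is_young(r)], "salary"),
)
young_prediction = predict(tree, {"age": 28})
old_prediction = predict(tree, {"age": 52})
```

Each internal node sends a tuple to exactly one child, so the two classes at every split are disjoint, as the abstract requires.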

To test a numeric attribute, traditional approaches use guillotine-cut splitting, which divides the data into tuples whose attribute value is below a given threshold and all remaining tuples. Instead, we consider a family ℛ of grid-regions in the plane associated with two given numeric attributes. We propose to use a test that splits data into those that lie inside a region R ∈ ℛ and those that lie outside.
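The difference between the two kinds of test can be illustrated with a short sketch. The abstract does not say how grid-regions are represented, so the code below assumes each attribute is discretized into `n` equal-width buckets and a region is a set of grid cells; `bucket`, `guillotine_test`, and `region_test` are hypothetical names.

```python
def guillotine_test(x, threshold):
    """Traditional one-attribute test: is the value below the threshold?"""
    return x < threshold


def bucket(v, lo, hi, n):
    """Map a value in [lo, hi] to one of n equal-width buckets (0..n-1)."""
    if v <= lo:
        return 0
    if v >= hi:
        return n - 1
    return int((v - lo) / (hi - lo) * n)


def region_test(x, y, region, x_range, y_range, n):
    """Two-attribute test: does the point fall in a grid cell of the region?"""
    cell = (bucket(x, *x_range, n), bucket(y, *y_range, n))
    return cell in region


# A 4x4 grid over [0, 100] x [0, 100] and an L-shaped region of three cells
# (an assumed example; the region need not be a rectangle).
region = {(1, 1), (1, 2), (2, 1)}
inside = region_test(30, 60, region, (0, 100), (0, 100), 4)    # cell (1, 2)
outside = region_test(60, 60, region, (0, 100), (0, 100), 4)   # cell (2, 2)
below = guillotine_test(30, 50)
```

A guillotine cut looks at one attribute at a time, whereas the region test depends on both attributes jointly, so a single node can isolate a group of tuples that would otherwise need several threshold tests.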

The contributions of this paper are as follows. We present an efficient algorithm for computing the region R in the family of grid-regions that minimizes the mean squared error after the introduction of the test with the region R. Experiments confirmed that the use of region splitting gives a smaller mean squared error of regression trees. Our approach can also generate smaller regression trees.
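To make the optimization objective concrete, the sketch below computes the mean squared error after a split and finds the best region by exhaustive search. It is explicitly not the efficient algorithm of the paper, which the abstract does not describe: it is a brute-force baseline restricted to axis-aligned rectangles of grid cells, an assumed region family chosen for simplicity. The names `sse` and `best_rectangle` and the sample data are hypothetical.

```python
def sse(count, total, total_sq):
    """Sum of squared errors around the mean, from count, sum, sum of squares."""
    return 0.0 if count == 0 else total_sq - total * total / count


def best_rectangle(points, n):
    """Brute-force search over rectangles [i1..i2] x [j1..j2] of an n x n grid.

    points: list of (i, j, y) with cell indices 0 <= i, j < n and target y.
    Returns (mse, rect); rect is None if no split beats the unsplit node.
    """
    N = len(points)
    S = sum(y for _, _, y in points)
    Q = sum(y * y for _, _, y in points)
    best_mse, best_rect = sse(N, S, Q) / N, None
    for i1 in range(n):
        for i2 in range(i1, n):
            for j1 in range(n):
                for j2 in range(j1, n):
                    ys = [y for i, j, y in points
                          if i1 <= i <= i2 and j1 <= j <= j2]
                    c = len(ys)
                    if c == 0 or c == N:
                        continue  # not a real split: one side is empty
                    s = sum(ys)
                    q = sum(y * y for y in ys)
                    # Each side is predicted by its own mean.
                    mse = (sse(c, s, q) + sse(N - c, S - s, Q - q)) / N
                    if mse < best_mse:
                        best_mse, best_rect = mse, (i1, i2, j1, j2)
    return best_mse, best_rect


# 3x3 grid: high target values only in the centre cell, low values around it.
ring = [(i, j, 1.0) for i in range(3) for j in range(3) if (i, j) != (1, 1)]
points = ring + [(1, 1, 9.0), (1, 1, 11.0)]
mse, rect = best_rectangle(points, 3)
```

On this data the search selects the centre cell as the region. No single threshold on either attribute can separate the centre from the surrounding cells, which is the kind of case where a region test can lower the error with fewer nodes.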

Copyright © 1997 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

