ACM SIGMOD Anthology VLDB dblp.uni-trier.de

PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning.

Rajeev Rastogi, Kyuseok Shim: PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning. VLDB 1998: 404-415
@inproceedings{DBLP:conf/vldb/RastogiS98,
  author    = {Rajeev Rastogi and
               Kyuseok Shim},
  editor    = {Ashish Gupta and
               Oded Shmueli and
               Jennifer Widom},
  title     = {PUBLIC: A Decision Tree Classifier that Integrates Building and
               Pruning},
  booktitle = {VLDB'98, Proceedings of 24rd International Conference on Very
               Large Data Bases, August 24-27, 1998, New York City, New York,
               USA},
  publisher = {Morgan Kaufmann},
  year      = {1998},
  isbn      = {1-55860-566-5},
  pages     = {404-415},
  ee        = {db/conf/vldb/RastogiS98.html},
  crossref  = {DBLP:conf/vldb/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

Classification is an important problem in data mining. Given a database of records, each with a class label, a classifier generates a concise and meaningful description for each class that can be used toclassify subsequent records. A number of popular classifiers construct decision trees to generate classmodels. These classifiers first build a decision tree and then prune sub- trees from the decision tree in a subsequent pruning phase to improve accuracy and prevent "overfitting".

In this paper, we propose PUBLIC, an improved decision tree classifier that integrates the second "pruning" phase with the initial "building" phase. In PUBLIC, a node is not expanded during the building phase, if it is determined that it will be pruned during the subsequent pruning phase. In order to make this determination for a node, before it is expanded, PUBLIC computes a lower bound on the minimum cost subtree rooted at the node. This estimate is then used by PUBLIC to identify the nodes that are certain to be pruned, and for such nodes, not expend effort on splitting them. Experimental results with real-life as well as synthetic data sets demonstrate the effectiveness of PUBLIC's integrated approach which has the ability to deliver substantial performance improvements.

Copyright © 1998 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Online Paper

ACM SIGMOD DiSC

CDROM Version: Load the CDROM "DiSC, Volume 1 Number 1" and ...

ACM SIGMOD Anthology

DVD Version: Load ACM SIGMOD Anthology DVD 1" and ... BibTeX

Printed Edition

Ashish Gupta, Oded Shmueli, Jennifer Widom (Eds.): VLDB'98, Proceedings of 24rd International Conference on Very Large Data Bases, August 24-27, 1998, New York City, New York, USA. Morgan Kaufmann 1998, ISBN 1-55860-566-5
Contents BibTeX

References

[AGI+92]
Rakesh Agrawal, Sakti P. Ghosh, Tomasz Imielinski, Balakrishna R. Iyer, Arun N. Swami: An Interval Classifier for Database Mining Applications. VLDB 1992: 560-573 BibTeX
[AIS93]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Database Mining: A Performance Perspective. IEEE Trans. Knowl. Data Eng. 5(6): 914-925(1993) BibTeX
[BFOS84]
Leo Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone: Classification and Regression Trees. Wadsworth 1984, ISBN 0-534-98053-8
BibTeX
[CKS+88]
...
[Fay91]
...
[FI93]
...
[FMM96]
Takeshi Fukuda, Yasuhiko Morimoto, Shinichi Morishita, Takeshi Tokuyama: Constructing Efficient Decision Trees by Using Optimized Numeric Association Rules. VLDB 1996: 146-155 BibTeX
[Gol89]
David E. Goldberg: Genetic Algorithms in Search Optimization and Machine Learning. Addison-Wesley 1989, ISBN 0-201-15767-5
BibTeX
[GRG98]
Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti: RainForest - A Framework for Fast Decision Tree Construction of Large Datasets. VLDB 1998: 416-427 BibTeX
[HMS66]
...
[KT81]
...
[MAR96]
Manish Mehta, Rakesh Agrawal, Jorma Rissanen: SLIQ: A Fast Scalable Classifier for Data Mining. EDBT 1996: 18-32 BibTeX
[MRA95]
Manish Mehta, Jorma Rissanen, Rakesh Agrawal: MDL-Based Decision Tree Pruning. KDD 1995: 216-221 BibTeX
[MST94]
...
[QR89]
...
[Qui86]
J. Ross Quinlan: Induction of Decision Trees. Machine Learning 1(1): 81-106(1986) BibTeX
[Qui87]
...
[Qui93]
J. Ross Quinlan: C4.5: Programs for Machine Learning. Morgan Kaufmann 1993, ISBN 1-55860-238-0
BibTeX
[Rip96]
...
[Ris78]
...
[Ris89]
...
[RS98]
...
[SAM96]
John C. Shafer, Rakesh Agrawal, Manish Mehta: SPRINT: A Scalable Parallel Classifier for Data Mining. VLDB 1996: 544-555 BibTeX

Referenced by

  1. Sridhar Ramaswamy, Rajeev Rastogi, Kyuseok Shim: Efficient Algorithms for Mining Outliers from Large Data Sets. SIGMOD Conference 2000: 427-438
  2. Minos N. Garofalakis, Rajeev Rastogi, S. Seshadri, Kyuseok Shim: Data Mining and the Web: Past, Present and Future. Workshop on Web Information and Data Management 1999: 43-47
  3. Johannes Gehrke, Venkatesh Ganti, Raghu Ramakrishnan, Wei-Yin Loh: BOAT-Optimistic Decision Tree Construction. SIGMOD Conference 1999: 169-180
  4. Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan: A Framework for Measuring Changes in Data Characteristics. PODS 1999: 126-137
  5. Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti: RainForest - A Framework for Fast Decision Tree Construction of Large Datasets. VLDB 1998: 416-427
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:46:22 2009