SPRINT: A Scalable Parallel Classifier for Data Mining.
John C. Shafer, Rakesh Agrawal, Manish Mehta:
SPRINT: A Scalable Parallel Classifier for Data Mining.
VLDB 1996: 544-555
@inproceedings{DBLP:conf/vldb/ShaferAM96,
author = {John C. Shafer and
Rakesh Agrawal and
Manish Mehta 0002},
editor = {T. M. Vijayaraman and
Alejandro P. Buchmann and
C. Mohan and
Nandlal L. Sarda},
title = {SPRINT: A Scalable Parallel Classifier for Data Mining},
booktitle = {VLDB'96, Proceedings of 22th International Conference on Very
Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India},
publisher = {Morgan Kaufmann},
year = {1996},
isbn = {1-55860-382-4},
pages = {544-555},
ee = {db/conf/vldb/ShaferAM96.html},
crossref = {DBLP:conf/vldb/96},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Abstract
Classification is an important data mining problem. Although classification
is a well-studied problem, most of the current classification algorithms
are designed only for memory-resident data, thus limiting their suitability
for mining over large databases. The recently proposed SLIQ classification
algorithm addressed several issues in building a fast scalable classifier.
Unfortunately, SLIQ still requires some information to stay memory-resident.
Furthermore, this information grows in direct proportion to the number of
input records, putting a hard limit on the size of data that can be classified.
We present for the first time a decision-tree-based classification algorithm
that removes all of the memory restrictions, and is fast and scalable.
The algorithm has also been designed to be easily parallelized. This
parallelization, also presented here, represents the first scalable
parallelization of a decision-tree classifier where all processors work
together to build a single consistent model. The combination of these
characteristics makes the proposed algorithm an ideal tool for data mining.
Copyright © 1996 by the VLDB Endowment.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or
distributed for direct commercial advantage, the VLDB
copyright notice and the title of the publication and
its date appear, and notice is given that copying
is by the permission of the Very Large Data Base
Endowment. To copy otherwise, or to republish, requires
a fee and/or special permission from the Endowment.
Printed Edition
T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan, Nandlal L. Sarda (Eds.):
VLDB'96, Proceedings of 22th International Conference on Very Large Data Bases, September 3-6, 1996, Mumbai (Bombay), India.
Morgan Kaufmann 1996, ISBN 1-55860-382-4
References
- [1]
- Rakesh Agrawal, Sakti P. Ghosh, Tomasz Imielinski, Balakrishna R. Iyer, Arun N. Swami:
An Interval Classifier for Database Mining Applications.
VLDB 1992: 560-573
- [2]
- Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami:
Database Mining: A Performance Perspective.
IEEE Trans. Knowl. Data Eng. 5(6): 914-925 (1993)
- [3]
- ...
- [4]
- ...
- [5]
- Philip K. Chan, Salvatore J. Stolfo:
Experiments on Multi-Strategy Learning by Meta-Learning.
CIKM 1993: 314-323
- [6]
- ...
- [7]
- David J. DeWitt, Shahram Ghandeharizadeh, Donovan A. Schneider, Allan Bricker, Hui-I Hsiao, Rick Rasmussen:
The Gamma Database Machine Project.
IEEE Trans. Knowl. Data Eng. 2(1): 44-62 (1990)
- [8]
- David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider:
Parallel Sorting on a Shared-Nothing Architecture using Probabilistic Splitting.
PDIS 1991: 280-291
- [9]
- ...
- [10]
- ...
- [11]
- David E. Goldberg:
Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley 1989, ISBN 0-201-15767-5
- [12]
- ...
- [13]
- Mike James:
Classification Algorithms.
John Wiley 1985, ISBN 0-471-84799-2
- [14]
- ...
- [15]
- Manish Mehta, Rakesh Agrawal, Jorma Rissanen:
SLIQ: A Fast Scalable Classifier for Data Mining.
EDBT 1996: 18-32
- [16]
- Donald Michie, David J. Spiegelhalter, C. C. Taylor:
Machine Learning, Neural and Statistical Classification.
Ellis Horwood 1994, ISBN 0-13-106360-X
- [17]
- ...
- [18]
- ...
- [19]
- J. Ross Quinlan:
Induction of Decision Trees.
Machine Learning 1(1): 81-106 (1986)
- [20]
- J. Ross Quinlan:
C4.5: Programs for Machine Learning.
Morgan Kaufmann 1993, ISBN 1-55860-238-0
- [21]
- ...
- [22]
- ...
- [23]
- ...
- [24]
- Sholom M. Weiss, Casimir A. Kulikowski:
Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning and Expert Systems.
Morgan Kaufmann 1990, ISBN 1-55860-065-5
- [25]
- ...
Referenced by
- Haixun Wang, Carlo Zaniolo:
Using SQL to Build New Aggregates and Extenders for Object-Relational Systems.
VLDB 2000: 166-175
- Sunil Choenni:
Design and Implementation of a Genetic-Based Algorithm for Data Mining.
VLDB 2000: 33-42
- Rakesh Agrawal, Ramakrishnan Srikant:
Privacy-Preserving Data Mining.
SIGMOD Conference 2000: 439-450
- Giovanni Giuffrida, Wesley W. Chu, Dominique M. Hanssens:
Mining Classification Rules from Datasets with Large Number of Many-Valued Attributes.
EDBT 2000: 335-349
- Rakesh Agrawal, Roberto J. Bayardo Jr., Ramakrishnan Srikant:
Athena: Mining-Based Interactive Management of Text Database.
EDBT 2000: 365-379
- Daniel Barbará, Xintao Wu:
The Role of Approximations in Maintaining and Using Aggregate Views.
IEEE Data Eng. Bull. 22(4): 15-21 (1999)
- Minos N. Garofalakis, Rajeev Rastogi, S. Seshadri, Kyuseok Shim:
Data Mining and the Web: Past, Present and Future.
Workshop on Web Information and Data Management 1999: 43-47
- Johannes Gehrke, Venkatesh Ganti, Raghu Ramakrishnan, Wei-Yin Loh:
BOAT-Optimistic Decision Tree Construction.
SIGMOD Conference 1999: 169-180
- Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan:
A Framework for Measuring Changes in Data Characteristics.
PODS 1999: 126-137
- Mohammed Javeed Zaki, Ching-Tien Ho, Rakesh Agrawal:
Parallel Classification for Data Mining on Shared-Memory Multiprocessors.
ICDE 1999: 198-205
- Surajit Chaudhuri, Usama M. Fayyad, Jeff Bernhardt:
Scalable Classification over SQL Databases.
ICDE 1999: 470-479
- Roberto J. Bayardo Jr., Rakesh Agrawal, Dimitrios Gunopulos:
Constraint-Based Rule Mining in Large, Dense Databases.
ICDE 1999: 188-197
- Philip S. Yu:
Data Mining and Personalization Technologies.
DASFAA 1999: 6-13
- Rajeev Rastogi, Kyuseok Shim:
PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning.
VLDB 1998: 404-415
- Johannes Gehrke, Raghu Ramakrishnan, Venkatesh Ganti:
RainForest - A Framework for Fast Decision Tree Construction of Large Datasets.
VLDB 1998: 416-427
- Soumen Chakrabarti, Byron Dom, Piotr Indyk:
Enhanced Hypertext Categorization Using Hyperlinks.
SIGMOD Conference 1998: 307-318
- Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan:
Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications.
SIGMOD Conference 1998: 94-105
- Ching-Tien Ho, Rakesh Agrawal, Nimrod Megiddo, Ramakrishnan Srikant:
Range Queries in OLAP Data Cubes.
SIGMOD Conference 1997: 73-88
- Brian Lent, Arun N. Swami, Jennifer Widom:
Clustering Association Rules.
ICDE 1997: 220-231
- Vibby Gottemukkala, Anant Jhingran, Sriram Padmanabhan:
Interfacing Parallel Applications and Parallel Databases.
ICDE 1997: 355-364
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:46:13 2009