@inproceedings{DBLP:conf/sigmod/LiS98,
  author    = {Bin Li and Dennis Shasha},
  editor    = {Laura M. Haas and Ashutosh Tiwary},
  title     = {Free Parallel Data Mining},
  booktitle = {SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA},
  publisher = {ACM Press},
  year      = {1998},
  isbn      = {0-89791-995-5},
  pages     = {541--543},
  ee        = {http://doi.acm.org/10.1145/276304.276374, db/conf/sigmod/LiS98.html},
  crossref  = {DBLP:conf/sigmod/98},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
Data mining is computationally expensive. Since the benefits of data mining results are unpredictable, organizations may not be willing to buy new hardware for that purpose. We will present a system that enables data mining applications to run in parallel on networks of workstations in a fault-tolerant manner. We will describe our parallelization of a combinatorial pattern discovery algorithm and a classification tree algorithm. We will demonstrate the effectiveness of our system with two real applications: discovering active motifs in protein sequences and predicting foreign exchange rate movement.
Home pages of Bin Li and Dennis Shasha. Home page of our software.
Copyright © 1998 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.