Free Parallel Data Mining
Bin Li (New York University)
Dennis Shasha (New York University)
Data mining is computationally expensive. Because the benefits of data mining
results are hard to predict in advance, organizations may be unwilling to buy
new hardware for that purpose.
We will present a system that enables data mining applications
to run in parallel on networks of workstations in a fault-tolerant manner.
We will describe our parallelization of a combinatorial pattern discovery
algorithm and a classification tree algorithm.
We will demonstrate the effectiveness of our system
with two real applications: discovering active motifs in protein sequences and
predicting foreign exchange rate movement.
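
To make the idea of fault-tolerant parallel execution concrete, the following is a minimal sketch of a bag-of-tasks scheme in which a task lost to a failed worker is put back into the bag and retried. It is not the system described above. Threads stand in for workstations, and the names `count_motif`, `run_bag_of_tasks`, and `WorkerCrash`, the toy sequences, and the simulated failure rule are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class WorkerCrash(Exception):
    """Stands in for a workstation that disappears in the middle of a task."""


def count_motif(task, attempt):
    # Hypothetical work unit: count how many sequences contain one motif.
    # The first attempt on odd-numbered tasks fails on purpose, to imitate
    # a machine being reclaimed by its owner.
    task_id, motif, sequences = task
    if attempt == 0 and task_id % 2 == 1:
        raise WorkerCrash(task_id)
    return motif, sum(motif in s for s in sequences)


def run_bag_of_tasks(tasks, n_workers=3, max_attempts=3):
    results = {}
    pending = [(t, 0) for t in tasks]  # the "bag": (task, attempts so far)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while pending:
            futures = {pool.submit(count_motif, t, a): (t, a) for t, a in pending}
            pending = []
            for fut in as_completed(futures):
                task, attempt = futures[fut]
                try:
                    motif, count = fut.result()
                    results[motif] = count
                except WorkerCrash:
                    # The task was lost with its worker: return it to the bag.
                    if attempt + 1 >= max_attempts:
                        raise
                    pending.append((task, attempt + 1))
    return results


# Toy data, invented for this sketch.
SEQUENCES = ["MKVLAAGK", "AAGKMKV", "GGKVLA"]
MOTIFS = ["AAG", "KVL", "GK", "MKV"]
TASKS = [(i, m, SEQUENCES) for i, m in enumerate(MOTIFS)]

if __name__ == "__main__":
    print(run_bag_of_tasks(TASKS))
```

Because each task is independent and is only marked done when its result arrives, a failed worker costs one repeated task rather than the whole run.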
Home pages of Bin Li and Dennis Shasha. Home page of our software.