Parallel Mining Algorithms for Generalized Association Rules with Classification Hierarchy
Takahiko Shintani (Institute of Industrial Science, The University of Tokyo)
Masaru Kitsuregawa (Institute of Industrial Science, The University of Tokyo)
Association rule mining recently attracted strong attention. Usually,
the classification hierarchy over the data items is available. Users
are interested in generalized association rules that span different
levels of the hierarchy, since sometimes more interesting rules can be
derived by taking the hierarchy into account.
In this paper, we propose the new parallel algorithms for mining
association rules with classification hierarchy on a shared-nothing
parallel machine to improve its performance. Our algorithms partition
the candidate itemsets over the processors, which exploits the
aggregate memory of the system effectively. If the candidate itemsets
are partitioned without considering classification hierarchy, both the
items and its all the ancestor items have to be transmitted, that
causes prohibitively large amount of communications. Our method
minimizes interprocessor communication by considering the hierarchy.
Moreover, in our algorithm, the available memory space is fully
utilized by identifying the frequently occurring candidate itemsets
and copying them over all the processors, through which frequent
itemsets can be processed locally without any communication. Thus it
can effectively reduce the load skew among the processors. Several
experiments are done by changing the granule of copying itemsets, from
the whole tree, to the small group of the frequent itemsets along the
hierarchy. The coarser the grain, the easier the control but it is
rather difficult to achieve the sufficient load balance. The finer the
grain, the more complicated the control is required but it can balance
the load quite well.
We implemented proposed algorithms on IBM SP-2. Performance
evaluations show that our algorithms are effective for handling skew
and attain sufficient speedup ratio.