DiSC - Data Organization and Access for Efficient Data Mining

Digital Symposium Collection 2000

B. Dunkel and N. Soparkar

Session 15: Data Clustering

Abstract

Efficient data mining is a significant challenge because of the combinatorial explosion in the space and time that such processing often requires. While previous work has focused on improving the efficiency of the mining algorithms, we consider how the representation, organization, and access of the data may significantly affect performance, especially when I/O costs are also considered. Through a simple analysis and comparison of the counting stage of the Apriori association rules algorithm, we show that a 'column-wise' approach to data access is often more efficient than the standard row-wise approach. We also provide the results of empirical simulations to validate our analysis. The key idea in our approach is that counting in the Apriori algorithm with data accessed in a column-wise manner significantly reduces the number of disk accesses required to identify itemsets with a minimum support in the database, primarily by reducing the degree to which data and counters need to be repeatedly brought into memory.
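To make the row-wise versus column-wise distinction concrete, the following is a minimal in-memory sketch of the counting stage, not the authors' implementation. It assumes a column-wise layout in which each item maps to the set of transaction ids containing it; the function names (`to_columns`, `count_row_wise`, `count_column_wise`) and the sample transactions are hypothetical. It does not model disk I/O, which is where the abstract locates the actual savings; it only shows that both access patterns produce the same support counts while touching the data differently.

```python
from itertools import combinations


def to_columns(transactions):
    """Build an assumed column-wise layout: item -> set of transaction ids."""
    columns = {}
    for tid, items in enumerate(transactions):
        for item in items:
            columns.setdefault(item, set()).add(tid)
    return columns


def count_row_wise(transactions, candidates):
    """Row-wise counting: scan every transaction and test every candidate,
    so all candidate counters are live for the whole scan."""
    counts = {c: 0 for c in candidates}
    for items in transactions:
        for c in candidates:
            if c <= items:  # candidate itemset is contained in this row
                counts[c] += 1
    return counts


def count_column_wise(columns, candidates):
    """Column-wise counting: for each candidate, read only the columns of
    its own items and intersect their transaction-id sets."""
    counts = {}
    for c in candidates:
        tid_sets = [columns.get(item, set()) for item in c]
        counts[c] = len(set.intersection(*tid_sets)) if tid_sets else 0
    return counts


# Hypothetical sample data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
items = ["bread", "milk", "diapers", "beer"]
candidates = [frozenset(pair) for pair in combinations(items, 2)]

row_counts = count_row_wise(transactions, candidates)
col_counts = count_column_wise(to_columns(transactions), candidates)

min_support = 2  # assumed absolute support threshold
frequent = sorted(tuple(sorted(c)) for c, n in col_counts.items() if n >= min_support)
print(frequent)
```

In the column-wise version each candidate needs only the columns of its own items and a single counter at a time, which is one way to read the abstract's point about data and counters being brought into memory less often.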

Copyright (C) 2000 ACM