Exploratory Mining and Pruning Optimizations of Constrained Associations Rules
Raymond T. Ng (University of British Columbia),
Laks V. S. Lakshmanan (Concordia University),
Jiawei Han (Simon Fraser University),
Alex Pang (University of British Columbia)
From the standpoint of supporting human-centered discovery of knowledge,
the present-day model of mining association rules suffers from the
following serious shortcomings: (i) lack of user exploration and control,
(ii) lack of focus, and (iii) rigid notion of relationships. In effect,
this model functions as a black-box, admitting little user interaction
in between. We propose, in this paper, an architecture that opens up
the black-box, and supports constraint-based, human-centered exploratory
mining of associations. The foundation of this architecture is a rich set
of constraint constructs, including domain, class, and sql-style aggregate
constraints, which enable users to clearly specify what associations are
to be mined. We propose ``constrained association queries'' as a means
of specifying the constraints to be satisfied by the antecedent and
consequent of a mined association.
In this paper, we mainly focus on the technical challenges in guaranteeing
a level of performance that is commensurate with the selectivities of the
constraints in an association query. To this end, we introduce and analyze
two properties of constraints that are critical to pruning:
``anti-monotonicity'' and ``succinctness''.
We then develop characterizations of various constraints into four categories,
according to these properties. Finally, we describe a mining algorithm called
CAP, which achieves a maximized degree of pruning for all categories of
constraints. Experimental results indicate that CAP can run much faster, in
some cases as much as 80 times, than several basic algorithms. This
demonstrates how important the succinctness and anti-monotonicity properties
are, in delivering the performance guarantee.
For more information, please see here.