Finding Intensional Knowledge of Distance-Based Outliers
Edwin M. Knorr and Raymond T. Ng
Existing studies on outliers focus only on the identification aspect; none provides any intensional knowledge of the outliers, by which we mean a description or an explanation of why an identified outlier is exceptional. For many applications, a description or explanation is at least as vital to the user as the identification aspect. Specifically, intensional knowledge helps the user to: (i) evaluate the validity of the identified outliers, and (ii) improve one's understanding of the data.
The two main issues addressed in this paper are: what kinds of intensional knowledge to provide, and how to optimize the computation of such knowledge. With respect to the first issue, we propose finding strongest and weak outliers and their corresponding structural intensional knowledge. With respect to the second issue, we first present a naive and a semi-naive algorithm. Then, by means of what we call path and semi-lattice sharing of I/O processing, we develop two optimized approaches. We provide analytic results on their I/O performance, and present experimental results showing significant reductions in I/O and significant speedups in overall runtime.
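To make the idea of explaining an outlier more concrete, the following is a minimal sketch, not the algorithms of the paper. It assumes the DB(p, D) notion of a distance-based outlier (an object is an outlier if at least a fraction p of the other objects lie farther than distance D from it) and reports the smallest attribute subsets in which a given object is still an outlier. The function names, the in-memory list of tuples, the Euclidean distance, and the choice to exclude the object itself from the count are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def is_db_outlier(data, idx, attrs, p, d):
    """Check whether data[idx] is a DB(p, d) outlier when only `attrs` are compared.

    Assumption: the fraction p is taken over the other objects, excluding data[idx].
    """
    target = [data[idx][a] for a in attrs]
    far = sum(
        1
        for j, row in enumerate(data)
        if j != idx and dist(target, [row[a] for a in attrs]) > d
    )
    return far >= p * (len(data) - 1)


def minimal_outlying_subspaces(data, idx, p, d):
    """Brute-force, bottom-up scan over all attribute subsets (hypothetical helper).

    Returns the subsets in which data[idx] is an outlier while being an outlier
    in none of their proper subsets.
    """
    n_attrs = len(data[idx])
    found = []
    for k in range(1, n_attrs + 1):
        for attrs in combinations(range(n_attrs), k):
            # Adding attributes never shrinks a Euclidean distance, so a superset
            # of an outlying subspace is outlying too and cannot be minimal.
            if any(set(f) < set(attrs) for f in found):
                continue
            if is_db_outlier(data, idx, attrs, p, d):
                found.append(attrs)
    return found


# Toy data: the last row is unusual only in its second attribute.
rows = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (1.0, 9.0)]
print(minimal_outlying_subspaces(rows, 4, p=0.9, d=2.0))  # [(1,)]
```

On the toy data, the result says that the last row is exceptional because of its second attribute alone, which is the kind of description an identification-only method would not give. The scan visits every subset and rescans the data each time, so it only illustrates why sharing work across subspaces matters.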
@inproceedings{DBLP:conf/vldb/KnorrN99,
author = {Edwin M. Knorr and
Raymond T. Ng},
editor = {Malcolm P. Atkinson and
Maria E. Orlowska and
Patrick Valduriez and
Stanley B. Zdonik and
Michael L. Brodie},
title = {Finding Intensional Knowledge of Distance-Based Outliers},
booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
UK},
publisher = {Morgan Kaufmann},
year = {1999},
isbn = {1-55860-615-5},
pages = {211-222},
crossref = {DBLP:conf/vldb/99},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Copyright(C) 2000 ACM