Mining Deviants in a Time Series Database
H. V. Jagadish, Nick Koudas, and S. Muthukrishnan
Identifying outliers is an important data analysis function. Statisticians have long studied techniques to identify outliers in a data set in the context of fitting the data to some model. In the case of time series data, the situation is murkier. For instance, the "typical" value could "drift" up or down over time, so the extrema may not necessarily be interesting. We wish to identify data points that are somehow anomalous or "surprising".
We formally define the notion of a deviant in a time series, based on a representation sparsity metric. We develop an efficient algorithm to identify deviants in a time series. We demonstrate how this technique can be used to locate interesting artifacts in time series data, and present experimental evidence of the value of our technique.
As a side benefit, our algorithm is able to produce histogram representations of data that have substantially lower error than "optimal histograms" for the same total storage, including both histogram buckets and the deviants stored separately. This is of independent interest for selectivity estimation.
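To make the idea concrete, here is a minimal brute-force sketch, not the efficient algorithm the abstract refers to. It assumes that "representation sparsity" is measured by the sum-squared error of a histogram whose contiguous buckets are each summarized by their mean, and that the deviants are the k points whose removal lowers that error the most. The function names `sse`, `histogram_error`, and `find_deviants` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def sse(values):
    """Sum of squared errors when `values` is approximated by its mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def histogram_error(series, buckets):
    """Minimum total SSE over all ways of cutting `series` into `buckets`
    contiguous buckets, each represented by its mean (dynamic programming)."""
    n = len(series)
    if n == 0:
        return 0.0
    buckets = min(buckets, n)
    inf = float("inf")
    # best[b][i]: minimum error for the first i points using exactly b buckets
    best = [[inf] * (n + 1) for _ in range(buckets + 1)]
    best[0][0] = 0.0
    for b in range(1, buckets + 1):
        for i in range(1, n + 1):
            for j in range(b - 1, i):
                cost = best[b - 1][j] + sse(series[j:i])
                if cost < best[b][i]:
                    best[b][i] = cost
    return best[buckets][n]


def find_deviants(series, buckets, k):
    """Assumed definition: the k positions whose removal minimizes the
    histogram error of the remaining points. Exhaustive search, so this is
    only practical for tiny inputs."""
    best_idx, best_err = (), histogram_error(series, buckets)
    for idx in combinations(range(len(series)), k):
        dropped = set(idx)
        rest = [v for i, v in enumerate(series) if i not in dropped]
        err = histogram_error(rest, buckets)
        if err < best_err:
            best_idx, best_err = idx, err
    return list(best_idx), best_err


# A series whose level shifts from about 10 to about 20, with one spike.
series = [10, 10, 11, 10, 50, 10, 11, 10, 20, 21, 20, 21]
deviants, error = find_deviants(series, buckets=2, k=1)
print(deviants, error)  # the spike at index 4 is reported as the deviant
```

In this example the values near 20 are the global extrema after the spike, yet they are not flagged, because a second bucket represents the shifted level cheaply. Storing the spike separately and building the histogram on the rest is what lets the total error drop for the same storage budget.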
@inproceedings{DBLP:conf/vldb/KoudasMJ99,
author = {H. V. Jagadish and
Nick Koudas and
S. Muthukrishnan},
editor = {Malcolm P. Atkinson and
Maria E. Orlowska and
Patrick Valduriez and
Stanley B. Zdonik and
Michael L. Brodie},
title = {Mining Deviants in a Time Series Database},
booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
UK},
publisher = {Morgan Kaufmann},
year = {1999},
isbn = {1-55860-615-5},
pages = {102-113},
crossref = {DBLP:conf/vldb/99},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Copyright(C) 2000 ACM