A Framework for Measuring Changes in Data Characteristics
Venkatesh Ganti, Johannes Gehrke, and Raghu Ramakrishnan
A data mining algorithm builds a model that captures interesting aspects of the underlying data. We develop a framework for quantifying the difference, called the deviation, between two datasets in terms of the models they induce. Our framework covers a wide variety of models including frequent itemsets, decision tree classifiers, and clusters, and captures standard measures of deviation such as the misclassification rate and the chi-squared metric as special cases. We also show how statistical techniques can be applied to the deviation measure to assess whether the difference between two models is meaningful (i.e., whether the underlying datasets have statistically significant differences in their characteristics), and discuss several practical applications.
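To make the idea of a model-induced deviation concrete, here is a minimal sketch for the frequent-itemset case. It is an illustration under stated assumptions, not the paper's implementation: it assumes the deviation is the sum of absolute differences in support, taken over every itemset that is frequent in either dataset. The function names (`support`, `frequent_itemsets`, `deviation`), the brute-force miner, and the toy data are all hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force 'model': all itemsets up to `max_size` meeting `min_support`.

    Assumption: a toy stand-in for a real miner such as Apriori.
    """
    items = sorted(set().union(*transactions))
    model = set()
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if support(candidate, transactions) >= min_support:
                model.add(candidate)
    return model


def deviation(d1, d2, min_support, max_size=2):
    """Assumed deviation: sum of absolute support differences over the
    itemsets that are frequent in at least one of the two datasets."""
    regions = (frequent_itemsets(d1, min_support, max_size)
               | frequent_itemsets(d2, min_support, max_size))
    return sum(abs(support(r, d1) - support(r, d2)) for r in regions)


# Hypothetical toy datasets: each transaction is a set of items.
d1 = [{"a", "b"}, {"a"}, {"a", "b"}, {"b"}]
d2 = [{"a"}, {"a"}, {"a", "c"}, {"c"}]

print(deviation(d1, d2, min_support=0.5))
```

Comparing a dataset with itself gives a deviation of zero, and the value grows as the supports of the itemsets induced by either model drift apart. Other difference and aggregation functions could be substituted for the absolute difference and the sum used here.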
@inproceedings{DBLP:conf/pods/GantiGR99,
author = {Venkatesh Ganti and
Johannes Gehrke and
Raghu Ramakrishnan},
title = {A Framework for Measuring Changes in Data Characteristics},
booktitle = {Proceedings of the Eighteenth ACM SIGACT-SIGMOD-SIGART Symposium
on Principles of Database Systems, May 31 - June 2, 1999, Philadelphia,
Pennsylvania},
publisher = {ACM Press},
year = {1999},
isbn = {1-58113-062-7},
pages = {126-137},
crossref = {DBLP:conf/pods/99},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Copyright(C) 2000 ACM