
SPIRIT: Sequential Pattern Mining with Regular Expression Constraints.

Minos N. Garofalakis, Rajeev Rastogi, Kyuseok Shim: SPIRIT: Sequential Pattern Mining with Regular Expression Constraints. VLDB 1999: 223-234
@inproceedings{DBLP:conf/vldb/GarofalakisRS99,
  author    = {Minos N. Garofalakis and
               Rajeev Rastogi and
               Kyuseok Shim},
  editor    = {Malcolm P. Atkinson and
               Maria E. Orlowska and
               Patrick Valduriez and
               Stanley B. Zdonik and
               Michael L. Brodie},
  title     = {SPIRIT: Sequential Pattern Mining with Regular Expression Constraints},
  booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
               Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
               UK},
  publisher = {Morgan Kaufmann},
  year      = {1999},
  isbn      = {1-55860-615-7},
  pages     = {223-234},
  ee        = {db/conf/vldb/GarofalakisRS99.html},
  crossref  = {DBLP:conf/vldb/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Discovering sequential patterns is an important problem in data mining with a host of application domains including medicine, telecommunications, and the World Wide Web. Conventional mining systems provide users with only a very restricted mechanism (based on minimum support) for specifying patterns of interest. In this paper, we propose the use of Regular Expressions (REs) as a flexible constraint specification tool that enables user-controlled focus to be incorporated into the pattern mining process. We develop a family of novel algorithms (termed SPIRIT: Sequential Pattern mIning with Regular expressIon consTraints) for mining frequent sequential patterns that also satisfy user-specified RE constraints. The main distinguishing factor among the proposed schemes is the degree to which the RE constraints are enforced to prune the search space of patterns during computation. Our solutions provide valuable insights into the tradeoffs that arise when constraints that do not subscribe to nice properties (like anti-monotonicity) are integrated into the mining process. A quantitative exploration of these tradeoffs is conducted through an extensive experimental study on synthetic and real-life data sets.
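The abstract's point about anti-monotonicity can be made concrete with a small sketch. It is not the SPIRIT algorithm. It is a hypothetical baseline that enforces the RE constraint only after mining, which is the weakest possible degree of constraint enforcement.

The sketch makes several assumptions:
- Items are single characters.
- Data sequences are Python strings.
- The RE constraint is a Python `re` pattern that must match a whole candidate pattern.
- The function names `is_subsequence`, `support`, `satisfies`, and `mine_then_filter` are invented for illustration.

```python
import re

# Illustrative assumptions (not from the paper): each item is a single
# character, a data sequence is a string, and the RE constraint is a
# Python `re` pattern that must match the whole candidate pattern.


def is_subsequence(pattern: str, seq: str) -> bool:
    """True if the items of `pattern` occur in `seq` in order, gaps allowed."""
    it = iter(seq)
    # `item in it` advances the shared iterator, so item order is enforced.
    return all(item in it for item in pattern)


def support(pattern: str, db: list[str]) -> int:
    """Number of data sequences in `db` that contain `pattern`."""
    return sum(is_subsequence(pattern, s) for s in db)


def satisfies(pattern: str, regex: str) -> bool:
    """True if the whole pattern is accepted by the RE constraint."""
    return re.fullmatch(regex, pattern) is not None


def mine_then_filter(db: list[str], min_sup: int, regex: str) -> list[str]:
    """Hypothetical baseline: level-wise mining pruned by support only,
    with the RE constraint applied as a post-filter."""
    if min_sup < 1:
        raise ValueError("min_sup must be at least 1")
    items = sorted({c for s in db for c in s})
    frequent_items = [c for c in items if support(c, db) >= min_sup]
    level = list(frequent_items)
    all_frequent = list(level)
    while level:
        # Support is anti-monotone: every frequent (k+1)-pattern is a
        # frequent k-prefix extended by a frequent item, so this is complete.
        candidates = [p + c for p in level for c in frequent_items]
        level = [p for p in candidates if support(p, db) >= min_sup]
        all_frequent.extend(level)
    return [p for p in all_frequent if satisfies(p, regex)]


if __name__ == "__main__":
    db = ["abc", "ac", "abbc", "bc"]
    print(mine_then_filter(db, 2, "ab*c"))  # ['ac', 'abc']
    # The RE is not anti-monotone: "ac" satisfies a b* c, but its
    # subsequence "a" does not, so RE failures cannot be pruned like
    # infrequent patterns without losing valid results.
    print(satisfies("ac", "ab*c"), satisfies("a", "ab*c"))  # True False
```

Filtering only at the end keeps support-based pruning sound. However, it counts every frequent pattern, including many the user never asked for. Pushing the RE earlier can shrink the search space, but it must cope with the missing anti-monotonicity shown in the last line. This is the kind of trade-off the abstract refers to.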

Copyright © 1999 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.



Printed Edition

Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, Michael L. Brodie (Eds.): VLDB'99, Proceedings of 25th International Conference on Very Large Data Bases, September 7-10, 1999, Edinburgh, Scotland, UK. Morgan Kaufmann 1999, ISBN 1-55860-615-7

References

[1]
Rakesh Agrawal, Giuseppe Psaila, Edward L. Wimmers, Mohamed Zaït: Querying Shapes of Histories. VLDB 1995: 502-514
[2]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
[3]
Rakesh Agrawal, Ramakrishnan Srikant: Mining Sequential Patterns. ICDE 1995: 3-14
[4]
Ming-Syan Chen, Jong Soo Park, Philip S. Yu: Efficient Data Mining for Path Traversal Patterns. IEEE Trans. Knowl. Data Eng. 10(2): 209-221 (1998)
[5]
...
[6]
Harry R. Lewis, Christos H. Papadimitriou: Elements of the Theory of Computation. Prentice-Hall 1981, ISBN 0-13-273417-6
[7]
Heikki Mannila, Hannu Toivonen: Discovering Generalized Episodes Using Minimal Occurrences. KDD 1996: 146-151
[8]
Heikki Mannila, Hannu Toivonen, A. Inkeri Verkamo: Discovering Frequent Episodes in Sequences. KDD 1995: 210-215
[9]
Raymond T. Ng, Laks V. S. Lakshmanan, Jiawei Han, Alex Pang: Exploratory Mining and Pruning Optimizations of Constrained Association Rules. SIGMOD Conference 1998: 13-24
[10]
Ramakrishnan Srikant, Quoc Vu, Rakesh Agrawal: Mining Association Rules with Item Constraints. KDD 1997: 67-73
[11]
Ramakrishnan Srikant, Rakesh Agrawal: Mining Sequential Patterns: Generalizations and Performance Improvements. EDBT 1996: 3-17
[12]
Jason Tsong-Li Wang, Gung-Wei Chirn, Thomas G. Marr, Bruce A. Shapiro, Dennis Shasha, Kaizhong Zhang: Combinatorial Pattern Discovery for Scientific Data: Some Preliminary Results. SIGMOD Conference 1994: 115-125

Referenced by

  1. Theodore Johnson, Laks V. S. Lakshmanan, Raymond T. Ng: The 3W Model and Algebra for Unified Data Mining. VLDB 2000: 21-32
  2. Minos N. Garofalakis, Rajeev Rastogi, S. Seshadri, Kyuseok Shim: Data Mining and the Web: Past, Present and Future. Workshop on Web Information and Data Management 1999: 43-47
VLDB Proceedings: Copyright © by VLDB Endowment,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:46:26 2009