The Power of Sampling in Knowledge Discovery.

Jyrki Kivinen, Heikki Mannila: The Power of Sampling in Knowledge Discovery. PODS 1994: 77-85
  author    = {Jyrki Kivinen and
               Heikki Mannila},
  title     = {The Power of Sampling in Knowledge Discovery},
  booktitle = {Proceedings of the Thirteenth ACM SIGACT-SIGMOD-SIGART Symposium
               on Principles of Database Systems, May 24-26, 1994, Minneapolis,
  publisher = {ACM Press},
  year      = {1994},
  isbn      = {0-89791-642-5},
  pages     = {77-85},
  ee        = {, db/conf/pods/pods94-77.html},
  crossref  = {DBLP:conf/pods/94},
  bibsource = {DBLP,}


We consider the problem of approximately verifying the truth of sentences of tuple relational calculus in a given relation M by considering only a random sample of M. We define two different measures for the error of a universal sentence in a relation. For a set of n universal sentences each with at most k universal quantifiers, we give upper and lower bounds for the sample sizes required for having a high probability that all the sentences with error at least epsilon can be detected as false by considering the sample. The sample sizes are O((log n)/epsilon) or O((|M|(1-1/k) log n)/epsilon), depending on the error measure used. We also consider universal-existential sentences.

Copyright © 1994 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.

Load The ACM SIGMOD Anthology, CDROM Edition, Volume 1-3, PODS '82-'98. and ... Load The ACM SIGMOD Anthology, Silver Edition, DVD 1, Proceedings. and ... BibTeX

Printed Edition

Proceedings of the Thirteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, May 24-26, 1994, Minneapolis, Minnesota. ACM Press 1994, ISBN 0-89791-642-5
Contents BibTeX

Online Edition: ACM Digital Library

[Abstract and Index Terms]
[Full Text in PDF Format, 939 KB]


William W. Cohen: Pac-Learning a Restricted Class of Recursive Logic Programs. AAAI 1993: 86-92 BibTeX
Ronald Fagin: Horn clauses and database dependencies. J. ACM 29(4): 952-985(1982) BibTeX
Adam J. Grove, Joseph Y. Halpern, Daphne Koller: Asymptotic Conditional Probabilities for First-Order Logic. STOC 1992: 294-305 BibTeX
Wen-Chi Hou, Gultekin Özsoyoglu: Statistical Estimators for Aggregate Relational Algebra Queries. ACM Trans. Database Syst. 16(4): 600-654(1991) BibTeX
Peter J. Haas, Arun N. Swami: Sequential Sampling Procedures for Query Size Estimation. SIGMOD Conference 1992: 341-350 BibTeX
Ravi Krishnamurthy, Tomasz Imielinski: Research Directions in Knowledge Discovery. SIGMOD Record 20(3): 76-78(1991) BibTeX
Jyrki Kivinen, Heikki Mannila: Approximate Dependency Inference from Relations. ICDT 1992: 86-98 BibTeX
Richard J. Lipton, Jeffrey F. Naughton, Donovan A. Schneider: Practical Selectivity Estimation through Adaptive Sampling. SIGMOD Conference 1990: 1-11 BibTeX
Jeffrey F. Naughton, S. Seshadri: On Estimating the Size of Projections. ICDT 1990: 499-513 BibTeX
Gregory Piatetsky-Shapiro: Discovery, Analysis, and Presentation of Strong Rules. Knowledge Discovery in Databases 1991: 229-248 BibTeX
Gregory Piatetsky-Shapiro, William J. Frawley (Eds.): Knowledge Discovery in Databases. AAAI/MIT Press 1991, ISBN 0-262-62080-4
Contents BibTeX
Shaibal Roy: Semantic Complexity of Classes of Relational Queries and Query Independent Data Partitioning. PODS 1991: 259-267 BibTeX
Jeffrey D. Ullman: Principles of Database and Knowledge-Base Systems, Volume I. Computer Science Press 1988, ISBN 0-7167-8158-1
Contents BibTeX
Leslie G. Valiant: A Theory of the Learnable. Commun. ACM 27(11): 1134-1142(1984) BibTeX

Referenced by

  1. Daniel Barbará, William DuMouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis E. Ioannidis, H. V. Jagadish, Theodore Johnson, Raymond T. Ng, Viswanath Poosala, Kenneth A. Ross, Kenneth C. Sevcik: The New Jersey Data Reduction Report. IEEE Data Eng. Bull. 20(4): 3-45(1997)
  2. Heikki Mannila: Methods and Problems in Data Mining. ICDT 1997: 41-55
  3. Hannu Toivonen: Sampling Large Databases for Association Rules. VLDB 1996: 134-145
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (, Corrections:
DBLP: Copyright © by Michael Ley (, last change: Sat May 16 23:34:10 2009