A Universal-Scheme Approach to Statistical Databases Containing Homogeneous Summary Tables.

Francesco M. Malvestuto: A Universal-Scheme Approach to Statistical Databases Containing Homogeneous Summary Tables. ACM Trans. Database Syst. 18(4): 678-708(1993)
  author    = {Francesco M. Malvestuto},
  title     = {A Universal-Scheme Approach to Statistical Databases Containing
               Homogeneous Summary Tables},
  journal   = {ACM Trans. Database Syst.},
  volume    = {18},
  number    = {4},
  year      = {1993},
  pages     = {678-708},
  ee        = {, db/journals/tods/Malvestuto93.html},
  bibsource = {DBLP,}


In many situations a statistical database contains multiple summary tables that report summary statistics on the same summary variable for the same population of individuals or objects, using different classification criteria ("homogeneous" summary tables).

Existing query languages consider only queries that aggregate data stored in a single summary table. When a statistical database contains homogeneous summary tables, such query languages do not allow an integrated view of the data, whereas statisticians are inclined to view and query a collection of homogeneous summary tables as if they were actually a single higher-dimensional summary table. This legitimizes the search for a universal-scheme solution to the problem of data integration in such statistical databases. It is shown that such a solution can be found if the database tables contain additive summary data. Accordingly, queries are grouped into three classes: queries that can be evaluated to single values (evaluable queries); queries that can be evaluated to value ranges (answerable queries); and queries whose values remain unknown (unanswerable queries). The membership of a given query in one of these three classes is not an intrinsic property of the query. It depends on both the type of the summary variable and the dependencies that the database designer assumes in the universal scheme. On the basis of such information, linear-time procedures for recognizing and answering answerable and evaluable queries are developed.
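The three query classes can be illustrated on the smallest possible case. Take two homogeneous tables of one additive summary variable over the same population, one classified by an attribute A and one by an attribute B. A query that aggregates a single stored table is evaluable. A query on a joint (A, B) cell, which neither table stores, behaves differently depending on the type of the summary variable. For a nonnegative variable such as a head count, the cell is bounded by the classical Fréchet bounds, so the query is answerable. For a signed variable, the cell is unbounded, so the query is unanswerable.

The sketch below is a toy illustration under these assumptions, not the paper's linear-time procedure. The names `Answer` and `classify_cell_query` are hypothetical, and the Fréchet bounds are a standard result assumed here rather than taken from the paper. It does not model the dependencies that the abstract says a designer may assume in the universal scheme.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Answer:
    kind: str                     # "evaluable", "answerable" or "unanswerable"
    low: Optional[float] = None   # lower bound on the query value
    high: Optional[float] = None  # upper bound (equals low when evaluable)


def classify_cell_query(by_a: Dict[str, float], by_b: Dict[str, float],
                        a: Optional[str], b: Optional[str],
                        nonnegative: bool) -> Answer:
    """Classify the toy query 'total of the summary variable for A=a and B=b'.

    Hypothetical helper: by_a and by_b are two homogeneous, additive summary
    tables over the same population, classified by A and by B respectively.
    Passing None for a or b means 'any category' (that attribute is summed out).
    """
    total = sum(by_a.values())
    if abs(total - sum(by_b.values())) > 1e-9:
        raise ValueError("grand totals differ: tables are not homogeneous")
    if nonnegative and min(list(by_a.values()) + list(by_b.values())) < 0:
        raise ValueError("negative entry in a nonnegative summary variable")

    # Queries answered by aggregating a single stored table are evaluable.
    if a is None and b is None:
        return Answer("evaluable", total, total)
    if b is None:
        return Answer("evaluable", by_a[a], by_a[a])
    if a is None:
        return Answer("evaluable", by_b[b], by_b[b])

    # The joint cell (a, b) is stored in neither table.
    if len(by_a) == 1:  # one A-category: the cell is the whole B-marginal
        return Answer("evaluable", by_b[b], by_b[b])
    if len(by_b) == 1:  # one B-category: the cell is the whole A-marginal
        return Answer("evaluable", by_a[a], by_a[a])
    if not nonnegative:
        # Signed additive data: shifting mass around a 2x2 sub-table keeps
        # both marginals fixed, so the cell value is unbounded.
        return Answer("unanswerable")

    # Nonnegative data: Frechet bounds for a single cell of a two-way table.
    low = max(0.0, by_a[a] + by_b[b] - total)
    high = min(by_a[a], by_b[b])
    kind = "evaluable" if low == high else "answerable"
    return Answer(kind, low, high)
```

Consider the head counts `{"F": 60, "M": 40}` by sex and `{"young": 70, "old": 30}` by age. The query "female and young" is answerable with range [30, 60], and "female and old" is answerable with range [0, 30]. If the same numbers described a signed variable such as net profit, both joint queries would be unanswerable. The marginal query "female" stays evaluable in either case. This mirrors the abstract's point that a query's class is not intrinsic to the query.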

Copyright © 1993 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.

Joint ACM SIGMOD / IEEE Computer Society Anthology

CDROM Version: Load the CDROM "Volume 3 Issue 2, TODS 1991-1995, TKDE 1989-1992" and ...
DVD Version: Load "ACM SIGMOD Anthology DVD 2" and ...

Online Edition: ACM Digital Library



References

Francis Y. L. Chin, Gultekin Özsoyoglu: Statistical Database Design. ACM Trans. Database Syst. 6(1): 113-139(1981)
Francis Y. L. Chin, Gultekin Özsoyoglu: Auditing and Inference Control in Statistical Databases. IEEE Trans. Software Eng. 8(6): 574-582(1982)
E. F. Codd: A Relational Model of Data for Large Shared Data Banks. Commun. ACM 13(6): 377-387(1970)
Stavros S. Cosmadakis, Paris C. Kanellakis, Nicolas Spyratos: Partition Semantics for Relations. J. Comput. Syst. Sci. 33(2): 203-233(1986)
Dorothy E. Denning, Jan Schlörer: Inference Controls for Statistical Databases. IEEE Computer 16(7): 69-82(1983)
Sakti P. Ghosh: Statistical Relational Tables for Statistical Database Management. IEEE Trans. Software Eng. 12(12): 1106-1116(1986)
Andrew V. Goldberg, Robert Endre Tarjan: A new approach to the maximum-flow problem. J. ACM 35(4): 921-940(1988)
Dan Gusfield: A Graph Theoretic Approach to Statistical Data Security. SIAM J. Comput. 17(3): 552-571(1988)
Anthony C. Klug: Equivalence of Relational Algebra and Relational Calculus Query Languages Having Aggregate Functions. J. ACM 29(3): 699-717(1982)
David Maier, David Rozenshtein, David Scott Warren: Window Functions. Advances in Computing Research 3: 213-246(1986)
Francesco M. Malvestuto: Modelling Large Bases of Categorial Data With Acyclic Schemes. ICDT 1986: 323-340
Francesco M. Malvestuto: Answering Queries in Categorial Data Bases. PODS 1987: 87-96
Francesco M. Malvestuto, Marina Moscarini: Aggregate Evaluability in Statistical Databases. VLDB 1989: 279-286
Francesco M. Malvestuto, Marina Moscarini: Query Evaluability in Statistical Databases. IEEE Trans. Knowl. Data Eng. 2(4): 425-430(1990)
Francesco M. Malvestuto, Marina Moscarini, Maurizio Rafanelli: Suppressing Marginal Cells to Protect Sensitive Information in a Two-Dimensional Statistical Table. PODS 1991: 252-258
Francesco M. Malvestuto, C. Zuffada: The Classification Problem with Semantically Heterogeneous Data. SSDBM 1988: 157-176
James B. Orlin: A Faster Strongly Polynomial Minimum Cost Flow Algorithm. STOC 1988: 377-387
Gultekin Özsoyoglu, Z. Meral Özsoyoglu: Statistical Database Query Languages. IEEE Trans. Software Eng. 11(10): 1071-1081(1985)
Neil C. Rowe: Absolute Bounds on Set Intersection and Union Sizes from Distribution Information. IEEE Trans. Software Eng. 14(7): 1033-1048(1988)
H. Sato: Handling Summary Information in a Database: Derivability. SIGMOD Conference 1981: 98-107
Mayer D. Schwartz, Dorothy E. Denning, Peter J. Denning: Linear Queries in Statistical Databases. ACM Trans. Database Syst. 4(2): 156-167(1979)
Arie Shoshani: Statistical Databases: Characteristics, Problems, and some Solutions. VLDB 1982: 208-222
Arie Shoshani, Harry K. T. Wong: Statistical and Scientific Database Issues. IEEE Trans. Software Eng. 11(10): 1040-1047(1985)
Jeffrey D. Ullman: Principles of Database Systems, 2nd Edition. Computer Science Press 1982, ISBN 0-914894-36-6

Referenced by

  1. Carlos A. Hurtado, Alberto O. Mendelzon: Reasoning about Summarizability in Heterogeneous Multidimensional Schemas. ICDT 2001: 375-389
  2. Stéphane Grumbach, Leonardo Tininini: On the Content of Materialized Aggregate Views. PODS 2000: 47-57
  3. Francesco M. Malvestuto, Marina Moscarini: Computational Issues Connected with the Protection of Sensitive Statistics by Auditing Sum Queries. SSDBM 1998: 134-144
  4. Pai-Cheng Chu: Cell Suppression Methodology: The Importance of Suppressing Marginal Totals. IEEE Trans. Knowl. Data Eng. 9(4): 513-523(1997)
  5. Christos Faloutsos, H. V. Jagadish, Nikolaos Sidiropoulos: Recovering Information from Summary Data. VLDB 1997: 36-45
  6. Tsan-sheng Hsu, Ming-Yang Kao: Security Problems for Statistical Databases with General Cell Suppressions. SSDBM 1997: 155-164
  7. Doron Rotem, J. Leon Zhao: Extendible Arrays for Statistical Databases and OLAP Applications. SSDBM 1996: 108-117
  8. Francesco M. Malvestuto, Marina Moscarini: Censoring Statistical Tables to Protect Sensitive Information: Easy and Hard Problems. SSDBM 1996: 12-21
  9. Antonia Bezenchek, Maurizio Rafanelli, Leonardo Tininini: A Data Structure for Representing Aggregate Data. SSDBM 1996: 22-31
  10. Malee Wongsaroje: Extensible Data Modeling for Statistical Databases. DASFAA 1995: 318-325
  11. Wee Keong Ng, Chinya V. Ravishankar: Information Synthesis in Statistical Databases. CIKM 1995: 355-361
TODS, ACM SIGMOD Anthology: Copyright © by ACM
DBLP: Copyright © by Michael Ley, last change: Tue Jun 24 18:39:15 2008