Statistical Databases: Characteristics, Problems, and some Solutions.

Arie Shoshani: Statistical Databases: Characteristics, Problems, and some Solutions. VLDB 1982: 208-222
  author    = {Arie Shoshani},
  title     = {Statistical Databases: Characteristics, Problems, and some Solutions},
  booktitle = {Eighth International Conference on Very Large Data Bases, September
               8-10, 1982, Mexico City, Mexico, Proceedings},
  publisher = {Morgan Kaufmann},
  year      = {1982},
  isbn      = {0-934613-14-1},
  pages     = {208-222},
  ee        = {db/conf/vldb/Shoshani82.html},
  crossref  = {DBLP:conf/vldb/82},
  bibsource = {DBLP}


The purpose of this paper is to describe the nature of statistical data bases and the special problems associated with them. Since statistical data bases are common in a variety of application areas, the paper begins by describing several examples that emphasize the complexity, the size, and the difficulties of dealing with such data bases. A description is then given of the characteristics of statistical data bases in terms of data structures and usage. The remainder of the paper describes a large collection of problems and, when appropriate, some solutions or work in progress. The problems and solutions are organized into the following areas: physical organization, optimization, logical modelling, user interface, integrating statistical analysis and data management, and security.

Copyright © 1982 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.


Printed Edition

Eighth International Conference on Very Large Data Bases, September 8-10, 1982, Mexico City, Mexico, Proceedings. Morgan Kaufmann 1982, ISBN 0-934613-14-1


References

[Achugbue & Chin 78]
[Baker 76]
[Beck 80]
Leland L. Beck: A Security Mechanism for Statistical Databases. ACM Trans. Database Syst. 5(3): 316-338(1980)
[Becker & Chambers 80]
[Boral et al 82]
Douglas M. Bates, Haran Boral, David J. DeWitt: A Framework for Research in Database Management for Statistical Analysis. SIGMOD Conference 1982: 69-78
[Bragg 81]
[Burnett & Thomas 81]
[Chan & Shoshani 81]
Paul Chan, Arie Shoshani: SUBJECT: A Directory Driven System for Organizing and Accessing Large Statistical Databases. VLDB 1981: 553-563
[Chin & Ozsoyoglu 81]
Francis Y. L. Chin, Gultekin Özsoyoglu: Auditing and Inference Control in Statistical Databases. IEEE Trans. Software Eng. 8(6): 574-582(1982)
[Cohen & Hay 81]
[Denning et al 79]
Dorothy E. Denning, Peter J. Denning, Mayer D. Schwartz: The Tracker: A Threat to Statistical Database Security. ACM Trans. Database Syst. 4(1): 76-96(1979)
[Denning 80]
Dorothy E. Denning: Secure Statistical Databases with Random Sample Queries. ACM Trans. Database Syst. 5(3): 291-315(1980)
[Denning & Schlorer 80]
Dorothy E. Denning, Jan Schlörer: A Fast Procedure for Finding a Tracker in a Statistical Database. ACM Trans. Database Syst. 5(1): 88-102(1980)
[DHEW 75]
[Dobkin et al 79]
David P. Dobkin, Anita K. Jones, Richard J. Lipton: Secure Databases: Protection Against User Influence. ACM Trans. Database Syst. 4(1): 97-106(1979)
[Eggers & Shoshani 80]
Susan J. Eggers, Arie Shoshani: Efficient Access of Compressed Data. VLDB 1980: 205-211
[Eggers et al 81]
Susan J. Eggers, Frank Olken, Arie Shoshani: A Compression Technique for Large Statistical Data-Bases. VLDB 1981: 424-434
[Gey 81]
[Hammer & Niamir 79]
Michael Hammer, Bahram Niamir: A Heuristic Approach to Attribute Partitioning. SIGMOD Conference 1979: 93-101
[Haq 77]
[Hollabaugh & Reinwald 81]
[Hawthorne 82]
Paula B. Hawthorn: Microprocessor Assisted Tuple Access, Decompression and Assembly for Statistical Database Systems. VLDB 1982: 223-233
[Hideto & Kobayashi 81]
[Johansson & Schilling 81]
[Johnson 81]
Rowland R. Johnson: Modelling Summary Data. SIGMOD Conference 1981: 93-97
[Johnson 81a]
[Kam & Ullman 77]
John B. Kam, Jeffrey D. Ullman: A Model of Statistical Databases and Their Security. ACM Trans. Database Syst. 2(1): 1-10(1977)
[Klug 81]
[Lehot 77]
[McCarthy 82]
John L. McCarthy: Metadata Management for Large Statistical Databases. VLDB 1982: 234-243
[McCarthy et al 82]
[Merrill et al 79]
[Merrill 82]
[Meyers 69]
[Nie et al 75]
[SAS 79]
[Schlorer 75]
[Svensson 79]
Per Svensson: On Search Performance for Conjunctive Queries in Compressed, Fully Transposed Ordered Files. VLDB 1979: 155-163
[Teitel 77]
[Turner et al 79]
M. J. Turner, R. Hammond, P. Cotton: A DBMS for Large Statistical Databases. VLDB 1979: 319-327
[Weeks et al 81]
[Wong & Kuo 82]
Harry K. T. Wong, Ivy Kuo: GUIDE: Graphical User Interface for Database Exploration. VLDB 1982: 22-32
[Yu and Chin 77]
Clement T. Yu, Francis Y. L. Chin: A Study on the Protection of Statistical Data Bases. SIGMOD Conference 1977: 169-181

Referenced by

  1. Rakesh Agrawal, Ramakrishnan Srikant: Privacy-Preserving Data Mining. SIGMOD Conference 2000: 439-450
  2. Jianzhong Li, Doron Rotem, Jaideep Srivastava: Aggregation Algorithms for Very Large Compressed Data Warehouses. VLDB 1999: 651-662
  3. Michael Böhnlein, Achim Ulbrich-vom Ende: Deriving Initial Data Warehouse Structures from the Conceptual Data Models of the Underlying Operational Information Systems. DOLAP 1999: 15-21
  4. Wolfgang Lehner, Jens Albrecht, Hartmut Wedekind: Normal Forms for Multidimensional Databases. SSDBM 1998: 63-72
  5. Wolfgang Lehner: Modelling Large Scale OLAP Scenarios. EDBT 1998: 153-167
  6. Robert C. Goldstein, Christian Wagner: Database Management with Sequence Trees and Tokens. IEEE Trans. Knowl. Data Eng. 9(1): 186-192(1997)
  7. Marc Gyssens, Laks V. S. Lakshmanan: A Foundation for Multi-dimensional Databases. VLDB 1997: 106-115
  8. Hans-Joachim Lenz, Arie Shoshani: Summarizability in OLAP and Statistical Data Bases. SSDBM 1997: 132-143
  9. Arie Shoshani: OLAP and Statistical Databases: Similarities and Differences. PODS 1997: 185-196
  10. Rakesh Agrawal, Ashish Gupta, Sunita Sarawagi: Modeling Multidimensional Databases. ICDE 1997: 232-243
  11. Sameet Agarwal, Rakesh Agrawal, Prasad Deshpande, Ashish Gupta, Jeffrey F. Naughton, Raghu Ramakrishnan, Sunita Sarawagi: On the Computation of Multidimensional Aggregates. VLDB 1996: 506-521
  12. Pat Dean, Bo Sundgren: Quality Aspects of a Modern Database Service (Position Paper). SSDBM 1996: 156-161
  13. Paul Cotofrei, Henri Luchian: Statistical Dependencies. SSDBM 1996: 32-41
  14. Malee Wongsaroje: Extensible Data Modeling for Statistical Databases. DASFAA 1995: 318-325
  15. Wee Keong Ng, Chinya V. Ravishankar: Information Synthesis in Statistical Databases. CIKM 1995: 355-361
  16. Wee Keong Ng, Chinya V. Ravishankar: A Physical Storage for Efficient Statistical Query Processing. SSDBM 1994: 97-106
  17. Erik Malmborg, Bo Sundgren: Integration of Statistical Information Systems - Theory and Practice. SSDBM 1994: 80-89
  18. Rosine Cicchetti, Lotfi Lakhal: Matrix-Relation for Statistical Database Management. EDBT 1994: 31-44
  19. Francesco M. Malvestuto: A Universal-Scheme Approach to Statistical Databases Containing Homogeneous Summary Tables. ACM Trans. Database Syst. 18(4): 678-708(1993)
  20. Maurizio Rafanelli, Fabrizio L. Ricci: Mefisto: A Functional Model for Statistical Entities. IEEE Trans. Knowl. Data Eng. 5(4): 670-681(1993)
  21. Richard H. Wolniewicz, Goetz Graefe: Algebraic Optimization of Computations over Scientific Databases. VLDB 1993: 13-24
  22. Christian S. Jensen, Leo Mark: Queries on Change in an Extended Relational Model. IEEE Trans. Knowl. Data Eng. 4(2): 192-200(1992)
  23. Soraya Abad-Mota: Approximate Query Processing with Summary Tables in Statistical Databases. EDBT 1992: 499-515
  24. Sakti P. Ghosh: Statistical Relational Databases: Normal Forms. IEEE Trans. Knowl. Data Eng. 3(1): 55-64(1991)
  25. Francesco M. Malvestuto, Marina Moscarini: Query Evaluability in Statistical Databases. IEEE Trans. Knowl. Data Eng. 2(4): 425-430(1990)
  26. Maurizio Rafanelli, Arie Shoshani: STORM: A Statistical Object Representation Model. SSDBM 1990: 14-29
  27. John L. Pfaltz, James C. French: Implementing Subscripted Identifiers in Scientific Databases. SSDBM 1990: 80-91
  28. Won Kim: Object-Oriented Approach to Managing Statistical and Scientific Databases. SSDBM 1990: 1-13
  29. Tiziana Catarci, Giuseppe Santucci: GRASP: A Graphical System for Statistical Databases. SSDBM 1990: 148-162
  30. Gultekin Özsoyoglu, Victor Matos, Z. Meral Özsoyoglu: Query Processing Techniques in the Summary-Table-by-Example Database Query Language. ACM Trans. Database Syst. 14(4): 526-573(1989)
  31. Jaideep Srivastava, Jack S. Eddy Tan, Vincent Y. Lum: TBSAM: An Access Method for Efficient Processing of Statistical Queries. IEEE Trans. Knowl. Data Eng. 1(4): 414-423(1989)
  32. Francesco M. Malvestuto, Marina Moscarini: Aggregate Evaluability in Statistical Databases. VLDB 1989: 279-286
  33. Lotfi Lakhal, Rosine Cicchetti, Serge Miranda: RTL - A Relation and Table Language for Statistical Databases. MFDBS 1989: 285-300
  34. Rosine Cicchetti, Lotfi Lakhal, Nanh Le Thanh, Serge Miranda: A Logical Summary-Data Model for Macro Statistical Databases. DASFAA 1989: 43-51
  35. Jaideep Srivastava, Doron Rotem: Precision-Time Tradeoffs: A Paradigm for Processing Statistical Queries on Databases. SSDBM 1988: 226-245
  36. Hideto Sato: A Data Model, Knowledge Base, and Natural Language Processing for Sharing a Large Statistical Database. SSDBM 1988: 207-225
  37. Maurizio Rafanelli: Research Topics in Statistical and Scientific Database Management: the IV SSDBM. SSDBM 1988: 1-18
  38. Francesco M. Malvestuto, C. Zuffada: The Classification Problem with Semantically Heterogeneous Data. SSDBM 1988: 157-176
  39. Erik Malmborg: Design of the User-Interface for an Object-Oriented Statistical Data-Base. SSDBM 1988: 314-326
  40. Sakti P. Ghosh: Statistical Relational Model. SSDBM 1988: 338-355
  41. Giorgio Gambosi, Enrico Nardelli, Maurizio Talamo: A Conceptual Model for the Representation of Statistical Data in Geographical Information Systems. SSDBM 1988: 278-290
  42. Alessandro D'Atri, Fabrizio L. Ricci: Interpretation of Statistical Queries to Relational Databases. SSDBM 1988: 246-258
  43. Meng Chang Chen, Lawrence McNamee, Michel A. Melkanoff: A Model of Summary Data and its Applications in Statistical Databases. SSDBM 1988: 356-372
  44. G. Barcaroli, Giuseppe Di Battista, E. Fortunato, C. Leporelli: Design of Statistical Information Media: Time Performance and Storage Constraints. SSDBM 1988: 93-104
  45. Jaideep Srivastava, Vincent Y. Lum: A Tree Based Access Method (TBSAM) for Fast Processing of Aggregate Queries. ICDE 1988: 504-510
  46. Michael A. Palley, Jeffrey S. Simonoff: The Use of Regression Methodology for the Compromise of Confidential Information in Statistical Databases. ACM Trans. Database Syst. 12(4): 593-608(1987)
  47. Richard Hull, Roger King: Semantic Database Modeling: Survey, Applications, and Research Issues. ACM Comput. Surv. 19(3): 201-260(1987)
  48. Jianzhong Li, Doron Rotem, Harry K. T. Wong: A New Compression Method with Fast Searching on Large Databases. VLDB 1987: 311-318
  49. W. Bradley Rubenstein: A Database Design for Musical Information. SIGMOD Conference 1987: 479-490
  50. Harry K. T. Wong, J. Z. Li: Transposition Algorithms on Very Large Compressed Databases. VLDB 1986: 304-311
  51. Michael A. Palley: Security of Statistical Databases - Compromise through Attribute Correlational Modeling. ICDE 1986: 67-74
  52. Gultekin Özsoyoglu, Z. Meral Özsoyoglu, Francisco Mata: A Language and a Physical Organization Technique for Summary Tables. SIGMOD Conference 1985: 3-16
  53. Matthias Jarke, Jürgen Koch: Query Optimization in Database Systems. ACM Comput. Surv. 16(2): 111-152(1984)
  54. Arie Shoshani, Frank Olken, Harry K. T. Wong: Characteristics of Scientific Databases. VLDB 1984: 147-160
  55. Chaitanya K. Baru, Stanley Y. W. Su: Performance Evaluation of the Statistical Aggregation by Categorization in the SM3 System. SIGMOD Conference 1984: 77-89
  56. Harry K. T. Wong: Micro and Macro Statistical/Scientific Database Management. ICDE 1984: 104-106
  57. Z. Meral Özsoyoglu, Gultekin Özsoyoglu: Summary-Table-By-Example: A Database Query Language for Manipulating Summary Data. ICDE 1984: 193-202
  58. Sakti P. Ghosh: An Application of Statistical Databases in Manufacturing Testing. ICDE 1984: 96-103
  59. Stanley Y. W. Su, Shamkant B. Navathe, Don S. Batory: Logical and Physical Modeling of Statistical Scientific Databases. SSDBM 1983: 251-263
  60. Maurizio Rafanelli, Fabrizio L. Ricci: Proposal of a Logical Model for Statistical Data Base. SSDBM 1983: 264-272
  61. Gultekin Özsoyoglu, Z. Meral Özsoyoglu: Features of a System for Statistical Databases. SSDBM 1983: 9-18
  62. Hamid Farsi, John Tartar: A Relational Database Machine for Efficient Processing of Statistical Queries. SSDBM 1983: 64-72
  63. Dorothy E. Denning, Wesley Nicholson, Gordon Sande, Arie Shoshani: Research Topics in Statistical Database Management. SSDBM 1983: 46-51
  64. Dorothy E. Denning: A Security Model for the Statistical Database Problem. SSDBM 1983: 368-390
  65. Paul Chan, Susan J. Eggers, Fredric C. Gey, Harvard Holmes, Peter Kreps, John McCarthy, Deane Merrill, Frank Olken, Arie Shoshani, Harry K. T. Wong: Statistical Data Management Research at Lawrence Berkeley Laboratory. SSDBM 1983: 273-279
  66. Don S. Batory: Index Coding: A Compression Technique for Large Statistical Databases. SSDBM 1983: 306-314
  67. Neil C. Rowe: Top-Down Statistical Estimation on a Database. SIGMOD Conference 1983: 135-145
  68. Guy M. Lohman, Joseph C. Stoltzfus, Anita N. Benson, Michael D. Martin, Alfonso F. Cardenas: Remotely-Sensed Geophysical Databases: Experience and Implications for Generalized DBMS. SIGMOD Conference 1983: 146-160
  69. John L. McCarthy: Metadata Management for Large Statistical Databases. VLDB 1982: 234-243