The Rufus System: Information Organization for Semi-Structured Data.

Kurt A. Shoens, Allen Luniewski, Peter M. Schwarz, James W. Stamos, Joachim Thomas II: The Rufus System: Information Organization for Semi-Structured Data. VLDB 1993: 97-107
  author    = {Kurt A. Shoens and
               Allen Luniewski and
               Peter M. Schwarz and
               James W. Stamos and
               Joachim Thomas II},
  editor    = {Rakesh Agrawal and
               Se{\'a}n Baker and
               David A. Bell},
  title     = {The Rufus System: Information Organization for Semi-Structured
               Data},
  booktitle = {19th International Conference on Very Large Data Bases, August
               24-27, 1993, Dublin, Ireland, Proceedings},
  publisher = {Morgan Kaufmann},
  year      = {1993},
  isbn      = {1-55860-152-X},
  pages     = {97-107},
  ee        = {db/conf/vldb/SoensLSST93.html},
  crossref  = {DBLP:conf/vldb/93},
  bibsource = {DBLP}


While database systems provide good function for writing applications on structured data, computer system users are inundated with a flood of semi-structured information, such as documents, electronic mail, programs, and images. Today, this information is typically stored in filesystems that provide limited support for organizing, searching, and operating upon this data. Current database systems are inappropriate for semi-structured information because they require that the data be translated to their data model, breaking all current applications that use the data. Although research in database systems has concentrated on extending them to handle more varieties of fully structured data, database systems provide important function that could help users of semi-structured information.

The Rufus system attacks the problems of semi-structured data. It provides searching, organizing, and browsing for the semi-structured information commonly stored in computer systems. Rufus models information with an extensible object-oriented class hierarchy and provides automatic classification of user data within that hierarchy. Query access is provided to help users search for needed information. Various ways of structuring user information are provided to help users browse. Methods associated with Rufus classes encapsulate actions that users can take on the data. These capabilities are packaged in a framework for use by applications. We have built two demonstration applications using this framework: a generic search and browse application called xrufus and an extension to the Usenet news reading program trn. These applications are in daily use at our research laboratory.

This paper describes the design and implementation of our framework, our experiences using it, and their influence on the next version of Rufus.

Copyright © 1993 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Printed Edition

Rakesh Agrawal, Seán Baker, David A. Bell (Eds.): 19th International Conference on Very Large Data Bases, August 24-27, 1993, Dublin, Ireland, Proceedings. Morgan Kaufmann 1993, ISBN 1-55860-152-X


Alfred V. Aho, Margaret J. Corasick: Efficient String Matching: An Aid to Bibliographic Search. Commun. ACM 18(6): 333-340(1975)
Andrew P. Black, Norman C. Hutchinson, Eric Jul, Henry M. Levy: Object Structure in the Emerald System. OOPSLA 1986: 78-86
David C. Blair, M. E. Maron: An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System. Commun. ACM 28(3): 289-299(1985)
Jeff Conklin: Hypertext: An Introduction and Survey. IEEE Computer 20(9): 17-41(1987)
O. Deux: The O2 System. Commun. ACM 34(10): 34-48(1991)
Christos Faloutsos, H. V. Jagadish: On B-Tree Indices for Skewed Distributions. VLDB 1992: 363-374
David K. Gifford, Pierre Jouvelot, Mark A. Sheldon, James O'Toole: Semantic File Systems. SOSP 1991: 16-25
Adele Goldberg, David Robson: Smalltalk-80: The Language and Its Implementation. Addison-Wesley 1983
David Goldberg, David A. Nichols, Brian M. Oki, Douglas B. Terry: Using Collaborative Filtering to Weave an Information Tapestry. Commun. ACM 35(12): 61-70(1992)
Eliezer Levy, Abraham Silberschatz: Distributed File Systems: Concepts and Examples. ACM Comput. Surv. 22(4): 321-374(1990)
Thomas W. Malone, Kenneth R. Grant, Franklyn A. Turbak, Stephen A. Brobst, Michael D. Cohen: Intelligent Information-Sharing Systems. Commun. ACM 30(5): 390-402(1987)
Brian P. McCune, Richard M. Tong, Jeffrey S. Dean, Daniel G. Shapiro: RUBRIC: A System for Rule-Based Information Retrieval. IEEE Trans. Software Eng. 11(9): 939-945(1985)
Wayne Niblack, Ron Barber, William Equitz, Myron Flickner, Eduardo H. Glasman, Dragutin Petkovic, Peter Yanker, Christos Faloutsos, Gabriel Taubin: The QBIC Project: Querying Images by Content, Using Color, Texture, and Shape. Storage and Retrieval for Image and Video Databases (SPIE) 1993: 173-187
Joel E. Richardson, Peter M. Schwarz: Aspects: Extending Objects to Support Multiple, Independent Roles. SIGMOD Conference 1991: 298-307
Joel E. Richardson, Peter M. Schwarz: MDM: An Object-Oriented Data Model. DBPL 1991: 86-95
Gerard Salton: Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley 1989, ISBN 0-201-12227-8
Nicole Yankelovich, Bernard J. Haan, Norman K. Meyrowitz, Steven M. Drucker: Intermedia: The Concept and the Construction of a Seamless Information Environment. IEEE Computer 21(1): 81-96(1988)

Referenced by

  1. Serge Abiteboul, Sophie Cluet, Tova Milo: A Logical View of Structured Files. VLDB J. 7(2): 96-114(1998)
  2. Joachim Hammer, Jason McHugh, Hector Garcia-Molina: Semistructured Data: The Tsimmis Experience. ADBIS 1997: 1-8
  3. Stephen Blott, Lukas Relly, Hans-Jörg Schek: An Open Storage System for Abstract Objects. SIGMOD Conference 1996: 330-340
  4. Yannis Papakonstantinou, Hector Garcia-Molina, Jeffrey D. Ullman: MedMaker: A Mediation System Based on Declarative Specifications. ICDE 1996: 132-141
  5. Daniel Barbará, Sharad Mehrotra, Padmavathi Vallabhaneni: The Gold Text Indexing Engine. ICDE 1996: 172-179
  6. Markus Tresch, Neal Palmer, Allen Luniewski: Type Classification of Semi-Structured Documents. VLDB 1995: 263-274
  7. Serge Abiteboul, Sophie Cluet, Tova Milo: A Database Interface for File Updates. SIGMOD Conference 1995: 386-397
  8. Mariano P. Consens, Tova Milo: Algebras for Querying Text Regions. PODS 1995: 11-22
  9. Shinichi Ueshima, Kazuhiro Ohtsuki, Jun-ya Morishita, Qing Qian, Hiroaki Oiso, Katsumi Tanaka: Incremental Data Organization for Ancient Document Databases. DASFAA 1995: 457-466
  10. Markus Tresch, Allen Luniewski: An Extensible Classifier for Semi-Structured Documents. CIKM 1995: 226-233
  11. Narain H. Gehani, H. V. Jagadish, William D. Roome: OdeFS: A File System Interface to an Object-Oriented Database. VLDB 1994: 249-260
  12. Anthony Tomasic, Hector Garcia-Molina, Kurt A. Shoens: Incremental Updates of Inverted Lists for Text Document Retrieval. SIGMOD Conference 1994: 289-300
  13. Mariano P. Consens, Tova Milo: Optimizing Queries on Files. SIGMOD Conference 1994: 301-312
  14. Peter M. Schwarz, Kurt A. Shoens: Managing Change in the Rufus System. ICDE 1994: 170-179
VLDB Proceedings: Copyright © by VLDB Endowment
ACM SIGMOD Anthology: Copyright © by ACM
DBLP: Copyright © by Michael Ley, last change: Sat May 16 23:45:55 2009