Mind Your Grammar: a New Approach to Modelling Text.

Gaston H. Gonnet, Frank Wm. Tompa: Mind Your Grammar: a New Approach to Modelling Text. VLDB 1987: 339-346
  author    = {Gaston H. Gonnet and
               Frank Wm. Tompa},
  editor    = {Peter M. Stocker and
               William Kent and
               Peter Hammersley},
  title     = {Mind Your Grammar: a New Approach to Modelling Text},
  booktitle = {VLDB'87, Proceedings of 13th International Conference on Very
               Large Data Bases, September 1-4, 1987, Brighton, England},
  publisher = {Morgan Kaufmann},
  year      = {1987},
  isbn      = {0-934613-46-X},
  pages     = {339-346},
  ee        = {db/conf/vldb/GonnetT87.html},
  crossref  = {DBLP:conf/vldb/87},
  bibsource = {DBLP,}


Early work on creating the New Oxford English Dictionary (New OED) database has shown that databases for reference texts are unlike those for conventional enterprises. While the traditional approaches to database design and development are sound, the particular techniques used for commercial databases have repeatedly proved inappropriate for text-dominated databases such as the New OED.

In the same way that the relational model was developed based on experiences gained from earlier database approaches, the grammar-based model presented here builds on the traditional foundations of computer science, and particularly database theory and practice. This new model uses grammars as schemas and "parsed strings" as instances. Operators on the parsed strings are defined, resulting in a "p-string algebra" that can be used for data manipulation and view definition.
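
To make the idea of "grammars as schemas and parsed strings as instances" concrete, the following is a minimal sketch in Python. It is not taken from the paper: the toy grammar, the `PString` class, and the `conforms` and `select` functions are hypothetical names chosen for illustration, and the operators actually defined in the p-string algebra may differ.

```python
from dataclasses import dataclass, field

# Hypothetical toy grammar acting as the schema (an assumption, not from the paper).
# Each nonterminal maps to the sequence of symbols it expands to.
# Symbols that do not appear as keys are treated as terminals holding free text.
GRAMMAR = {
    "entry": ["headword", "sense"],
    "sense": ["definition", "quotation"],
}


@dataclass
class PString:
    """A parsed string: a piece of text together with its parse tree."""
    symbol: str
    text: str = ""                       # only meaningful for leaves
    children: list = field(default_factory=list)

    def string(self):
        """Return the underlying text covered by this node."""
        if not self.children:
            return self.text
        return " ".join(child.string() for child in self.children)


def conforms(p, grammar=GRAMMAR):
    """Check that an instance (parse tree) conforms to the schema (grammar)."""
    if p.symbol not in grammar:
        return not p.children            # terminals must be leaves
    return ([c.symbol for c in p.children] == grammar[p.symbol]
            and all(conforms(c, grammar) for c in p.children))


def select(p, symbol):
    """Hypothetical non-navigational operator: every sub-p-string rooted
    at `symbol`, in text order, wherever it occurs in the tree."""
    found = [p] if p.symbol == symbol else []
    for child in p.children:
        found.extend(select(child, symbol))
    return found


# One dictionary entry as an instance of the schema above.
entry = PString("entry", children=[
    PString("headword", "grammar"),
    PString("sense", children=[
        PString("definition", "the rules of a language"),
        PString("quotation", "Mind your grammar."),
    ]),
])

print(conforms(entry))
print([q.string() for q in select(entry, "quotation")])
```

The caller of `select` names what it wants (a grammar symbol) rather than a path through the tree. This is one way to read "non-navigational", and it leaves the storage layout of the tree free to change.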

The model is representation-independent and the operators are non-navigational, so that efficient implementations may be developed for unknown future hardware and operating systems. Several approaches to storage structures and efficient processing algorithms for representative hardware configurations have been investigated.

Copyright © 1987 by the VLDB Endowment. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by the permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

