ACM SIGMOD Anthology ACM SIGMOD dblp.uni-trier.de

Algebras for Querying Text Regions.

Mariano P. Consens, Tova Milo: Algebras for Querying Text Regions. PODS 1995: 11-22
@inproceedings{DBLP:conf/pods/ConsensM95,
  author    = {Mariano P. Consens and
               Tova Milo},
  title     = {Algebras for Querying Text Regions},
  booktitle = {Proceedings of the Fourteenth ACM SIGACT-SIGMOD-SIGART Symposium
               on Principles of Database Systems, May 22-25, 1995, San Jose,
               California},
  publisher = {ACM Press},
  year      = {1995},
  isbn      = {0-89791-730-8},
  pages     = {11-22},
  ee        = {http://doi.acm.org/10.1145/212433.212437, db/conf/pods/ConsensM95.html},
  crossref  = {DBLP:conf/pods/95},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX

Abstract

There is a significant amount of interest in combining and extending database and information retrieval technologies to manage textual data. The challenge is becoming more relevant due to the increased availability of documents in digital form. Document data has a natural hierarchical structure, which may be made explicit due to the use of markup conventions (as it is the case with SGML). An important aspect of managing structured and semi-structured textual data consists of supporting the efficient retrieval of text components based both on their content and structure.

In this paper we study issues related to the expressive power and optimization of a class of algebras that support combining string (or pattern) searches with queries on the hierarchical structure of the text. The region algebra studied is a set-at-a-time algebra for manipulating text regions (substrings of the text) that supports finding out nesting and ordering properties of the text regions. The region algebra is part of the language in use in commercial text retrieval systems, and can be implemented very efficiently.

The results in this work are obtained by showing a close relationship between the region algebra and the monadic first order theory of binary trees. We show that queries in the algebra can be optimized, but the optimization can be difficult (Co-NP-Hard in the general case, although there is an important class of queries that can be optimized in polynomial time). On the negative side, we show that the language is incapable of capturing some important properties of the text structure, related to the nesting and ordering of text regions. We conclude by suggesting possible extensions to increase the expressive power of the language, focusing again on optimization and expressibility.

Copyright © 1995 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


Load The ACM SIGMOD Anthology, CDROM Edition, Volume 1-3, PODS '82-'98. and ... Load The ACM SIGMOD Anthology, Silver Edition, DVD 1, Proceedings. and ... BibTeX

Printed Edition

Proceedings of the Fourteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, May 22-25, 1995, San Jose, California. ACM Press 1995, ISBN 0-89791-730-8
Contents BibTeX

Online Edition: ACM Digital Library

[Index Terms]
[Full Text in PDF Format, 1137 KB]

References

[ACM93]
Serge Abiteboul, Sophie Cluet, Tova Milo: Querying and Updating the File. VLDB 1993: 73-84 BibTeX
[AFS93]
Rakesh Agrawal, Christos Faloutsos, Arun N. Swami: Efficient Similarity Search In Sequence Databases. FODO 1993: 69-84 BibTeX
[BGMM93]
Daniel Barbará, Chris Clifton, Fred Douglis, Hector Garcia-Molina, Stephen Johnson, Ben Kao, Sharad Mehrotra, Jens Tellefsen, Rosemary Walsh: The Gold Mailer. ICDE 1993: 92-99 BibTeX
[Bur92]
Forbes J. Burkowski: An Algebra for Hierarchically Organized Text-Dominate Databases. Inf. Process. Manage. 28(3): 333-348(1992) BibTeX
[CACS94]
Vassilis Christophides, Serge Abiteboul, Sophie Cluet, Michel Scholl: From Structured Documents to Novel Query Facilities. SIGMOD Conference 1994: 313-324 BibTeX
[CCB95]
Charles L. A. Clarke, Gordon V. Cormack, Forbes J. Burkowski: An Algebra for Structured Text Search and a Framework for its Implementation. Comput. J. 38(1): 43-56(1995) BibTeX
[CH90]
...
[CM94]
Mariano P. Consens, Tova Milo: Optimizing Queries on Files. SIGMOD Conference 1994: 301-312 BibTeX
[Ehr61]
...
[GNOT92]
David Goldberg, David A. Nichols, Brian M. Oki, Douglas B. Terry: Using Collaborative Filtering to Weave an Information Tapestry. Commun. ACM 35(12): 61-70(1992) BibTeX
[Gon87]
...
[GT87]
Gaston H. Gonnet, Frank Wm. Tompa: Mind Your Grammar: a New Approach to Modelling Text. VLDB 1987: 339-346 BibTeX
[GZC89]
Ralf Hartmut Güting, Roberto Zicari, David M. Choy: An Algebra for Structured Office Documents. ACM Trans. Inf. Syst. 7(2): 123-157(1989) BibTeX
[IK89]
Neil Immerman, Dexter Kozen: Definability with Bounded Number of Bound Variables. Inf. Comput. 83(2): 121-139(1989) BibTeX
[NBY95]
Gonzalo Navarro, Ricardo A. Baeza-Yates: A Language for Queries on Structure and Contents of Textual. SIGIR 1995: 93-101 BibTeX
[Ope93]
...
[Pae93]
Andreas Paepcke: An Object-Oriented View Onto Public, Heterogeneous Text Databases. ICDE 1993: 484-493 BibTeX
[PS82]
Christos H. Papadimitriou, Kenneth Steiglitz: Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall 1982, ISBN 0-13-152462-3
BibTeX
[Rab69]
...
[SLS+93]
Kurt A. Shoens, Allen Luniewski, Peter M. Schwarz, James W. Stamos, Joachim Thomas II: The Rufus System: Information Organization for Semi-Structured Data. VLDB 1993: 97-107 BibTeX
[ST92]
...
[Tho90]
...
[Ull88a]
Jeffrey D. Ullman: Principles of Database and Knowledge-Base Systems, Volume I. Computer Science Press 1988, ISBN 0-7167-8158-1
Contents BibTeX
[Ull88b]
Jeffrey D. Ullman: Principles of Database and Knowledge-Base Systems, Volume II. Computer Science Press 1989, ISBN 0-7167-8162-X
Contents BibTeX

Referenced by

  1. Serge Abiteboul: Querying Semi-Structured Data. ICDT 1997: 1-18
  2. Atsuyuki Morishima, Hiroyuki Kitagawa: A Data Modelling and Query Processing Scheme for Integration of Structured Document Repositories and Relational Databases. DASFAA 1997: 145-154
BibTeX
ACM SIGMOD Anthology - DBLP: [Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:34:12 2009