Algebras for Querying Text Regions.
Mariano P. Consens, Tova Milo:
Algebras for Querying Text Regions.
PODS 1995: 11-22@inproceedings{DBLP:conf/pods/ConsensM95,
author = {Mariano P. Consens and
Tova Milo},
title = {Algebras for Querying Text Regions},
booktitle = {Proceedings of the Fourteenth ACM SIGACT-SIGMOD-SIGART Symposium
on Principles of Database Systems, May 22-25, 1995, San Jose,
California},
publisher = {ACM Press},
year = {1995},
isbn = {0-89791-730-8},
pages = {11-22},
ee = {http://doi.acm.org/10.1145/212433.212437, db/conf/pods/ConsensM95.html},
crossref = {DBLP:conf/pods/95},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
BibTeX
Abstract
There is a significant amount of interest in combining and
extending database and information retrieval technologies to manage textual
data. The challenge is becoming more relevant due to the increased
availability of documents in digital form. Document data has a natural
hierarchical structure, which may be made explicit due to the use of
markup conventions (as it is the case with SGML). An important aspect
of managing structured and semi-structured textual data consists of
supporting the efficient retrieval of text components based both on their
content and structure.
In this paper we study issues related to the expressive power and
optimization of a class of algebras that support combining string (or
pattern) searches with queries on the hierarchical structure of the text.
The region algebra studied is a set-at-a-time algebra for manipulating
text regions (substrings of the text) that supports finding out
nesting and ordering properties of the text regions. The region algebra is
part of the language in use in commercial text retrieval systems, and can
be implemented very efficiently.
The results in this work are obtained by showing a close relationship
between the region algebra and the monadic first order theory of binary trees.
We show that queries in the algebra can be optimized,
but the optimization can be difficult (Co-NP-Hard in the general case,
although there is an important class of queries that can be optimized
in polynomial time).
On the negative side, we show that the language is incapable of capturing
some important properties of the text structure, related to the nesting
and ordering of text regions.
We conclude by suggesting possible extensions to increase the expressive
power of the language, focusing again on optimization and expressibility.
Copyright © 1995 by the ACM,
Inc., used by permission. Permission to make
digital or hard copies is granted provided that
copies are not made or distributed for profit or
direct commercial advantage, and that copies show
this notice on the first page or initial screen of
a display along with the full citation.
Load The ACM SIGMOD Anthology, CDROM Edition, Volume 1-3, PODS '82-'98.
and ...
Load The ACM SIGMOD Anthology, Silver Edition, DVD 1, Proceedings.
and ...
BibTeX
Printed Edition
Proceedings of the Fourteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, May 22-25, 1995, San Jose, California.
ACM Press 1995, ISBN 0-89791-730-8
Contents BibTeX
[Index Terms]
[Full Text in PDF Format, 1137 KB]
References
- [ACM93]
- Serge Abiteboul, Sophie Cluet, Tova Milo:
Querying and Updating the File.
VLDB 1993: 73-84 BibTeX
- [AFS93]
- Rakesh Agrawal, Christos Faloutsos, Arun N. Swami:
Efficient Similarity Search In Sequence Databases.
FODO 1993: 69-84 BibTeX
- [BGMM93]
- Daniel Barbará, Chris Clifton, Fred Douglis, Hector Garcia-Molina, Stephen Johnson, Ben Kao, Sharad Mehrotra, Jens Tellefsen, Rosemary Walsh:
The Gold Mailer.
ICDE 1993: 92-99 BibTeX
- [Bur92]
- Forbes J. Burkowski:
An Algebra for Hierarchically Organized Text-Dominate Databases.
Inf. Process. Manage. 28(3): 333-348(1992) BibTeX
- [CACS94]
- Vassilis Christophides, Serge Abiteboul, Sophie Cluet, Michel Scholl:
From Structured Documents to Novel Query Facilities.
SIGMOD Conference 1994: 313-324 BibTeX
- [CCB95]
- Charles L. A. Clarke, Gordon V. Cormack, Forbes J. Burkowski:
An Algebra for Structured Text Search and a Framework for its Implementation.
Comput. J. 38(1): 43-56(1995) BibTeX
- [CH90]
- ...
- [CM94]
- Mariano P. Consens, Tova Milo:
Optimizing Queries on Files.
SIGMOD Conference 1994: 301-312 BibTeX
- [Ehr61]
- ...
- [GNOT92]
- David Goldberg, David A. Nichols, Brian M. Oki, Douglas B. Terry:
Using Collaborative Filtering to Weave an Information Tapestry.
Commun. ACM 35(12): 61-70(1992) BibTeX
- [Gon87]
- ...
- [GT87]
- Gaston H. Gonnet, Frank Wm. Tompa:
Mind Your Grammar: a New Approach to Modelling Text.
VLDB 1987: 339-346 BibTeX
- [GZC89]
- Ralf Hartmut Güting, Roberto Zicari, David M. Choy:
An Algebra for Structured Office Documents.
ACM Trans. Inf. Syst. 7(2): 123-157(1989) BibTeX
- [IK89]
- Neil Immerman, Dexter Kozen:
Definability with Bounded Number of Bound Variables.
Inf. Comput. 83(2): 121-139(1989) BibTeX
- [NBY95]
- Gonzalo Navarro, Ricardo A. Baeza-Yates:
A Language for Queries on Structure and Contents of Textual.
SIGIR 1995: 93-101 BibTeX
- [Ope93]
- ...
- [Pae93]
- Andreas Paepcke:
An Object-Oriented View Onto Public, Heterogeneous Text Databases.
ICDE 1993: 484-493 BibTeX
- [PS82]
- Christos H. Papadimitriou, Kenneth Steiglitz:
Combinatorial Optimization: Algorithms and Complexity.
Prentice-Hall 1982, ISBN 0-13-152462-3
BibTeX
- [Rab69]
- ...
- [SLS+93]
- Kurt A. Shoens, Allen Luniewski, Peter M. Schwarz, James W. Stamos, Joachim Thomas II:
The Rufus System: Information Organization for Semi-Structured Data.
VLDB 1993: 97-107 BibTeX
- [ST92]
- ...
- [Tho90]
- ...
- [Ull88a]
- Jeffrey D. Ullman:
Principles of Database and Knowledge-Base Systems, Volume I.
Computer Science Press 1988, ISBN 0-7167-8158-1
Contents BibTeX
- [Ull88b]
- Jeffrey D. Ullman:
Principles of Database and Knowledge-Base Systems, Volume II.
Computer Science Press 1989, ISBN 0-7167-8162-X
Contents BibTeX
Referenced by
- Serge Abiteboul:
Querying Semi-Structured Data.
ICDT 1997: 1-18
- Atsuyuki Morishima, Hiroyuki Kitagawa:
A Data Modelling and Query Processing Scheme for Integration of Structured Document Repositories and Relational Databases.
DASFAA 1997: 145-154
BibTeX
ACM SIGMOD Anthology - DBLP:
[Home | Search: Author, Title | Conferences | Journals]
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:34:12 2009