Digital Symposium Collection 2000  

A Layered Architecture for Querying Dynamic Web Content

Hasan Davulcu, Juliana Freire, Michael Kifer, and I. V. Ramakrishnan

Abstract
The design of webbases, database systems for supporting Web-based applications, is currently an active area of research. In this paper, we propose a 3-layer architecture for designing and implementing webbases for querying dynamic Web content (i.e., data that can only be extracted by filling out multiple forms). The lowest layer, the virtual physical layer, provides navigation independence by shielding the user from the complexities associated with retrieving data from raw Web sources. Next, the traditional logical layer supports site independence. The top layer is analogous to the external schema layer in traditional databases. Within this architectural framework we address two problems unique to webbases: retrieving dynamic Web content in the virtual physical layer and querying the external schema by the end user. The layered architecture makes it possible to automate data extraction to a much greater degree than in existing proposals. Wrappers for the virtual physical schema can be created semi-automatically, by asking the webbase designer to navigate through the sites of interest; we call this approach mapping by example. Thus, the webbase designer need not have expertise in the language that maps the physical schema to the raw Web (this should be contrasted with other approaches, which require expertise in various Web-enabled flavors of SQL). For the external schema layer, we propose a semantic extension of the universal relation interface. Compared to the currently prevailing “canned” form-based interfaces on the one hand and complex Web-enabling extensions of SQL on the other, this interface provides powerful, yet reasonably simple, ad hoc querying capabilities for the end user. Finally, we discuss the implementation of the proposed architecture.
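
To make the division of labor among the three layers concrete, the following is a minimal Python sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the car-listing domain, the two in-memory "sites", and the names `site_a_cars`, `site_b_cars`, `cars`, and `ask` are hypothetical, and the external layer is reduced to attribute selection rather than the semantic universal relation interface the paper proposes.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]

# Virtual physical layer (hypothetical): one wrapper per site. A real wrapper
# would fill out forms and follow links; these use canned in-memory data.
def site_a_cars(make: str) -> List[Row]:
    data = [
        {"make": "ford", "model": "escort", "price": "4500"},
        {"make": "honda", "model": "civic", "price": "6200"},
    ]
    return [r for r in data if r["make"] == make]

def site_b_cars(make: str) -> List[Row]:
    data = [{"make": "ford", "model": "taurus", "price": "7000"}]
    return [r for r in data if r["make"] == make]

# Logical layer (hypothetical): a site-independent relation defined as the
# union of the per-site wrappers, so callers never name a particular site.
def cars(make: str) -> List[Row]:
    rows: List[Row] = []
    for site, wrapper in (("site_a", site_a_cars), ("site_b", site_b_cars)):
        for r in wrapper(make):
            rows.append({**r, "source": site})
    return rows

# External schema layer (simplified assumption): the end user names only the
# attributes wanted and the selection values, not relations or sites.
# "make" is mandatory here because the underlying forms need an input value.
def ask(attributes: List[str], **conditions: str) -> List[Tuple[str, ...]]:
    result = []
    for r in cars(conditions["make"]):
        if all(r.get(k) == v for k, v in conditions.items()):
            result.append(tuple(r[a] for a in attributes))
    return sorted(result)

print(ask(["model", "price"], make="ford"))
# prints [('escort', '4500'), ('taurus', '7000')]
```

In this sketch a change to how one site is navigated would touch only its wrapper, and adding a site would touch only `cars`, which is the kind of navigation and site independence the abstract attributes to the lower two layers.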


BIBTEX

@inproceedings{DBLP:conf/sigmod/DavulcuFKR99,
  author    = {Hasan Davulcu and
               Juliana Freire and
               Michael Kifer and
               I. V. Ramakrishnan},
  editor    = {Alex Delis and
               Christos Faloutsos and
               Shahram Ghandeharizadeh},
  title     = {A Layered Architecture for Querying Dynamic Web Content},
  booktitle = {SIGMOD 1999, Proceedings ACM SIGMOD International Conference
               on Management of Data, June 1-3, 1999, Philadelphia, Pennsylvania,
               USA},
  publisher = {ACM Press},
  year      = {1999},
  isbn      = {1-58113-084-8},
  pages     = {491-502},
  crossref  = {DBLP:conf/sigmod/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Copyright(C) 2000 ACM