
WebView: A Tool for Retrieving Internal Structures and Extracting Information from HTML Documents.

Seung Jin Lim, Yiu-Kai Ng: WebView: A Tool for Retrieving Internal Structures and Extracting Information from HTML Documents. DASFAA 1999: 71-80
@inproceedings{DBLP:conf/dasfaa/LimN99,
  author    = {Seung Jin Lim and
               Yiu-Kai Ng},
  editor    = {Arbee L. P. Chen and
               Frederick H. Lochovsky},
  title     = {WebView: A Tool for Retrieving Internal Structures and Extracting
               Information from HTML Documents},
  booktitle = {Database Systems for Advanced Applications, Proceedings of the
               Sixth International Conference on Database Systems for Advanced
               Applications (DASFAA), April 19-21, Hsinchu, Taiwan},
  publisher = {IEEE Computer Society},
  year      = {1999},
  isbn      = {0-7695-0084-6},
  pages     = {71-80},
  ee        = {db/conf/dasfaa/LimN99.html},
  crossref  = {DBLP:conf/dasfaa/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

HTML [Rag96,Sei96] is a well-accepted and widely used language for creating platform-independent documents to be posted on the Web, and HTML documents are semistructured in nature according to the HTML specification. We propose a tool, called WebView, which constructs the semistructured data graph (SDG) of an HTML document H to capture the internal structure of data embedded in H and its (in)directly linked documents. On top of the SDG, WebView provides query processing capability for evaluating SQL-like queries that are posted against the SDG, i.e., the source document(s), for extracting information from the SDG. Existing methods for extracting structured information from certain HTML documents with static internal structure, such as wrappers and integrators for data warehousing, can benefit from WebView.
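
The abstract does not define the SDG or the query language, so the following is only a rough sketch of the general idea: parse one HTML document into a graph of element nodes, then run a select-with-predicate query over it. The names `Node`, `GraphBuilder`, `build_graph`, and `select` are hypothetical and are not taken from WebView; the sketch uses Python's standard `html.parser` module, handles a single document only, and does not follow links to other documents.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (assumed subset, for illustration).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    """One element of the document graph (hypothetical structure)."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.text = ""
        self.children = []


class GraphBuilder(HTMLParser):
    """Builds a tree of Node objects from the parser's tag events."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Attach text to the innermost open element, dropping pure whitespace.
        self.stack[-1].text += data.strip()


def build_graph(html):
    builder = GraphBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def select(node, tag, where=lambda n: True):
    """Yield every descendant with the given tag that satisfies `where`."""
    for child in node.children:
        if child.tag == tag and where(child):
            yield child
        yield from select(child, tag, where)


SAMPLE = """<html><body>
<h1>Papers</h1>
<ul><li><a href="a.html">WebOQL</a></li><li><a href="b.html">W3QS</a></li></ul>
</body></html>"""

graph = build_graph(SAMPLE)
# Roughly "SELECT href, text FROM a": every link in the document.
links = [(n.attrs["href"], n.text) for n in select(graph, "a")]
# The same query with a predicate, roughly "... WHERE text = 'W3QS'".
w3qs = [n.attrs["href"] for n in select(graph, "a", lambda n: n.text == "W3QS")]
```

A tree is the simplest case of such a graph; covering the linked documents mentioned in the abstract would presumably mean adding edges from each `a` node to the graph of its target document, which this sketch leaves out.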

Copyright © 1999 by The Institute of Electrical and Electronic Engineers, Inc. (IEEE). Abstract used with permission.



References

[1] Serge Abiteboul: Querying Semi-Structured Data. ICDT 1997: 1-18
[2] Serge Abiteboul, Sophie Cluet, Vassilis Christophides, Tova Milo, Guido Moerkotte, Jérôme Siméon: Querying Documents in Object Databases. Int. J. on Digital Libraries 1(1): 5-19 (1997)
[3] Gustavo O. Arocena, Alberto O. Mendelzon: WebOQL: Restructuring Documents, Databases, and Webs. ICDE 1998: 24-33
[4] Paolo Atzeni, Giansalvatore Mecca: Cut & Paste. PODS 1997: 144-153
[5] Paolo Atzeni, Giansalvatore Mecca, Paolo Merialdo: To Weave the Web. VLDB 1997: 206-215
[6] ...
[7] ...
[8] David Konopnicki, Oded Shmueli: W3QS: A Query System for the World-Wide Web. VLDB 1995: 54-65
[9] Laks V. S. Lakshmanan, Fereidoon Sadri, Iyer N. Subramanian: A Declarative Language for Querying and Restructuring the WEB. RIDE-NDS 1996: 12-21
[10] Alberto O. Mendelzon, Tova Milo: Formal Models of Web Queries. PODS 1997: 134-143
[11] ...
[12] ...
[13] Jennifer Widom: Research Problems in Data Warehousing. CIKM 1995: 25-30
DASFAA 1999 Proceedings: Copyright © by IEEE,
ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:05:36 2009