Digital Symposium Collection 2000

An XML-based Wrapper Generator for Web Information Extraction

Ling Liu, Wei Han, David Buttler, Calton Pu, and Wei Tang

Abstract
There has been tremendous interest in information integration systems that automatically gather, manipulate, and integrate data from multiple information sources on a user's behalf. Unfortunately, web sites are designed primarily for human browsing rather than for use by computer programs, and mechanically extracting their content is in general difficult, if not impossible [4]. Software systems that use such web information sources typically rely on hand-coded wrappers to extract the content of interest and to translate query responses into a more structured format (e.g., relational form) before unifying them into an integrated answer to a user's query. The most recent generation of information mediator systems (e.g., Ariadne [3], CQ [5, 7], Internet Softbots [4], TSIMMIS [2]) addresses this problem by enabling a pre-wrapped set of web sources to be accessed via database-like queries.
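
To make the notion of a hand-coded wrapper concrete, the following is a minimal sketch in Python. It is not the generator this paper describes. The sample page, the `TableRowWrapper` class, the `extract_books` function, and the (title, price) schema are all hypothetical and chosen only for illustration. The sketch shows how a wrapper might map HTML written for human browsing to relational-style tuples.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; a real wrapper would fetch this from a web source.
SAMPLE_PAGE = """
<table>
  <tr><th>Title</th><th>Price</th></tr>
  <tr><td>Data on the Web</td><td>$45.00</td></tr>
  <tr><td>Database Systems</td><td>$80.50</td></tr>
</table>
"""


class TableRowWrapper(HTMLParser):
    """Hand-coded wrapper: turns each <tr> of <td> cells into a tuple."""

    def __init__(self):
        super().__init__()
        self.rows = []          # extracted tuples, one per data row
        self._cells = None      # cells of the row being parsed, or None
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._in_cell = True
            self._cells.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            # Header rows hold only <th> cells, so _cells is empty and skipped.
            if self._cells:
                self.rows.append(tuple(c.strip() for c in self._cells))
            self._cells = None


def extract_books(html):
    """Return a list of (title, price) tuples, with price as a float."""
    wrapper = TableRowWrapper()
    wrapper.feed(html)
    return [(title, float(price.lstrip("$"))) for title, price in wrapper.rows]


if __name__ == "__main__":
    for row in extract_books(SAMPLE_PAGE):
        print(row)
```

A wrapper written this way is tied to one page layout: if the site moves the price into a different element, the code must be edited by hand, which is the maintenance burden that motivates generating wrappers rather than hand-coding them.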


References

[1]
...
[2]
Joachim Hammer, Hector Garcia-Molina, Svetlozar Nestorov, Ramana Yerneni, Markus M. Breunig, Vasilis Vassalos: Template-Based Wrappers in the TSIMMIS System. SIGMOD Conference 1997: 532-535
[3]
Craig A. Knoblock, Steven Minton, José Luis Ambite, Naveen Ashish, Pragnesh J. Modi, Ion Muslea, Andrew Philpot, Sheila Tejada: Modeling Web Sources for Information Integration. AAAI/IAAI 1998: 211-218
[4]
Nicholas Kushmerick, Daniel S. Weld, Robert B. Doorenbos: Wrapper Induction for Information Extraction. IJCAI (1) 1997: 729-737
[5]
...
[6]
...
[7]
Ling Liu, Calton Pu, Wei Tang, David Buttler, John Biggs, Tong Zhou, Paul Benninghoff, Wei Han, Fenghua Yu: CQ: A Personalized Update Monitoring Toolkit. SIGMOD Conference 1998: 547-549

BIBTEX

@inproceedings{DBLP:conf/sigmod/LiuHBPT99,
  author    = {Ling Liu and
               Wei Han and
               David Buttler and
               Calton Pu and
               Wei Tang},
  editor    = {Alex Delis and
               Christos Faloutsos and
               Shahram Ghandeharizadeh},
  title     = {An XML-based Wrapper Generator for Web Information Extraction},
  booktitle = {SIGMOD 1999, Proceedings ACM SIGMOD International Conference
               on Management of Data, June 1-3, 1999, Philadelphia, Pennsylvania,
               USA},
  publisher = {ACM Press},
  year      = {1999},
  isbn      = {1-58113-084-8},
  pages     = {540-543},
  crossref  = {DBLP:conf/sigmod/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Copyright(C) 2000 ACM