An XML-based Wrapper Generator for Web Information Extraction
Ling Liu, Wei Han, David Buttler, Calton Pu, and Wei Tang
There has been tremendous interest in information integration systems that automatically gather, manipulate, and integrate data from multiple information sources on a user's behalf. Unfortunately, web sites are designed primarily for human browsing rather than for use by computer programs, so mechanically extracting their content is in general difficult, if not impossible [4]. Software systems that use such web sources typically rely on hand-coded wrappers to extract the content of interest and to translate query responses into a more structured format (e.g., relational form) before unifying them into an integrated answer to the user's query. The most recent generation of information mediator systems (e.g., Ariadne [3], CQ [5, 7], Internet Softbots [4], TSIMMIS [2]) addresses this problem by enabling a pre-wrapped set of web sources to be accessed via database-like queries.
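The role of a hand-coded wrapper described above can be illustrated with a minimal sketch. This is not the XML-based wrapper generator presented in this paper; it assumes a hypothetical web source whose query responses are well-formed HTML tables with one record per `<tr>` row and one attribute per `<td>` cell, and it turns each row into a tuple (a simple relational form). The class name `TableRowWrapper`, the helper `wrap`, and the sample page are illustrative assumptions only.

```python
from html.parser import HTMLParser


class TableRowWrapper(HTMLParser):
    """Hypothetical hand-coded wrapper: collects the text of each <td> cell
    and groups the cells of each <tr> row into one tuple."""

    def __init__(self):
        super().__init__()
        self.rows = []      # extracted tuples (the relational output)
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a <td> (e.g. whitespace, <th> headers) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows, which contain no <td> cells
                self.rows.append(tuple(self._row))
            self._row = None


def wrap(html):
    """Run the wrapper over one query-response page and return its tuples."""
    parser = TableRowWrapper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical query-response page from a hypothetical source.
page = """
<table>
  <tr><th>City</th><th>Temp (F)</th></tr>
  <tr><td>Atlanta</td><td>75</td></tr>
  <tr><td>Portland</td><td>58</td></tr>
</table>
"""

print(wrap(page))  # [('Atlanta', '75'), ('Portland', '58')]
```

Because the extraction rules (which tags delimit records and attributes) are hard-wired into the parser, a change in the source's page layout requires editing the code by hand; this brittleness is one reason to generate wrappers rather than write them manually.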
References
[1] ...
[2] Joachim Hammer, Hector Garcia-Molina, Svetlozar Nestorov, Ramana Yerneni, Markus M. Breunig, Vasilis Vassalos: Template-Based Wrappers in the TSIMMIS System. SIGMOD Conference 1997: 532-535.
[3] Craig A. Knoblock, Steven Minton, José Luis Ambite, Naveen Ashish, Pragnesh J. Modi, Ion Muslea, Andrew Philpot, Sheila Tejada: Modeling Web Sources for Information Integration. AAAI/IAAI 1998: 211-218.
[4] Nicholas Kushmerick, Daniel S. Weld, Robert B. Doorenbos: Wrapper Induction for Information Extraction. IJCAI (1) 1997: 729-737.
[5] ...
[6] ...
[7] Ling Liu, Calton Pu, Wei Tang, David Buttler, John Biggs, Tong Zhou, Paul Benninghoff, Wei Han, Fenghua Yu: CQ: A Personalized Update Monitoring Toolkit. SIGMOD Conference 1998: 547-549.
@inproceedings{DBLP:conf/sigmod/LiuHBPT99,
  author    = {Ling Liu and
               Wei Han and
               David Buttler and
               Calton Pu and
               Wei Tang},
  editor    = {Alex Delis and
               Christos Faloutsos and
               Shahram Ghandeharizadeh},
  title     = {An XML-based Wrapper Generator for Web Information Extraction},
  booktitle = {SIGMOD 1999, Proceedings ACM SIGMOD International Conference
               on Management of Data, June 1-3, 1999, Philadelphia, Pennsylvania,
               USA},
  publisher = {ACM Press},
  year      = {1999},
  isbn      = {1-58113-084-8},
  pages     = {540-543},
  crossref  = {DBLP:conf/sigmod/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
Copyright(C) 2000 ACM