Extracting Large-Scale Knowledge Bases from the Web
S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins
The subject of this paper is the creation of knowledge bases by enumerating and organizing all web occurrences of certain subgraphs. We focus on subgraphs that are signatures of web phenomena such as tightly-focused topic communities, webrings, taxonomy trees, keiretsus, etc. For instance, the signature of a webring is a central page with bidirectional links to a number of other pages. We develop novel algorithms for such enumeration problems. A key technical contribution is the development of a model for the evolution of the web graph, based on experimental observations derived from a snapshot of the web. We argue that our algorithms run efficiently in this model, and use the model to explain some statistical phenomena on the web that emerged during our experiments. Finally, we describe the design and implementation of Campfire, a knowledge base of over one hundred thousand web communities.
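To make the webring signature concrete, the following is a minimal sketch of how such a pattern could be spotted in a small in-memory link graph. It is a naive scan written only for illustration, not the enumeration algorithm developed in the paper; the function name `find_webring_hubs`, the `min_members` threshold, and the toy edge list are all assumptions introduced here.

```python
from collections import defaultdict


def find_webring_hubs(edges, min_members=3):
    """Return {hub: sorted member pages} for every page that has
    bidirectional links with at least `min_members` other pages.

    `edges` is an iterable of (source, destination) page identifiers.
    The threshold is a hypothetical parameter, not taken from the paper.
    """
    out_links = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # ignore self-links
            out_links[src].add(dst)

    hubs = {}
    for page, targets in out_links.items():
        # A target is a ring member only if it also links back to the page.
        # .get() avoids inserting keys into the defaultdict while iterating.
        mutual = sorted(t for t in targets if page in out_links.get(t, ()))
        if len(mutual) >= min_members:
            hubs[page] = mutual
    return hubs


# Toy graph: "ring" exchanges links with a, b and c; its link to d is one-way.
edges = [
    ("ring", "a"), ("a", "ring"),
    ("ring", "b"), ("b", "ring"),
    ("ring", "c"), ("c", "ring"),
    ("ring", "d"),
    ("a", "b"),
]
hubs = find_webring_hubs(edges)
print(hubs)
```

On this toy graph only `ring` qualifies as a central page. A scan like this holds the whole graph in memory, so it would not be practical on a web-scale crawl.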
@inproceedings{DBLP:conf/vldb/KumarRRT99,
author = {S. Ravi Kumar and
Prabhakar Raghavan and
Sridhar Rajagopalan and
Andrew Tomkins},
editor = {Malcolm P. Atkinson and
Maria E. Orlowska and
Patrick Valduriez and
Stanley B. Zdonik and
Michael L. Brodie},
title = {Extracting Large-Scale Knowledge Bases from the Web},
booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
UK},
publisher = {Morgan Kaufmann},
year = {1999},
isbn = {1-55860-615-5},
pages = {639-650},
crossref = {DBLP:conf/vldb/99},
bibsource = {DBLP, http://dblp.uni-trier.de} }
Copyright(C) 2000 ACM