This is perhaps not strictly a database paper, but it has been so influential in the web searching community that I think it deserves reviewing here.
The question addressed is: when is a web page p returned by a search engine on topic T a good source of information on topic T? In other words, when is p an authoritative source for topic T? Let's say that such a p is an authority on T. Authorities are defined by mutual recursion with hubs: a hub for T is a page that points to authorities for T, and a page is an authority for T if hub pages for T point to it. This leads to a natural iterative method for computing hubs and authorities, starting from a subset of the answers returned by a search engine together with pages that link to or are linked from them. The iteration turns out to amount to computing the principal eigenvectors of matrices derived from the link graph (A^T A and A A^T, where A is the adjacency matrix of this subgraph), leading to non-trivial mathematics and to elegant and efficient algorithms.
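To make the iteration concrete, here is a minimal sketch of the hub/authority updates in Python. It is not the paper's implementation: the dict-of-lists graph representation, the fixed iteration count, and the toy link graph at the end are illustrative assumptions.

```python
# A minimal sketch of the hub/authority iteration, assuming the focused
# subgraph has already been collected; the representation and the toy
# graph below are illustrative, not taken from the paper.
import math

def hits(out_links, iterations=50):
    """out_links maps each page to the list of pages it points to."""
    pages = set(out_links) | {q for targets in out_links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages pointing to p.
        auth = {p: sum(hub[q] for q in pages if p in out_links.get(q, ()))
                for p in pages}
        # Hub update: sum the authority scores of the pages p points to.
        hub = {p: sum(auth[q] for q in out_links.get(p, ())) for p in pages}
        # Renormalize so the scores converge; this is power iteration on
        # A^T A and A A^T, as noted above.
        for scores in (auth, hub):
            norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Toy usage: three hub-like pages pointing at two candidate authorities.
graph = {"h1": ["a1", "a2"], "h2": ["a1"], "h3": ["a1", "a2"]}
hub, auth = hits(graph)
print(sorted(auth, key=auth.get, reverse=True))  # "a1" should rank first
```

Each pass applies the authority update, then the hub update, then renormalizes, so the scores converge to the principal eigenvectors mentioned above.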
The paper also contains extensive empirical evaluation of the method on a variety of interesting queries.
The idea of mining link structure for useful hints on the semantics of web documents has led to a great deal of work in the search engine community, as can be seen by glancing at the WWW7 and WWW8 Proceedings, among others, or by using the Google engine, which incorporates PageRank, a somewhat simpler way of using link structure to compute page reputations (Brin and Page, WWW7). A tutorial on these ideas was given by Jon Kleinberg and Andrew Tomkins at PODS'99 [2].
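For contrast with the hub/authority iteration above, here is an equally rough sketch of the power iteration behind PageRank: a single global score per page rather than paired hub and authority scores. The damping factor of 0.85, the uniform treatment of dangling pages, and the toy graph are illustrative assumptions, not details from either paper.

```python
# A rough sketch of PageRank-style power iteration, shown only to
# illustrate the contrast with the hub/authority scheme; parameters
# and the toy graph are assumptions made for this example.
def pagerank(out_links, damping=0.85, iterations=50):
    pages = set(out_links) | {q for ts in out_links.values() for q in ts}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = out_links.get(p, [])
            if targets:
                # A page shares its rank equally among the pages it links to.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:
                # Dangling page: spread its rank uniformly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

print(pagerank({"h1": ["a1", "a2"], "h2": ["a1"], "h3": ["a1", "a2"]}))
```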
Copyright © 2000 by the author(s). Review published with permission.