Automatic Discovery of Language Models for Text Databases
James P. Callan,
Margaret Connell, and
Aiqun Du
The proliferation of text databases within large organizations and on the Internet makes it difficult for a person to know which databases to search. Given language models that describe the contents of each database, a database selection algorithm such as GlOSS can provide assistance by automatically selecting appropriate databases for an information need. Current practice is that each database provides its language model upon request, but this cooperative approach has important limitations. This paper demonstrates that cooperation is not required. Instead, the database selection service can construct its own language models by sampling database contents via the normal process of running queries and retrieving documents. Although random sampling is not possible, it can be approximated with carefully selected queries. This sampling approach avoids the limitations that characterize the cooperative approach, and also enables additional capabilities. Experimental results demonstrate that accurate language models can be learned from a relatively small number of queries and documents.
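The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how queries are chosen, how many documents are retrieved per query, or when sampling stops, so every such choice below is an assumption. `ToyDatabase`, `sample_language_model`, and the parameters `docs_per_query`, `max_docs`, and `max_queries` are hypothetical names, and the "language model" is reduced to document frequencies over the sampled documents.

```python
import random
from collections import Counter


class ToyDatabase:
    """Hypothetical stand-in for a remote text database that only answers queries."""

    def __init__(self, docs):
        self._docs = docs

    def search(self, term, limit):
        """Return up to `limit` (doc_id, text) pairs whose text contains `term`."""
        hits = [(i, d) for i, d in enumerate(self._docs)
                if term in d.lower().split()]
        return hits[:limit]


def sample_language_model(db, seed_term, max_docs=5, docs_per_query=2,
                          max_queries=50, rng=None):
    """Learn term -> document frequency from documents retrieved by queries.

    Assumption: each new one-term query is drawn uniformly at random from the
    vocabulary learned so far; sampling stops after `max_docs` distinct
    documents or `max_queries` queries.
    """
    rng = rng or random.Random(0)   # fixed seed keeps the sketch repeatable
    doc_freq = Counter()            # learned model: term -> sampled docs containing it
    seen = set()                    # ids of documents already sampled
    term = seed_term
    for _ in range(max_queries):
        for doc_id, text in db.search(term, docs_per_query):
            if doc_id in seen or len(seen) >= max_docs:
                continue
            seen.add(doc_id)
            doc_freq.update(set(text.lower().split()))
        if len(seen) >= max_docs or not doc_freq:
            break                   # enough documents, or the seed matched nothing
        term = rng.choice(sorted(doc_freq))
    return doc_freq, len(seen)


# Usage: the service never sees the database's index, only query results.
db = ToyDatabase([
    "distributed retrieval selects databases",
    "language models describe databases",
    "queries sample documents from databases",
    "inference networks rank collections",
])
model, n_docs = sample_language_model(db, seed_term="databases", max_docs=3)
print(n_docs, model.most_common(3))
```

The sketch only touches the database through `search`, which mirrors the abstract's point that no cooperation is needed. It also shows one way sampling can be biased: documents that share no terms with those already retrieved may never be reached, which is why the choice of queries matters.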
@inproceedings{DBLP:conf/sigmod/CallanCD99,
author = {James P. Callan and
Margaret Connell and
Aiqun Du},
editor = {Alex Delis and
Christos Faloutsos and
Shahram Ghandeharizadeh},
title = {Automatic Discovery of Language Models for Text Databases},
booktitle = {SIGMOD 1999, Proceedings ACM SIGMOD International Conference
on Management of Data, June 1-3, 1999, Philadelphia, Pennsylvania,
USA},
publisher = {ACM Press},
year = {1999},
isbn = {1-58113-084-8},
pages = {479-490},
crossref = {DBLP:conf/sigmod/99},
bibsource = {DBLP, http://dblp.uni-trier.de}
}
Copyright(C) 2000 ACM