Automatic Discovery of Language Models for Text Databases
Jamie Callan*, University of Massachusetts, callan@cs.umass.edu
Margie Connell, University of Massachusetts, connell@cs.umass.edu
Aiqun Du, University of Massachusetts, adu@cs.umass.edu
This paper demonstrates that cooperation from the databases is not required. Instead, the database selection service can construct its own language models by sampling database contents via the normal process of running queries and retrieving documents. Although true random sampling is not possible, it can be approximated with carefully selected queries. This sampling approach avoids the limitations that characterize the cooperative approach, and also enables additional capabilities. Experimental results demonstrate that accurate language models can be learned from a relatively small number of queries and documents.
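
The sampling idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `search` callable, the parameter names and defaults, and the rule of drawing the next query term at random from the vocabulary learned so far are all assumptions made for illustration, since the abstract says only that queries are "carefully selected". The toy in-memory database exists only to make the example runnable.

```python
import random
from collections import Counter


def sample_language_model(search, seed_term, max_docs=300, docs_per_query=4, rng=None):
    """Build an approximate language model by querying a database.

    `search(term, n)` is a hypothetical stand-in for the database's normal
    query interface; it returns up to n (doc_id, text) pairs.
    Returns (document-frequency Counter, number of distinct documents sampled).
    """
    rng = rng or random.Random(0)
    seen = set()      # ids of documents already sampled
    tried = set()     # query terms already issued
    df = Counter()    # document frequency of each term within the sample
    term = seed_term
    while len(seen) < max_docs and term is not None:
        tried.add(term)
        for doc_id, text in search(term, docs_per_query):
            if len(seen) >= max_docs:
                break
            if doc_id in seen:
                continue
            seen.add(doc_id)
            df.update(set(text.lower().split()))
        # Assumed selection rule: pick an unused term from the learned vocabulary.
        candidates = sorted(t for t in df if t not in tried)
        term = rng.choice(candidates) if candidates else None
    return df, len(seen)


# Toy database, purely illustrative.
docs = {
    1: "apple banana cherry",
    2: "banana date",
    3: "cherry elderberry fig",
    4: "grape",
}


def toy_search(term, n):
    """Return up to n documents containing the term, in id order."""
    hits = [(i, t) for i, t in sorted(docs.items()) if term in t.split()]
    return hits[:n]


model, n_sampled = sample_language_model(toy_search, "apple", max_docs=3)
```

In this toy run the sampler never sees document 4, because no term in the other documents retrieves it. This illustrates why a model learned by sampling is an approximation of the full database rather than an exact copy.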