@inproceedings{DBLP:conf/sigir/Egghe90,
  author    = {Leo Egghe},
  editor    = {Jean-Luc Vidick},
  title     = {A New Method for Information Retrieval, Based on the Theory of Relative Concentration},
  booktitle = {SIGIR'90, 13th International Conference on Research and Development in Information Retrieval, Brussels, Belgium, 5-7 September 1990, Proceedings},
  publisher = {ACM},
  year      = {1990},
  isbn      = {0-89791-408-2},
  pages     = {469-493},
  ee        = {db/conf/sigir/Egghe90.html},
  crossref  = {DBLP:conf/sigir/90},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
This paper introduces a new method for information retrieval of documents that are represented as vectors. The novelty of the algorithm is that the matching function between the query and the document is not a (generalized) p-norm (as used, e.g., by Salton and others) but a function that measures the relative dispersion of the terms between a document and a query. This function originates from an earlier paper by the author, which introduced a good measure of relative concentration, used in informetrics to measure the degree of specialization of a journal with respect to the entire subject.
This new information retrieval algorithm is shown to have many desirable properties (in the sense of the new Cater-Kraft wish list), including those of the original cosine-matching function of Salton. In addition, the property of the cosine-matching function that restricting the weights to 0 or 1 reduces it to Boolean IR is refined so as to take into consideration the broadness or specialization of a document and a query. Our new matching function satisfies these additional properties.
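As a rough illustration of the binary-weight case mentioned above, the following sketch computes the standard cosine match between two term-weight vectors and compares it with a set-based form that applies when every weight is 0 or 1. It does not implement the paper's relative-concentration matching function, which the abstract does not specify. The function names, vocabulary, and example terms are hypothetical and chosen only for illustration.

```python
import math


def cosine_match(doc, query):
    """Cosine of the angle between two equal-length term-weight vectors."""
    if len(doc) != len(query):
        raise ValueError("vectors must have the same length")
    dot = sum(d * q for d, q in zip(doc, query))
    norm_doc = math.sqrt(sum(d * d for d in doc))
    norm_query = math.sqrt(sum(q * q for q in query))
    if norm_doc == 0 or norm_query == 0:
        return 0.0  # convention chosen here for an empty document or query
    return dot / (norm_doc * norm_query)


def binary_cosine(doc_terms, query_terms):
    """Set form of the cosine when all weights are 0 or 1:
    |D & Q| / sqrt(|D| * |Q|)."""
    if not doc_terms or not query_terms:
        return 0.0
    shared = len(doc_terms & query_terms)
    return shared / math.sqrt(len(doc_terms) * len(query_terms))


def to_binary_vector(terms, vocabulary):
    """Weight 1 if the term is present, 0 otherwise."""
    return [1 if term in terms else 0 for term in vocabulary]


# Hypothetical vocabulary and term sets, for illustration only.
vocabulary = ["boolean", "concentration", "informetrics", "retrieval"]
doc_terms = {"concentration", "informetrics", "retrieval"}
query_terms = {"concentration", "retrieval"}

vector_score = cosine_match(
    to_binary_vector(doc_terms, vocabulary),
    to_binary_vector(query_terms, vocabulary),
)
set_score = binary_cosine(doc_terms, query_terms)
print(math.isclose(vector_score, set_score))
```

With binary weights the score depends only on which terms are present, that is, on the number of shared terms and the sizes of the two term sets, which is the setting in which the abstract relates cosine matching to Boolean IR.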
Copyright © 1990 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.