Similarity Searching in Text Databases with Multiple Field Types
K. Tzeras and E.G.M. Petrakis
Poster Session 1: WWW, Integration, Workflow
We address the problem of similarity searching in text databases that are organized into multiple fields differing in content type and character length. We focus on the "COmmunity Research and Development Information Service" (CORDIS) database of the European Union. We run exhaustive experiments on CORDIS and evaluate the effectiveness of many text retrieval methods in terms of precision, recall, and ranking quality. Our experiments suggest that digram matching is the most effective method for searching on proper names (e.g., surnames), while cosine similarity methods are the most effective for longer text fields (e.g., titles and abstracts). Finally, we propose an indexing method for proper name search.
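The two families of methods named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the digram scoring formula nor the term weighting scheme, so the Dice coefficient over character digrams and raw term-frequency vectors used here are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def digrams(name: str) -> Counter:
    """Return the multiset of adjacent character pairs in a lowercased name."""
    s = name.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def digram_similarity(a: str, b: str) -> float:
    """Dice coefficient over digram multisets (assumed scoring, not from the paper)."""
    da, db = digrams(a), digrams(b)
    total = sum(da.values()) + sum(db.values())
    if total == 0:
        return 0.0  # neither string is long enough to contain a digram
    shared = sum((da & db).values())  # multiset intersection
    return 2.0 * shared / total


def cosine_similarity(a: str, b: str) -> float:
    """Cosine between raw term-frequency vectors (assumed weighting, not from the paper)."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[w] * tb[w] for w in ta.keys() & tb.keys())
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_b = math.sqrt(sum(v * v for v in tb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Short proper-name field: tolerant of small spelling differences.
    print(digram_similarity("Petrakis", "Petraki"))
    # Longer text field: compares word distributions rather than characters.
    print(cosine_similarity("similarity searching in text databases",
                            "searching text databases by similarity"))
```

Character digrams compare short strings at the sub-word level, which is why they tolerate the small spelling variations common in surnames, whereas the word-level cosine measure needs enough terms per field to be informative.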
Copyright (C) 2000 ACM