Digital Symposium Collection 2000  

Similarity Searching in Text Databases with Multiple Field Types

K. Tzeras and E.G.M. Petrakis

Poster Session 1: WWW, Integration, Workflow

Abstract

We address the problem of similarity searching in text databases that are organized into multiple fields differing in content type and character length. We focus on the Community Research and Development Information Service (CORDIS) database of the European Union. We run exhaustive experiments on CORDIS and evaluate the effectiveness of many text retrieval methods in terms of precision, recall and ranking quality. Our experiments suggest that the digram method is the most effective for searching proper names (e.g., surnames), while cosine similarity methods are the most effective for longer text fields (e.g., titles and abstracts). Finally, we propose an indexing method for proper name search.
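The two families of measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give their exact formulas, so this sketch assumes digram similarity is the Dice coefficient over sets of adjacent character pairs, and cosine similarity is computed over raw term-frequency vectors of lowercased words (no stemming, stop-word removal or tf-idf weighting). The function names digrams, digram_similarity, term_vector and cosine_similarity are hypothetical.

```python
from collections import Counter
import math
import re


def digrams(name: str) -> set[str]:
    """Return the set of adjacent character pairs (digrams) in a lowercased name."""
    s = name.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def digram_similarity(a: str, b: str) -> float:
    """Dice coefficient over digram sets: 2|A & B| / (|A| + |B|).

    Assumption: Dice is one common digram measure; the paper may normalise differently.
    """
    da, db = digrams(a), digrams(b)
    if not da and not db:
        # Both strings are shorter than two characters: fall back to exact match.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(da & db) / (len(da) + len(db))


def term_vector(text: str) -> Counter:
    """Raw term-frequency vector over lowercased alphanumeric tokens (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the two term-frequency vectors."""
    va, vb = term_vector(a), term_vector(b)
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * vb[term] for term, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(digram_similarity("Petrakis", "Petrakes"))
    print(cosine_similarity("Petrakis", "Petrakes"))
    print(cosine_similarity("similarity searching in text databases",
                            "text databases similarity"))
```

For a one-letter variant such as Petrakis vs. Petrakes, the digram score is 10/14 (about 0.71), since five of the seven digrams in each name are shared, while word-level cosine similarity is 0 because the two words differ. This gives one intuition for why character-level matching suits short name fields and word-level vectors suit longer fields such as titles and abstracts.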

Copyright(C) 2000 ACM