@inproceedings{DBLP:conf/sigir/JacqueminR94,
  author    = {Christian Jacquemin and Jean Royaut{\'e}},
  editor    = {W. Bruce Croft and C. J. van Rijsbergen},
  title     = {Retrieving Terms and their Variants in a Lexicalised Unification-Based Framework},
  booktitle = {Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Dublin, Ireland, 3-6 July 1994 (Special Issue of the SIGIR Forum)},
  publisher = {ACM/Springer},
  year      = {1994},
  isbn      = {3-540-19889-X},
  pages     = {132-141},
  ee        = {db/conf/sigir/JacqueminR94.html},
  crossref  = {DBLP:conf/sigir/94},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
Term extraction is a major concern for information retrieval. Terms are not fixed forms and their variations prevent them from being identified by a match with their initial string or inflection. We show that a local syntactic approach to this problem can give good results for both the quality of identification and parsing time.
A specific tool, FASTR, is developed which handles an identification of basic terms and a parser of their variations as well. Terms are described by logic rules automatically generated from terms and their categorial structure. Variations are represented by metarules. The parser efficiently processes large size corpora with big dictionaries and mixes lexical identification with local syntactic analysis. We evaluate the accuracy of results produced by these metarules and improve these results with filtering metarules.
Copyright © 1994 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.