
Expert Network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval.

Yiming Yang: Expert Network: Effective and Efficient Learning from Human Decisions in Text Categorization and Retrieval. SIGIR 1994: 13-22
@inproceedings{DBLP:conf/sigir/Yang94,
  author    = {Yiming Yang},
  editor    = {W. Bruce Croft and
               C. J. van Rijsbergen},
  title     = {Expert Network: Effective and Efficient Learning from Human Decisions
               in Text Categorization and Retrieval},
  booktitle = {Proceedings of the 17th Annual International ACM-SIGIR Conference
               on Research and Development in Information Retrieval. Dublin,
               Ireland, 3-6 July 1994 (Special Issue of the SIGIR Forum)},
  publisher = {ACM/Springer},
  year      = {1994},
  isbn      = {3-540-19889-X},
  pages     = {13-22},
  ee        = {db/conf/sigir/Yang94.html},
  crossref  = {DBLP:conf/sigir/94},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Abstract

Expert Network (ExpNet) is our new approach to automatic categorization and retrieval of natural language texts. We use a training set of texts with expert-assigned categories to construct a network which approximately reflects the conditional probabilities of categories given a text. The input nodes of the network are words in the training texts, the nodes on the intermediate level are the training texts, and the output nodes are categories. The links between nodes are computed based on statistics of the word distribution and the category distribution over the training set. ExpNet is used for relevance ranking of candidate categories of an arbitrary text in the case of text categorization, and for relevance ranking of documents via categories in the case of text retrieval. We have evaluated ExpNet in categorization and retrieval on a document collection of the MEDLINE database, and observed a performance in recall and precision comparable to the Linear Least Squares Fit (LLSF) mapping method, and significantly better than other methods tested. Computationally, ExpNet has an O(N log N) time complexity, which is much more efficient than the cubic complexity of the LLSF method. The simplicity of the model, the high recall-precision rates, and the efficient computation together make ExpNet preferable as a practical solution for real-world applications.
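
The three-level structure described in the abstract (words, training texts, categories) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not give the link-weight formulas, so the sketch assumes normalized term frequencies for the word-to-text links (which yields cosine similarity) and an even split over each text's assigned categories for the text-to-category links. The function names `tokenize`, `build_network`, and `rank_categories`, and the toy training texts, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop list.
    return text.lower().split()


def build_network(training):
    """Build the word -> text and text -> category links.

    training: list of (text, [categories]) pairs with expert-assigned categories.
    """
    word_to_docs = defaultdict(dict)  # word -> {text index: link weight}
    doc_to_cats = []                  # text index -> {category: link weight}
    for i, (text, cats) in enumerate(training):
        counts = Counter(tokenize(text))
        norm = math.sqrt(sum(c * c for c in counts.values()))
        for word, c in counts.items():
            # Assumption: length-normalized term frequency as the link weight.
            word_to_docs[word][i] = c / norm
        # Assumption: each assigned category gets an equal share of the text.
        doc_to_cats.append({cat: 1.0 / len(cats) for cat in cats})
    return word_to_docs, doc_to_cats


def rank_categories(text, network):
    """Rank candidate categories for an arbitrary text, best first."""
    word_to_docs, doc_to_cats = network
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    # Input level -> intermediate level: similarity to each training text.
    doc_scores = defaultdict(float)
    for word, c in counts.items():
        for i, weight in word_to_docs.get(word, {}).items():
            doc_scores[i] += (c / norm) * weight
    # Intermediate level -> output level: propagate similarity to categories.
    cat_scores = defaultdict(float)
    for i, score in doc_scores.items():
        for cat, weight in doc_to_cats[i].items():
            cat_scores[cat] += score * weight
    return sorted(cat_scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up toy training set, for illustration only.
training = [
    ("heart attack chest pain", ["cardiology"]),
    ("skin rash itching", ["dermatology"]),
    ("chest pain skin rash", ["cardiology", "dermatology"]),
]
network = build_network(training)
ranked = rank_categories("chest pain", network)
print(ranked)
```

With this toy data, the query shares words with the first and third training texts, so both categories receive a score and "cardiology" ranks first. The sketch scores every matching training text; it does not attempt to reproduce the O(N log N) behavior reported in the abstract.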

Copyright © 1994 by the ACM, Inc., used by permission. Permission to make digital or hard copies is granted provided that copies are not made or distributed for profit or direct commercial advantage, and that copies show this notice on the first page or initial screen of a display along with the full citation.


ACM SIGMOD Anthology

CDROM Version: Load the CDROM "Volume 2 Issue 3, SIGIR, DASFAA'97, OODBS'86" and ...
DVD Version: Load "ACM SIGMOD Anthology DVD 1" and ...

Printed Edition

W. Bruce Croft, C. J. van Rijsbergen (Eds.): Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Dublin, Ireland, 3-6 July 1994 (Special Issue of the SIGIR Forum). ACM/Springer 1994, ISBN 3-540-19889-X

Online Edition: ACM Digital Library

ACM SIGMOD Anthology: Copyright © by ACM (info@acm.org), Corrections: anthology@acm.org
DBLP: Copyright © by Michael Ley (ley@uni-trier.de), last change: Sat May 16 23:38:45 2009