Digital Symposium Collection 2000  

Building Hierarchical Classifiers Using Class Proximity

Ke Wang, Senqiang Zhou, and Shiang Chen Liew


Abstract
In this paper, we address the need to automatically classify text documents into topic hierarchies like those in the ACM Digital Library and Yahoo!. The existing local approach constructs a classifier at each split of the topic hierarchy. However, the local approach does not address the closeness of classification in hierarchical classification, where the concern often is how close a classification is, rather than simply whether it is correct or wrong. Also, the local approach puts its bet on classification at higher levels, where the classification structure often diminishes. To address these issues, we propose the notion of class proximity and cast the hierarchical classification as a flat classification, with the class proximity modeling the closeness of classes. Our approach is global in that it constructs a single classifier based on the global information about all classes and class proximity. We leverage generalized association rules as the rule/feature space to address several other issues in hierarchical classification.
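
The idea that a misclassification can be "close" rather than simply wrong can be illustrated with a small sketch. The abstract does not give the paper's definition of class proximity, so the code below substitutes a plain tree distance over a toy hierarchy; the topic names, the PARENT map, and the functions tree_distance and closeness are all hypothetical and chosen only for illustration.

```python
# Illustrative only: tree distance stands in for the paper's class proximity,
# whose actual definition is not stated in the abstract.

# Toy topic hierarchy as a child -> parent map (hypothetical topics).
PARENT = {
    "Databases": "Computing",
    "AI": "Computing",
    "Data Mining": "Databases",
    "Query Processing": "Databases",
    "Machine Learning": "AI",
}


def ancestors(topic):
    """Return the path from topic up to the root, topic first."""
    path = [topic]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a, b):
    """Number of edges on the path between topics a and b."""
    path_a = ancestors(a)
    path_b = ancestors(b)
    depth_in_b = {t: i for i, t in enumerate(path_b)}
    # The first ancestor of a that also lies above b is their lowest common ancestor.
    for i, t in enumerate(path_a):
        if t in depth_in_b:
            return i + depth_in_b[t]
    raise ValueError("topics are not in the same hierarchy")


def closeness(predicted, actual):
    """Assumed closeness score in (0, 1]; 1.0 means an exact match."""
    return 1.0 / (1.0 + tree_distance(predicted, actual))
```

Under this assumed measure, predicting "Query Processing" for a "Data Mining" document scores higher than predicting "Machine Learning", because the first two topics share a parent. A flat classifier can then be evaluated, or trained, against such a score instead of a strict right-or-wrong criterion.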


References

[AAK96]
Hussein Almuallim, Yasuhiro Akiba, Shigeo Kaneda: An Efficient Algorithm for Finding Optimal Gain-Ratio Multiple-Split Tests on Hierarchical Attributes in Decision Tree Learning. AAAI/IAAI, Vol. 1 1996: 703-708
[AIS93]
Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216
[AS94]
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499
[CDAR97-a]
Soumen Chakrabarti, Byron Dom, Rakesh Agrawal, Prabhakar Raghavan: Using Taxonomy, Discriminants, and Signatures for Navigating in Text Databases. VLDB 1997: 446-455
[CDAR97-b]
Soumen Chakrabarti, Byron Dom, Rakesh Agrawal, Prabhakar Raghavan: Scalable Feature Selection, Classification and Signature Generation for Organizing Large Text Databases into Hierarchical Topic Taxonomies. VLDB Journal 7(3): 163-178 (1998)
[CDI98]
Soumen Chakrabarti, Byron Dom, Piotr Indyk: Enhanced Hypertext Categorization Using Hyperlinks. SIGMOD Conference 1998: 307-318
[HF95]
Jiawei Han, Yongjian Fu: Discovery of Multiple-Level Association Rules from Large Databases. VLDB 1995: 420-431
[KS97]
...
[LHM98]
Bing Liu, Wynne Hsu, Yiming Ma: Integrating Classification and Association Rule Mining. KDD 1998: 80-86
[WZL99]
...
[Q93]
J. Ross Quinlan: C4.5: Programs for Machine Learning. Morgan Kaufmann 1993, ISBN 1-55860-238-0
[SA95]
Ramakrishnan Srikant, Rakesh Agrawal: Mining Generalized Association Rules. VLDB 1995: 407-419
[SG92]
Padhraic Smyth, Rodney M. Goodman: An Information Theoretic Approach to Rule Induction from Databases. TKDE 4(4): 301-316 (1992)
[SHP95]
Hinrich Schütze, David A. Hull, Jan O. Pedersen: A Comparison of Classifiers and Document Representations for the Routing Problem. SIGIR 1995: 229-237
[SOM]
Self-organizing Map. http://www.cis.hut.fi/nnrc/nnrc-programs.html
[YP97]
...

BIBTEX

@inproceedings{DBLP:conf/vldb/WangZL99,
  author    = {Ke Wang and
               Senqiang Zhou and
               Shiang Chen Liew},
  editor    = {Malcolm P. Atkinson and
               Maria E. Orlowska and
               Patrick Valduriez and
               Stanley B. Zdonik and
               Michael L. Brodie},
  title     = {Building Hierarchical Classifiers Using Class Proximity},
  booktitle = {VLDB'99, Proceedings of 25th International Conference on Very
               Large Data Bases, September 7-10, 1999, Edinburgh, Scotland,
               UK},
  publisher = {Morgan Kaufmann},
  year      = {1999},
  isbn      = {1-55860-615-5},
  pages     = {363-374},
  crossref  = {DBLP:conf/vldb/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Copyright(C) 2000 ACM