@inproceedings{DBLP:conf/dasfaa/LamL99,
  author    = {Savio L. Y. Lam and Dik Lun Lee},
  editor    = {Arbee L. P. Chen and Frederick H. Lochovsky},
  title     = {Feature Reduction for Neural Network Based Text Categorization},
  booktitle = {Proceedings of the Sixth International Conference on Database Systems for Advanced Applications (DASFAA), April 19-21, Hsinchu, Taiwan},
  publisher = {IEEE Computer Society},
  year      = {1999},
  isbn      = {0-7695-0084-6},
  pages     = {195-202},
  ee        = {db/conf/dasfaa/LamL99.html},
  crossref  = {DBLP:conf/dasfaa/99},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
In a text categorization model using an artificial neural network as the text classifier, scalability is poor if the neural network is trained on the raw feature space, since textual data has a very high-dimensional feature space.
We proposed and compared four dimensionality reduction techniques to reduce the feature space into an input space of much lower dimension for the neural network classifier. To test the effectiveness of the proposed model, experiments were conducted using a subset of the Reuters-22173 test collection for text categorization.
The results showed that the proposed model was able to achieve high categorization effectiveness as measured by precision and recall. Among the four dimensionality reduction techniques proposed, Principal Component Analysis was found to be the most effective in reducing the dimensionality of the feature space.
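The PCA-based reduction the abstract names can be sketched as follows: center the term-document matrix and project each document vector onto the top-k principal directions, yielding a low-dimensional input for the neural network classifier. This is a minimal illustrative sketch with toy data, not the paper's implementation (which used a subset of Reuters-22173).

```python
import numpy as np

# Toy term-document matrix: rows = documents, columns = term frequencies.
# (Hypothetical data for illustration only.)
X = np.array([
    [2.0, 0.0, 1.0, 0.0, 3.0],
    [0.0, 1.0, 0.0, 2.0, 0.0],
    [1.0, 0.0, 2.0, 0.0, 2.0],
    [0.0, 2.0, 0.0, 1.0, 1.0],
])

def pca_reduce(X, k):
    """Project feature vectors onto the top-k principal components."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centered matrix; rows of Vt are the principal directions,
    # ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:k].T

Z = pca_reduce(X, k=2)
print(Z.shape)  # each document is now a 2-dimensional input vector
```

The reduced matrix `Z` would then feed the neural network in place of the raw term frequencies, shrinking the input layer from the vocabulary size to k units.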
Copyright © 1999 by The Institute of Electrical and Electronics Engineers, Inc. (IEEE). Abstract used with permission.