An Interactive Classification of Web Documents by Self-Organizing Maps and Search Engines
Kenji Hatano,
Ryouichi Sano,
Yiwei Duan, and
Katsumi Tanaka
Session 1A: World Wide Web
In this paper, we propose an effective classification view
mechanism for hypertext data such as web documents based on Kohonen's Self-Organizing
Map (SOM) and search engines. Web documents collected by search engines are
automatically classified by SOM and the obtained SOMs are incrementally
modified according to the interaction between users and SOMs. At present,
various search engines are available for retrieving web documents. When we use
them, however, we still receive a large number of answers, and examining each
web document takes considerable effort. Therefore, to complement search engines,
we need a function that classifies web documents according to the user's point
of view and purpose. Furthermore, conventional search engines cannot retrieve
pertinent web documents when a specific topic is described across more than one
web document. To solve these problems, we developed a content-based clustering
system for web documents. In
this system, web documents are automatically clustered using feature vectors
produced either from single web documents or from minimal subgraphs consisting
of multiple web documents, and overview maps of the clusters are dynamically
generated by SOM.
Furthermore, we propose a method by which an obtained SOM is modified through
user interaction such as feedback operations. It is important that the system
reflect the aim of classification and the purpose of retrieval. In our research,
we intend to solve these problems by providing a view mechanism in which the
Basic Units for retrieval and clustering of Web Documents (BUWDs) can be changed
by users, and in which relevance feedback operations enable the generation of an
overview map that reflects user needs.
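
As a rough illustration of the clustering step described above, the following is a minimal sketch of a standard Kohonen SOM that places document feature vectors on a small two-dimensional map. It is not the system described in the paper: the function names (train_som, best_matching_unit), the toy term-weight vectors, the map size, and the linearly decaying learning rate and neighbourhood radius are all assumptions made for illustration, and the feedback and BUWD mechanisms are not modelled.

```python
import math
import random


def best_matching_unit(weights, v):
    """Return the map cell whose weight vector is closest to v (squared Euclidean)."""
    return min(
        weights,
        key=lambda cell: sum((a - b) ** 2 for a, b in zip(weights[cell], v)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM on equal-length feature vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per map cell, initialised randomly in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # decaying learning rate (assumed schedule)
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
        # Present the documents in a fresh random order each epoch.
        for v in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, v)
            for cell, w in weights.items():
                # Cells near the winner on the grid are pulled more strongly toward v.
                d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])
    return weights


# Hypothetical term-weight vectors for four documents (values are made up).
docs = {
    "doc_a": [1.0, 0.9, 0.0, 0.0],
    "doc_b": [0.9, 1.0, 0.1, 0.0],
    "doc_c": [0.0, 0.1, 1.0, 0.9],
    "doc_d": [0.0, 0.0, 0.9, 1.0],
}

som = train_som(list(docs.values()), rows=2, cols=2)
# Overview map: which cell each document falls into.
overview = {name: best_matching_unit(som, v) for name, v in docs.items()}
```

Under these assumptions, documents with similar feature vectors tend to land in the same or neighbouring cells, which is what makes the map usable as an overview. A feedback step of the kind the abstract mentions could, for example, reweight the feature vectors and retrain, but how the paper does this is not stated here.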
Copyright(C) 2000 ACM