Programme

All talks will be held in lecture hall H6 at the western end of the main hall.
Each talk is allotted 20 minutes, plus 5 minutes for discussion.

Text and Document Analysis

Monday, 2007-09-03 - from 14:30 to 15:45

  • Self-Organizing Word Map for Context-Based Document Classification
    Presenting Author: Nikolaos Tsimboukakis
    Authors: Nikolaos Tsimboukakis, George Tambouratzis
    Abstract:
    In this paper, a novel SOM-based system for document organization is presented. The purpose of the system is to classify a document collection by document content. The system has a two-level hybrid connectionist architecture comprising (i) a word map created automatically by a SOM, which functions as a feature extraction module, and (ii) a supervised MLP-based classifier, which provides the final classification result. Experiments performed on Modern Greek text documents indicate that the proposed system effectively separates the different types of text.
  • Self-Organized Ordering of Terms and Documents in NSF Awards Data
    Presenting Author: Mikaela Klami
    Authors: Mikaela Klami, Timo Honkela
    Abstract:
    We present the results of an analysis of a text corpus of 129,000 abstracts of NSF-sponsored basic research projects between the years 1990 and 2003. The methods used in the analysis include term extraction based on a reference corpus and an entropy measure, and the Self-Organizing Map algorithm for the formation of a term map and a document map. Methodologically, the basic approach builds on earlier developments, such as word category maps and the WEBSOM method, but at the level of detail, we report several new aspects and quantitative comparisons between methodological variants. The data covers quite a large proportion of US-based scientific research during recent years. The analysis results indicate the basic patterns discernible in the data, both at the level of the awards and at the level of the terminology used in them.
  • Dimensionality Reduction of very large document collections by Semantic Mapping
    Presenting Author: Renato Fernandes Corrêa
    Authors: Renato Fernandes Corrêa, Teresa Bernarda Ludermir
    Abstract:
    This paper describes improvements to Semantic Mapping, a feature extraction method useful for reducing the dimensionality of vectors representing documents in large text collections. The method may be viewed as a specialization of the Random Mapping method proposed in the WEBSOM project. Semantic Mapping, Random Mapping and Principal Component Analysis (PCA) are applied to the categorization of document collections using Self-Organizing Maps (SOM). Semantic Mapping generated document representations as good as those of PCA and much better than those of Random Mapping.