Programme

All talks will be held in lecture hall H6 at the western end of the main hall.
Each talk is allotted 20 minutes plus 5 minutes for discussion.

Bioinformatics applications

Wednesday, 2007-09-05 - from 10:30 to 12:15

  • Characterization of Genetic Signal Sequences with Batch-Learning SOM
    Presenting Author: Takashi Abe
    Authors: Takashi Abe, Shun Ikeda, Shigehiko Kanaya, Kennosuke Wada, and Toshimichi Ikemura
    Abstract:
    Kohonen's SOM, an unsupervised clustering algorithm, is an effective tool for clustering and visualizing high-dimensional complex data on a single map. We previously modified the conventional SOM for genome informatics, making the learning process and resulting map independent of the order of data input on the basis of Batch Learning SOM (BL-SOM). We generated BL-SOMs for tetra- and pentanucleotide frequencies in 300,000 10-kb sequences from 13 eukaryotes for which almost complete genomic sequences are available. BL-SOM recognized species-specific characteristics of oligonucleotide frequencies in most 10-kb sequences, permitting species-specific classification of sequences without any information regarding the species. We next constructed BL-SOMs with tetra- and pentanucleotide frequencies in 37,086 full-length mouse cDNA sequences. With BL-SOM we also analyzed occurrence patterns of the oligonucleotides that are thought to be involved in transcriptional regulation on the human genome.
  • Advanced metric adaptation in Generalized LVQ for classification of mass spectrometry data
    Presenting Author: Petra Schneider
    Authors: Petra Schneider, Michael Biehl, Frank-Michael Schleif, Barbara Hammer
    Abstract:
    Metric adaptation constitutes a powerful approach to improve the performance of prototype-based classification schemes. We apply extensions of Generalized LVQ based on different adaptive distance measures in the domain of clinical proteomics. The Euclidean distance in GLVQ is extended by adaptive relevance vectors and matrices of global or local influence, where training follows a stochastic gradient descent on an appropriate error function. We compare the performance of the resulting learning algorithms for the classification of high-dimensional mass spectrometry data from cancer research. High prediction accuracies can be obtained by adapting full matrices of relevance factors in the distance measure in order to adjust the metric to the underlying data structure. The easy interpretability of the resulting models after training of relevance vectors makes it possible to identify discriminative features in the original spectra.
  • Genome feature exploration using hyperbolic Self-Organising Maps
    Presenting Author: Christian Martin
    Authors: Christian Martin, Naryttza N. Diaz, Jörg Ontrup and Tim W. Nattkemper
    Abstract:
    The advent of sequencing technologies makes it possible to reassess the relationship between species in the hierarchically organized tree of life. Self-Organizing Maps (SOM) in Euclidean and hyperbolic space are applied to genomic signatures of 350 different organisms of the two superkingdoms Bacteria and Archaea to link the sequence signature space to pre-defined taxonomic levels, i.e. the tree of life. In the hyperbolic space the SOMs are trained by either the standard algorithm (HSOM) or in a hierarchical manner (H²SOM). For evaluating the SOM performances, distances between organisms in the feature space, on the SOM grid and in the taxonomy tree are compared pair-wise. We show that the structure recovered using the different SOMs reflects the gold standard of current taxonomy. The distances between species are better preserved when using the HSOM or H²SOM, which makes the hyperbolic space better suited for embedding the high-dimensional genomic signatures.
  • SOM-based Peptide Prototyping for Mass Spectrometry Peak Intensity Prediction
    Presenting Author: Alexandra Scherbart
    Authors: Alexandra Scherbart, Wiebke Timm, Tim W. Nattkemper, Sebastian Böcker
    Abstract:
    In today's bioinformatics, mass spectrometry (MS) is the key technique for the identification of proteins. A prediction of spectrum peak intensities from precomputed molecular features would pave the way to a better understanding of spectrometry data and improved spectrum evaluation. We propose a neural network architecture of Local Linear Map (LLM)-type based on Self-Organizing Maps (SOMs) for peptide prototyping and learning locally tuned regression functions for peak intensity prediction in MALDI-TOF mass spectra. We obtain results comparable to those obtained by nu-Support Vector Regression and show how the SOM learning architecture provides a basis for peptide feature profiling and visualisation.
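
Note on the first abstract: the order independence attributed there to batch learning can be illustrated with a generic batch SOM update, in which every prototype is replaced by a neighbourhood-weighted mean over all inputs. The following is a minimal sketch under stated assumptions, not the authors' BL-SOM: the 1-D grid, the Gaussian neighbourhood, and the names `batch_som_epoch` and `sigma` are illustrative choices that do not come from the abstract.

```python
import math


def batch_som_epoch(data, prototypes, sigma=1.0):
    """Run one batch update of a small 1-D SOM and return the new prototypes.

    Hypothetical helper for illustration: each prototype becomes the
    neighbourhood-weighted mean of ALL inputs, so the result does not
    depend on the order in which the inputs are presented.
    """
    n_units = len(prototypes)
    dim = len(prototypes[0])
    numer = [[0.0] * dim for _ in range(n_units)]
    denom = [0.0] * n_units

    for x in data:
        # Best-matching unit: the prototype closest to x (squared Euclidean).
        bmu = min(
            range(n_units),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, prototypes[j])),
        )
        for j in range(n_units):
            # Gaussian neighbourhood weight on the 1-D grid (assumed form).
            h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
            denom[j] += h
            for d in range(dim):
                numer[j][d] += h * x[d]

    # Prototypes are only replaced after all inputs have been accumulated.
    return [[numer[j][d] / denom[j] for d in range(dim)] for j in range(n_units)]


data = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [0.9, 0.8]]
start = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]

forward = batch_som_epoch(data, start)
backward = batch_som_epoch(list(reversed(data)), start)
```

Here `forward` and `backward` agree up to floating-point rounding, because the best-matching units are computed against the fixed starting prototypes and the update is a sum over all inputs. An online (sequential) SOM, by contrast, moves prototypes after each input and so generally depends on presentation order.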