We have developed a system to search for complex patterns, at the nucleic-acid or protein level, in large biosequence databases. The system accepts hybrid patterns, which combine sequence similarity, structure similarity and arbitrary further characteristics such as thermodynamic constraints. Applications include research on highly specific protein/RNA interactions and the search for RNA tertiary-structure interactions. We are developing a declarative pattern description language, which is implemented by known and new pattern-matching algorithms and an optimizing backtracking procedure. To achieve high efficiency when screening large data sets, patterns are divided into parts and composed into queries. The significance of patterns is precalculated and is used to optimize the search process. Complex query results are processed by a visualization component. A library of biologically relevant patterns has been developed and is provided on the WWW together with the search tool. Evaluating the tool against the biosequence databases will in some cases require laboratory experiments to test functional hypotheses that were derived algorithmically.
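To make the idea of a hybrid pattern concrete, the following Python sketch searches an RNA sequence for a hairpin that combines a sequence part (a loop motif in IUPAC notation), a structure part (a base-paired stem) and a crude stability constraint (a minimum number of G-C pairs). It is only an illustration under stated assumptions, not the project's pattern language or algorithms: the function name `find_hairpins`, its parameters, the GNRA default motif and the G-C count used as a stand-in for a thermodynamic constraint are all hypothetical. The search order (loop motif first) is fixed by hand here, whereas the abstract describes using precalculated significance to choose it.

```python
import re

# Subset of IUPAC nucleotide codes (RNA alphabet) mapped to regex fragments.
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U",
         "R": "[AG]", "Y": "[CU]", "N": "[ACGU]"}

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(seq, loop_motif="GNRA", stem_len=4, min_gc_pairs=2):
    """Return (start, end) spans of hairpins in seq (end is exclusive).

    Hypothetical helper: a hit needs a loop matching loop_motif, flanked by
    stem_len base pairs of which at least min_gc_pairs are G-C pairs.
    """
    # A lookahead lets finditer report overlapping motif occurrences.
    loop_re = re.compile("(?=" + "".join(IUPAC[c] for c in loop_motif) + ")")
    hits = []
    # Step 1: locate the sequence component (assumed most selective) first.
    for m in loop_re.finditer(seq):
        loop_start = m.start()
        loop_end = loop_start + len(loop_motif)
        left, right = loop_start - stem_len, loop_end + stem_len
        if left < 0 or right > len(seq):
            continue  # not enough flanking sequence for the stem
        # Step 2: structure check, pairing the stem from the loop outwards.
        gc_pairs = 0
        paired = True
        for i in range(stem_len):
            a, b = seq[loop_start - 1 - i], seq[loop_end + i]
            if (a, b) not in PAIRS:
                paired = False
                break  # abandon this candidate as soon as a pair fails
            if {a, b} == {"G", "C"}:
                gc_pairs += 1
        # Step 3: crude stand-in for a thermodynamic stability constraint.
        if paired and gc_pairs >= min_gc_pairs:
            hits.append((left, right))
    return hits


if __name__ == "__main__":
    print(find_hairpins("AAGGCCGAAAGGCCUU"))
```

Checking the cheap, selective component before the more expensive structural test is one simple way to prune candidates early; a real system would presumably estimate selectivity from the data rather than hard-code it.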

Institut fuer Physikalische Biologie, Heinrich-Heine Universitaet Duesseldorf



5 years

Deutsche Forschungsgemeinschaft (DFG)

Fri Dec 19 10:54:56 CET 2014