As more and more genomes become available, multiple genome comparisons raise many new questions about the similarities and differences between the sequenced organisms. Immediate examples are: Which genes are present in organisms A and B but not in a third organism C? Which genes are specific to a certain species? To answer questions of this kind in a highly automated way, we developed the Genlight system.
Our system is based on a scalable, distributed client-server approach that bundles the compute power of ordinary workstations into a virtual cluster system for high-throughput analysis tasks. Our server application automatically splits a comparison job into several sub-jobs and distributes them to the available client nodes. All results calculated by the clients are stored in the object-relational database system PostgreSQL and can therefore easily be accessed through a sophisticated web interface or from external applications via a C API. The system can handle nucleotide as well as protein sequences. We have integrated almost all programs of the BLAST family (including PSI-BLAST) as well as FASTA and Smith-Waterman to perform sequence comparisons.
For further analysis of the generated data, Genlight supports advanced data mining capabilities such as flexible filtering over multiple genomes. Single filters can be combined or used as inclusion/exclusion criteria in a multiple genome comparison task. All results, including temporary ones, can be stored persistently in the user's own project workspace. We implemented easy-to-use project management to organize the project workspaces of different users. All calculated or inferred results can be used for additional analyses such as the integrated search for Pfam/TIGRFAM/SMART hits (hmmpfam), CDD hits (rpsblast) or SCOP domains. For screenings against the PRINTS and BLOCKS databases, Genlight uses the possumsearch program, which is based on a new, efficient algorithm for searching with position-specific scoring matrices.
Genlight generates new data according to the specification given by the user. This is in contrast to other systems in this field, which precompute data and present it in a static way. In Genlight, users can set up their own comparison jobs, design filters according to their particular needs, and generate new data in a completely interactive way. This approach gives the user maximum flexibility and transparency in the choice of parameters. The whole system, including the virtual cluster management, is accessible and controlled through a web interface with dynamic data representation and visualization and with references to external data sources.
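The two core ideas described above, splitting a comparison job into sub-jobs and applying inclusion/exclusion filters across genomes, can be illustrated with a minimal Python sketch. This is not Genlight code: the function names `split_into_subjobs` and `filter_genes`, the `hits` data layout, and the chunking strategy are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, List, Set


def split_into_subjobs(query_ids: List[str], chunk_size: int) -> List[List[str]]:
    """Split a list of query sequence IDs into chunks, one chunk per sub-job.

    Hypothetical helper: fixed-size chunking is an assumption, not
    necessarily how Genlight partitions a comparison job.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [query_ids[i:i + chunk_size]
            for i in range(0, len(query_ids), chunk_size)]


def filter_genes(hits: Dict[str, Set[str]],
                 include: Iterable[str],
                 exclude: Iterable[str]) -> Set[str]:
    """Return genes with a hit in every `include` genome and in no `exclude` genome.

    `hits` maps a genome name to the set of query gene IDs that have a
    significant hit in that genome (assumed layout, for illustration).
    """
    include = list(include)
    if not include:
        raise ValueError("at least one inclusion genome is required")
    # Inclusion criteria: the gene must have a hit in all listed genomes.
    selected = set.intersection(*(hits.get(g, set()) for g in include))
    # Exclusion criteria: drop genes that have a hit in any excluded genome.
    for genome in exclude:
        selected -= hits.get(genome, set())
    return selected


if __name__ == "__main__":
    # Toy data: gene IDs and genome names are made up.
    hits = {
        "A": {"g1", "g2", "g3"},
        "B": {"g2", "g3", "g4"},
        "C": {"g3"},
    }
    # "Which genes are present in organisms A and B but not in C?"
    print(sorted(filter_genes(hits, include=["A", "B"], exclude=["C"])))
    print(split_into_subjobs(["q1", "q2", "q3", "q4", "q5"], chunk_size=2))
```

In this sketch each chunk returned by `split_into_subjobs` would correspond to one sub-job handed to a client node, and `filter_genes` expresses the A-and-B-but-not-C question as a set intersection followed by set differences.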
All these features make Genlight a powerful general-purpose system for a wide range of data-rich tasks in (differential) comparative sequence analysis. We currently use Genlight to find potential new drug targets in parasites by target-specific screening. Another application aims to improve the clustering criteria for Xenopus laevis ESTs and to annotate assembled EST sequences.


Michael Beckstette, Alexander Sczyrba, Robert Giegerich, Jens Mailaender, Richard Marhoefer, Paul M. Selzer

June 2001

4 years

1.6.2001-1.6.2003 Akzo Nobel/Intervet Innovation GmbH

