Algorithms

The research area Algorithms deals with the development and adaptation of artificial intelligence methods for questions in the life sciences. New methods are developed, and existing methods of data mining and artificial intelligence are adapted, to address relevant biomedical issues.

Involved Research Groups

Prof. Dr. Lars Kaderali

Institut für Bioinformatik
Universitätsmedizin Greifswald
AG Kaderali

Prof. Dr. Stefan Simm

Institut für Bioinformatik
Universitätsmedizin Greifswald
AG Simm

Prof. Dr. Mario Stanke

Institut für Mathematik und Informatik
Universität Greifswald
AG Stanke

Prof. Dr. Joscha Diehl

Juniorprofessur für Stochastische Analysis
Universität Greifswald 

Dr. Markus Becker

Arbeitsgruppe Dr. Markus Becker
Leibniz-Institut für Plasmaforschung und Technologie e.V.

Projects

  • Pre-processing of data: Biomedical data often have special characteristics that make the application of AI methods difficult: high variance, very heterogeneous distributions, missing values, a mixture of discrete and continuous data, a small number of cases and a large number of features. This project deals with methods for pre-processing such data as a basis for further AI applications.
  • Methods for regularisation: Compared to other AI use cases, applications in medicine often suffer from small case numbers combined with complex, high-dimensional feature spaces. The regularisation methods developed in this project, together with the integration of biological and medical prior knowledge, are therefore important prerequisites for many AI algorithms.
  • Hierarchical Bayesian networks: Explainability of AI results is an important prerequisite for their successful translation into application, especially for medical questions. Bayesian networks and graphical models are therefore well suited to learning and visualising, from data, the relationships between risk factors and manifest disease. This project deals with the development of new methods for learning hierarchical Bayesian networks, with diverse applications in medicine and epidemiology.
  • Deep Learning: Deep neural networks (deep learning methods) have become increasingly important in recent years, with diverse applications being developed, especially in image processing. In this project, we are further developing deep learning methods for the analysis of complex clinical and molecular data sets, with a particular focus on integrating heterogeneous data and prior biological knowledge. Applications include the identification of key molecular mechanisms in ageing processes and age-associated diseases.
  • Whole-genome regression: Inherited phenotypic traits such as body size are largely determined by the genome. In complex cases, where many genotypic features such as single nucleotide polymorphisms (SNPs) are correlated, current linear regression methods often predict such traits from the genome much less accurately than the heritability of the trait would suggest. In this project, customised non-linear machine learning methods for regression are being developed and tested on a cohort of about half a million subjects.
  • Discriminatively trained models of molecular evolution: The model class of choice for modelling evolving sequences is the continuous-time Markov chain, parameterised by a rate matrix. New methods are being developed to estimate these rate matrices in such a way that the model can distinguish different classes as well as possible based on the differences between sequences. Applications include identifying new protein-coding sequences in genomes and finding spatially adjacent residues of proteins that are available only as sequences (contact maps).
  • Genome annotation: Genome annotation is the task of finding genes and, in the case of eukaryotes, their exon-intron structure in genomes. To reduce the number of errors, we are further developing the AUGUSTUS programme using machine learning methods, among other things. One project is the integration of heterogeneous evidence sources and several competing annotations into a single annotation that is as accurate as possible, taking alternative splicing into account.
  • Multiple sequence alignment: In this project, the well-known problem of identifying the corresponding positions in a set of sequences that are assumed to be related is taken up anew and framed as a machine learning problem. So-called graph neural networks are used for this purpose. In these networks, tensors are associated with the nodes and edges of a graph and are recalculated iteratively by locally sending “messages”, for example to classify edges, nodes or the whole graph.
  • Reinforcement learning for an epidemiological intervention strategy: The random spread of a disease through contagion can be countered by measures such as testing or reducing the probability of infection. The measures from a given catalogue of interventions may be associated with “costs” and may be limited to subpopulations, which may be defined by location, for example. Optimising the timing of these measures can be a difficult problem. In this new project, an intervention strategy is to be optimised using reinforcement learning, based on a simulation model for the spread of a pathogen.
  • Image analysis: In various individual projects, convolutional neural networks are trained on two- and three-dimensional image data (photos, microscopy, fMRI). The trained networks can classify and segment the images or identify objects in them.
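
The message-passing idea mentioned under multiple sequence alignment can be illustrated with a small sketch. This is a toy example under stated assumptions, not the model used in the project: the function names message_passing and edge_scores, the mean aggregation, the fixed mixing weight mix and the dot-product edge score are all hypothetical choices made for illustration. A real graph neural network would use learned, trainable update functions instead.

```python
# Toy message-passing sketch (illustrative assumptions only, no learned weights).
# Each node holds a feature vector. In every round, a node averages the
# vectors of its neighbours and mixes the result with its own state.

def message_passing(features, edges, rounds=2, mix=0.5):
    """features: dict node -> list of floats; edges: undirected (u, v) pairs."""
    neighbours = {node: [] for node in features}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    state = {node: list(vec) for node, vec in features.items()}
    for _ in range(rounds):
        new_state = {}
        for node, vec in state.items():
            incoming = [state[n] for n in neighbours[node]]
            if incoming:
                # Aggregate: element-wise mean of the neighbours' messages.
                agg = [sum(vals) / len(incoming) for vals in zip(*incoming)]
            else:
                agg = vec  # an isolated node keeps its own state
            # Update: convex combination of own state and aggregated message.
            new_state[node] = [(1 - mix) * a + mix * b for a, b in zip(vec, agg)]
        state = new_state
    return state


def edge_scores(state, edges):
    """Score each edge by the dot product of its endpoint states."""
    return {(u, v): sum(a * b for a, b in zip(state[u], state[v]))
            for u, v in edges}


if __name__ == "__main__":
    feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    links = [("a", "b"), ("b", "c")]
    final = message_passing(feats, links, rounds=1)
    print(final)
    print(edge_scores(final, links))
```

In an alignment setting, the nodes could stand for sequence positions and the edge scores for candidate correspondences between them; that mapping is likewise an assumption of this sketch.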

Publications

Publications will soon be displayed here

Contact

Prof. Dr. Lars Kaderali
e-mail: lars.kaderali@med.uni-greifswald.de