AG DANK Autumn Meeting 2016


Friday, 18 November 2016:

 13:30-14:00 Registration
 14:00-14:10 Hans-Joachim Mucha (WIAS Berlin)
Welcome and Opening
Session 1: Chair Christian Hennig
 14:10-14:50 Karsten Tabelow (WIAS Berlin)
Functional Magnetic Resonance Imaging: Processing Large Dataset
Functional Magnetic Resonance Imaging (fMRI) is a versatile imaging technique for observing the human brain at work. Besides its scientific value for understanding the principles of our mind, the analysis of fMRI data is now standard in clinical applications as well. In this talk we will give a (surely incomplete) survey of fMRI analysis and data processing.
 14:50-15:30 Willi Sauerbrei (Universität Freiburg)
Regression model-building with continuous variables -- The multivariable fractional polynomial (MFP) approach
 15:30-16:00 Coffee break
Session 2: Chair Ulrich Müller-Funk
 16:00-16:30 Thorsten Dickhaus (Universität Bremen)
COMBI - Combining high-dimensional classification and multiple hypotheses testing for the analysis of big data in genetics
The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position on the human genome individually for statistical significance of its association with the (binary) phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the considered set of genetic markers, in our case single nucleotide polymorphisms (SNPs), into account in a mathematically well-controlled manner. Our novel two-stage algorithm, COMBI, first learns a high-dimensional classification model by training a support vector machine to determine a subset of candidate SNPs. Then, in a second stage of data analysis, a multiple hypotheses test is carried out for these candidate SNPs, employing a resampling-based p-value threshold correction that guarantees type I error control for the entire two-stage method. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008-2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e., non-replicated) and more true (i.e., replicated) discoveries when its results are validated on later GWAS. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. These findings are confirmed by computer simulations utilizing semi-synthetic data. The presentation is based on Mieth et al. (2016).
 16:30-17:00 Marcus Weber & Konstantin Fackeldey (ZIB Berlin)
GenPCCA: Markov State Models for Non-Equilibrium Steady States
For equilibrium systems, Markov State Models (MSM) are a powerful tool for grouping states according to a metastability criterion. Given a reversible Markov chain, an MSM exploits the eigenvalue structure of the underlying Markov chain to detect metastable sets, such that the dynamics of a system in a high-dimensional space can be described by the entries of a small transition probability matrix. For Non-Equilibrium Steady States, the underlying Markov chain is no longer reversible, and thus the eigenvalue structure, the backbone of MSM, can no longer be employed. To overcome this, we present a novel MSM method (GenPCCA) that is capable of finding a low-dimensional description of even non-reversible Markov processes by using a Schur decomposition instead of eigenvectors. We show the performance of GenPCCA on networks for gene expression.
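The distinction between reversible and non-reversible chains drawn in this abstract can be illustrated with a small sketch. It is not GenPCCA and performs no Schur decomposition; it only checks the detailed balance condition pi_i P_ij = pi_j P_ji that defines reversibility. The two transition matrices and the function names are hypothetical examples chosen for illustration.

```python
def stationary_distribution(P, iterations=1000):
    """Approximate the stationary distribution by power iteration (pi <- pi P)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def is_reversible(P, tol=1e-9):
    """Check detailed balance: pi_i * P_ij == pi_j * P_ji for all pairs (i, j)."""
    pi = stationary_distribution(P)
    n = len(P)
    return all(
        abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
        for i in range(n)
        for j in range(n)
    )


# Hypothetical birth-death chain: reversible, stationary distribution (0.25, 0.5, 0.25).
P_birth_death = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]

# Hypothetical chain with a preferred cycle 0 -> 1 -> 2 -> 0: a steady state
# exists (uniform), but there is a net probability flux around the cycle.
P_cycle = [
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]

print(is_reversible(P_birth_death))
print(is_reversible(P_cycle))
```

The second chain has a well-defined steady state but violates detailed balance, which is the situation in which, according to the abstract, the eigenvalue-based approach can no longer be employed.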
 17:00-17:40 Andreas Geyer-Schulz (Universität Karlsruhe)
Recommender Systems for (Scientific) Libraries
 17:40-17:50 Competition dataset: Presentation of Results (Chairs: C. Hennig / A. Mucha)
 17:50-18:00 Presentation of Gero Szepannek (Universität Stralsund)
 18:00-18:10 Presentation of Gunter Ritter (Universität Passau)
 18:10-18:20 Presentation of Marcus Weber (ZIB Berlin)
 18:20-18:30 Presentation of Reinhard Schachtner (Infineon AG)
 19:30 Workshop dinner at the restaurant Mutter Hoppe. Participants pay for their own meals.

Saturday, 19 November 2016:

Session 3: Chair Hans-Joachim Mucha
 09:00-09:30 Christian Hennig (University College London)
Preprocessing, Distances and Football
Using a dataset of football player performance data, we discuss by way of example the decisions a user has to make when defining dissimilarities and clustering, namely representation, transformation, standardisation and variable weighting.
 09:30-10:00 Gero Szepannek (Fachhochschule Stralsund)
On the Practical Relevance of Modern Machine Learning Algorithms for Credit Scoring Applications
Although many new algorithms, such as support vector machines, boosting, random forests or neural networks, have been proposed in the recent past, logistic regression still represents the gold standard in industrial practice. Benchmarking studies show the general superiority of flexible learning techniques that are able to detect complex structures. These studies typically restrict themselves to the evaluation of one or several performance measures (such as the misclassification rate) and ignore further aspects of practical feasibility. This paper critically investigates the pros and cons of modern machine learning techniques with respect to business requirements and their practical relevance. An exemplary case study on credit scoring using random forests is carried out.
 10:00-10:30 Gunter Ritter (Universität Passau)
Probabilistic Variable Selection in Cluster Analysis
 10:30-11:00 Coffee break
Session 4: Chair Berthold Lausen
 11:00-11:30 Andreas Geyer-Schulz (Universität Karlsruhe)
On the Analysis of Irrational Behavior in Car Configuration Data
 11:30-12:00 Adalbert Wilhelm (Jacobs University Bremen)
Predicting military conflicts by data-driven techniques
 12:00-12:30 Bernd Fischer (DKFZ Heidelberg)
Inferring Directional Genetic Interactions from Combinatorial, Multi-parametric, Replicated Data
Genes display epistatic (genetic) interactions, whereby the presence of one genetic variant can mask, alleviate or amplify the phenotypic effect of other variants. We have developed computational and statistical methods for the analysis of large-scale, image-based genetic interaction screens. The presented screen (Fischer et al., eLife, 2015) covers all pairwise gene knockdowns of 1367 × 72 genes. This work presents the preprocessing, normalization, and quality control for large-scale, image-based genetic interaction screens. Furthermore, we developed a new feature selection method that aims to separate the biologically relevant information from technical noise. This feature selection method uses information from replicated experiments. In the downstream analysis, estimating directional genetic interactions is a challenge. Such a directional relationship is present, for instance, if one gene product positively or negatively regulates the activity of the other, if its function temporally precedes that of the other, or if its function is a necessary requirement for the action of the other. We developed a new method to detect directional interactions that requires multi-parametric data. The approach has been shown to recover known biological processes as well as a novel protein complex that reverses the effect of a signaling pathway in cancer.
 12:30-13:00 Hans-Joachim Mucha (WIAS Berlin) & Tatjana Mirjam Gluhak (Universität Mainz)
Finding Groups in Compositional Data
The talk is concerned with finding groups (clusters) in compositional data, that is, nonnegative data with row sums (or column sums, respectively) equal to a constant, usually 1 in the case of proportions or 100 in the case of percentages. Without loss of generality, the cluster analysis of observations (row points) of compositional data is considered here, where the row profiles contain parts of some whole. Special distance functions between the profiles are proposed. Finally, applications to archaeometry are presented.
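As a rough illustration of the terms used in this abstract, the sketch below rescales rows to a constant sum (here 100, as for percentages) and computes one well-known distance between the resulting profiles. The abstract does not say which distance functions the talk proposes; the Aitchison distance used here is a standard choice for compositional data and is an assumption, as are the function names and the sample values.

```python
import math


def closure(row, total=1.0):
    """Rescale a nonnegative row so that its parts sum to `total`."""
    s = sum(row)
    if s <= 0:
        raise ValueError("row must have a positive sum")
    return [total * x / s for x in row]


def clr(row):
    """Centred log-ratio transform; requires strictly positive parts."""
    logs = [math.log(x) for x in row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(a, b):
    """Euclidean distance between the clr-transformed profiles (assumed choice)."""
    ca, cb = clr(a), clr(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


# Hypothetical measurements of three parts per sample (illustrative values only).
samples = [
    [52.0, 30.0, 18.0],
    [26.0, 15.0, 9.0],   # same proportions as the first row, half the total
    [10.0, 60.0, 30.0],
]

# Row profiles expressed as percentages (row sums equal to 100).
profiles = [closure(row, total=100.0) for row in samples]

d01 = aitchison_distance(profiles[0], profiles[1])
d02 = aitchison_distance(profiles[0], profiles[2])
print(d01, d02)
```

Because the first two samples have identical proportions, their profiles coincide and their distance is zero up to rounding, whereas the third sample lies at a positive distance. A matrix of such pairwise distances could then be passed to any distance-based clustering method.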