Research Group "Stochastic Algorithms and Nonparametric Statistics"

Research Seminar "Mathematical Statistics" Summer Semester 2019

  • Place: Weierstrass-Institute for Applied Analysis and Stochastics, Erhard-Schmidt-Hörsaal, Mohrenstraße 39, 10117 Berlin
  • Time: Wednesdays, 10.00 a.m. - 12.30 p.m.
17.04.19 N.N.

24.04.19 Chen Huang (Universität St. Gallen)
LASSO in time and space
We consider estimation and inference in a system of high-dimensional regression equations allowing for temporal and cross-sectional dependency in covariates and error processes, covering rather general forms of weak dependence. A sequence of regressions with many regressors using LASSO (Least Absolute Shrinkage and Selection Operator) is applied for variable selection, and an overall penalty level is carefully chosen by a block multiplier bootstrap procedure to account for multiplicity of the equations and dependencies in the data. Correspondingly, oracle properties with a jointly selected tuning parameter are derived. We further provide high-quality de-biased simultaneous inference on the many target parameters of the system. We provide bootstrap consistency results of the test procedure, which are based on a general Bahadur representation for the Z-estimators with dependent data. Simulations demonstrate good performance of the proposed inference procedure. Finally, we apply the method to quantify spillover effects of textual sentiment indices in a financial market and to test the connectedness among sectors.
01.05.19 Public Holiday

08.05.19 N.N.

15.05.19 Frank Konietschke (Charité Berlin)
Small data: A big data problem
Small sample sizes occur frequently, especially in preclinical research. Most statistical methods are only valid if sample sizes are large, and thus investigating the methods' behavior when samples are small is tempting. It turns out that a few statistical methods are as reliable as throwing a coin when samples are small. In this talk, we propose a few improvements using resampling and permutation methods. In particular, we will answer the question "When and how do permutation methods work?". Real data sets illustrate the application of the proposed mean-based and purely nonparametric rank-based methods.
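As a generic illustration of the permutation idea mentioned in the abstract (not the speaker's proposed procedure), a two-sample permutation test for a difference in means can be sketched as follows. The function name `permutation_test`, the default number of permutations and the fixed seed are assumptions made for this example only.

```python
import random
from statistics import mean


def permutation_test(x, y, n_perm=10000, seed=0):
    """Monte Carlo two-sided permutation p-value for a difference in means.

    Hypothetical helper for illustration; assumes the two samples are
    exchangeable under the null hypothesis.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    count = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled observations to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if diff >= observed:
            count += 1
    # The +1 correction keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)
```

With only five observations per group there are just 252 distinct splits, so the attainable p-values are coarse; this is one reason small samples need extra care.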
22.05.19 Prof. Gillian Heller (Macquarie University, Sydney)
Part I: Parameter orthogonality and the GAMLSS family of distributions
Part I: Parameter orthogonality is a desirable property of statistical distributions having more than one parameter. When parameters are orthogonal, their maximum likelihood estimates are asymptotically independent. Within the exponential family, the mean and dispersion parameter are orthogonal; however, in general this is not the case. This work is motivated by a trial in Parkinson's disease patients in which one of the outcomes is the number of falls. Inspection of the data reveals that the Poisson-inverse Gaussian (PiG) distribution is appropriate, and that the experimental treatment reduces not only the mean, but also the variability, substantially. Conventional analysis assumes a treatment effect on the mean, either adjusted or unadjusted for covariates, and a constant dispersion parameter. We find that we reach quite different conclusions on the treatment effect on the mean, depending on whether or not a model is specified for the dispersion parameter.
Part II: Ordinal regression models for continuous scales
Part II: Visual Analogue Scales (VAS) are used for measuring quantities which are intangible and difficult to measure on conventional scales, such as pain, anxiety and quality of life. These are generally used for self-rating. Subjects are given a linear scale of 100 mm and asked to put a mark where they perceive themselves. The scale has verbal anchor descriptors at each extreme, such as (in the pain context) "no pain" and "worst pain imaginable". The VAS reading is taken as the measurement from the left endpoint to the subject's mark, and is usually normalized to lie in the interval [0,1]. Statistical analysis of the VAS is controversial. While it is a bounded, continuous variable, several authors have argued that it is ordinal, rather than ratio in nature, and should be treated as such. The issue is that, for example, a 1-cm difference in VAS scores at the lower end of the scale does not necessarily represent the same difference in the intangible outcome as a 1-cm difference at the upper end; and a doubling of VAS score may not translate to a doubling of e.g. the pain or anxiety. This problem is overcome by treating VAS measurements as ordinal rather than ratio data. We therefore refer to scales of this type as continuous ordinal. We have developed a regression framework for continuous ordinal responses. We express the likelihood in terms of a function connecting the scale with an underlying continuous latent variable and approximate this function non-parametrically. Then a general semi-parametric regression framework for continuous scales is developed. The model is shown to be a conditional transformation model, and is generalizable to a much wider range of uses than the context in which it was developed. We illustrate our method on a quality of life data set.
29.05.19 Prof. Denis Belomestny (Universität Duisburg-Essen)
Density deconvolution under general assumptions on measurement error distribution
The subject of my talk is the density deconvolution problem under general assumptions on the measurement error distribution. Typically deconvolution estimators are constructed using Fourier transform techniques, and the main assumption is that the measurement error characteristic function does not vanish on the real line. This assumption is rather strong and does not hold in many cases of interest. Here we develop a new technique to deal with this problem which allows us to recover the standard convergence rates without additional assumptions on the measurement error distribution. Joint work with A. Goldenshluger (University of Haifa).
05.06.19 Egor Klochkov (HU Berlin)
Influencer dynamics in opinion networks
12.06.19 Prof. S. Müller (University of Sydney, Australia)
c2pLasso: The categorical-continuous pliable Lasso to identify brain regions affecting motor impairment in Huntington disease

In many clinical studies, prediction models are essential for forecasting and monitoring the progression of a disease. Developing prediction models is a challenge when dealing with high-dimensional data, since we do not know which variables are related to the response variable of interest, and this relationship may also depend on other continuous or categorical modifying variables. We formalize this problem as varying-coefficient model selection and propose a novel variable selection method, c2pLasso, that accounts for both continuous and categorical modifying variables.

Our contributions are three-fold:

  1. The c2pLasso method is shown to screen out irrelevant variables better than the existing method that ignores the group structure of categorical modifying variables, and to lead to a prediction model with higher accuracy and easier interpretation.
  2. Our method adequately considers the pre-specified group structure among modifying variables in addition to unstructured modifying variables.
  3. The c2pLasso is empirically shown to perform better than existing methods such as the Lasso and pLasso even when there is no categorical modifying variable or any pre-specified group structure among modifying variables.

Using simulation studies, we show that our method selects fewer irrelevant variables than existing methods while choosing relevant variables correctly. This provides us with a prediction model with higher specificity, lower false discovery rate and lower mean squared error. The proposed methodology is motivated by and illustrated using data from a Huntington disease study; the result identifies brain regions associated with motor impairment, accounting for relationships that differ by disease severity. To the best of our knowledge, our study is the first to identify the interaction effect between disease severity and the volume of brain regions in a varying-coefficient model framework.

This is joint work with Rakheon Kim and Tanya Garcia, both at Texas A&M, Department of Statistics.
19.06.19 Karel Hron (Palacký University, Olomouc)
Weighting of densities in Bayes spaces with application to simplicial functional principal component analysis
Probability density functions (PDFs) can be understood as functional data carrying relative information. As such, standard methods of functional data analysis (FDA) are not appropriate for their statistical processing. They are typically designed in the L2 space (with Lebesgue reference measure) and thus cannot be directly applied to densities, as the metric of L2 does not honor their geometric properties. This has recently motivated the construction of the so-called Bayes Hilbert spaces, which result from the generalization of the Aitchison geometry for compositional data to the infinite-dimensional setting. More precisely, if we focus on PDFs restricted to a bounded support (which is mostly used in practical applications), they can be represented with respect to the Lebesgue reference measure using the Bayes space of positive real functions with square-integrable logarithm. The reference measure can be easily changed through the well-known chain rule and interpreted as a weighting technique in Bayes spaces. Moreover, it affects the geometry of the Bayes spaces and results in so-called weighted Bayes spaces. The aim of this contribution is to show the effects of changing the reference measure from the Lebesgue measure to a general probability measure, focusing on its practical implications for the Simplicial Functional Principal Component Analysis (SFPCA). A centered log-ratio transformation is proposed to map a weighted Bayes space into an unweighted L2 space (i.e. with Lebesgue reference measure), thus making it possible to apply standard statistical methods such as SFPCA to PDFs.
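For orientation, the standard (unweighted) centered log-ratio transformation with Lebesgue reference measure on a bounded interval subtracts the average of log f from log f. The sketch below discretizes it on an equally spaced grid with the trapezoidal rule. It is not the weighted transformation proposed in the talk; the function name `clr_density` and the grid-based discretization are assumptions made for this example.

```python
import math


def clr_density(values, grid):
    """Discretized unweighted clr of a positive density.

    Hypothetical helper for illustration. `values` are density values
    f(t_i) > 0 on an equally spaced `grid` t_0 < ... < t_n.
    Returns log f(t_i) minus the average of log f over the interval.
    """
    logs = [math.log(v) for v in values]
    h = grid[1] - grid[0]
    # Trapezoidal approximation of the integral of log f over the interval.
    integral = h * (sum(logs) - 0.5 * (logs[0] + logs[-1]))
    length = grid[-1] - grid[0]
    return [value - integral / length for value in logs]
```

Two properties are easy to check numerically: the transformed function integrates to (approximately) zero, and rescaling the density by a positive constant leaves the result unchanged, which reflects the relative character of the information carried by densities.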
26.06.19 Alexandra Suvorikova (Universität Potsdam)
Seminar room no. 406/405 at MO 39!!!
On CLT in Bures-Wasserstein space and beyond
In the first part of the talk we present some concentration and convergence properties of Bures-Wasserstein (BW) barycenters of Hermitian finite-dimensional matrices, and explain how they can be used to investigate the geometry of DNA molecules modelled as a union of rigid bodies. In the second part we show how the framework of classical resampling techniques can be extended to the case of the BW space, and introduce some geometrical intuition behind the construction of non-asymptotic confidence sets for BW barycenters.
03.07.19 Dr. Claudia Strauch (Universität Mannheim)
Concentration and nonparametric learning of diffusion processes
We start by discussing uniform concentration inequalities for continuous-time analogues of empirical processes and related stochastic integrals of ergodic diffusion processes. Our approach substantially relies on combining the device of martingale approximation and moment bounds which are obtained by the generic chaining method. As a concrete statistical application, we consider the question of estimating the drift function for a large class of ergodic diffusion processes. The unknown drift is supposed to belong to a nonparametric class of smooth functions of unknown order. We suggest a fully data-driven procedure which allows for rate-optimal drift estimation (with respect to sup-norm risk) and, at the same time, yields an asymptotically efficient estimator of the invariant density of the diffusion. In the last part of the talk, we sketch applications of our results to problems from stochastic control theory. One of the fundamental assumptions in stochastic control of continuous-time processes is that the dynamics of the underlying (diffusion) process is known. This is, however, usually not fulfilled in practice. We study a toy model for harvesting and natural resource management, mathematically described as an impulse control problem. In variants of this model, we suggest ways to both learn the dynamics of the underlying process and control well at the same time. In particular, the combination of results from stochastic control and our previous analysis of the sup-norm risk allows us to derive mathematical results for reinforcement learning.
10.07.19 Arnak Dalalyan (ENSAE Paris)



last reviewed: May 29, 2019 by Christine Schneider