Research Group "Stochastic Algorithms and Nonparametric Statistics"
Research Seminar "Mathematical Statistics" Summer Semester 2023
|19.04.2023||Dr. Chen Huang (University of Aarhus)|
|Arellano-Bond LASSO estimator for long panel dynamic linear models (hybrid talk)
We consider estimation and inference for dynamic linear panel models with both a large cross-sectional dimension $N$ and a long time dimension $T$. The Arellano-Bond (AB) estimator (Arellano and Bond, 1991) is a popular method for dynamic panels; however, it suffers from large bias when $T$ is large. We present a simple least absolute shrinkage and selection operator (LASSO) estimator for the AB method: we first use LASSO to fit the optimal instrumental variables (IV) based on a large group of lags, and then implement the linear IV estimator in the second stage. To further reduce the bias of the estimator, we propose a sample-splitting procedure. Simulations show that the sample-splitting AB-LASSO provides more accurate estimation and inference than the AB method. Finally, we apply the approach to evaluate the long-run effect of democracy on economic growth.
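The two-stage idea can be sketched in a few lines of NumPy. Everything below (the plain coordinate-descent LASSO, the tuning parameter, the function names) is an illustrative assumption rather than the speakers' implementation, and the sample-splitting step is omitted:

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by coordinate-descent LASSO."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent LASSO for the first stage."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            beta[j] = soft_threshold(X[:, j] @ r, n * lam) / col_sq[j]
    return beta

def ab_lasso_iv(dy, dx, Z, lam=0.1):
    """Stage 1: LASSO of the endogenous regressor dx on the instrument
    candidates Z (many lags); stage 2: linear IV for dy on dx using the
    fitted values as the single instrument."""
    x_hat = Z @ lasso_cd(Z, dx, lam)      # fitted optimal instrument
    return (x_hat @ dy) / (x_hat @ dx)    # just-identified IV estimate
```

In the AB setting, `dy` and `dx` would be differenced outcomes and their lags, and `Z` the matrix of available lagged levels; here the sketch only shows the two-stage mechanics on generic arrays.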
|26.04.2023||Prof. Dr. Vladimir Spokoiny (WIAS Berlin)|
|Estimation and inference for error-in-operator models (hybrid talk)
Many statistical models can be considered as particular cases of the Error-in-Operator (EiO) model. Examples include high-dimensional random-design regression, instrumental-variables and error-in-variables regression, and stochastic diffusion and interacting particle systems models, among many others. The talk focuses on finite-sample theoretical guarantees for the EiO model, using the general theory of semiparametric estimation combined with recent progress on stochastically linear models.
|03.05.2023||Prof. Dr. Gilles Blanchard (Université Paris Saclay, France)|
|Stein effect for estimating many vector means: A "blessing of dimensionality" phenomenon (hybrid talk)
Consider the problem of jointly estimating the means of a large number of distributions in R^d using separate, independent data sets from each of them, sometimes called the "multi-task averaging" problem. We propose an estimator that improves on the naive empirical mean of each data set by exploiting possible similarities between the means, without any such information being known in advance. The idea is to shrink each mean towards a convex combination of its neighbors (determined by testing). We prove that this approach yields an improvement in mean squared error that can be quite significant when the (effective) dimensionality of the data is large and when the unknown means exhibit structure such as clustering or concentration on a low-dimensional set (which is entirely unknown in advance). This is directly linked to the fact that, in high dimension, the separation distance for testing is smaller than the estimation error, and it generalizes the well-known James-Stein phenomenon. An application of this approach is the estimation of multiple kernel mean embeddings, which play an important role in many modern applications. (This is based on joint work with Hannah Marienwald and Jean-Baptiste Fermanian.)
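In finite dimension, the shrink-towards-tested-neighbors idea can be caricatured in a few lines; the crude distance test, the level `tau`, and the fixed shrinkage weight below are simplifying assumptions for the sketch, not the estimator analyzed in the talk:

```python
import numpy as np

def neighbor_shrinkage(means, n_samples, sigma2, tau=2.0):
    """Shrink each empirical mean towards the average of the tasks that a
    crude distance test declares 'neighbors' (illustrative only)."""
    B, d = means.shape
    out = means.copy()
    for b in range(B):
        # tasks whose empirical means are close relative to the noise level
        dist2 = ((means - means[b]) ** 2).sum(axis=1)
        thresh = tau * sigma2 * d * (1.0 / n_samples[b] + 1.0 / n_samples)
        nb = dist2 <= thresh                 # always includes task b itself
        if nb.sum() > 1:
            target = means[nb].mean(axis=0)
            out[b] = 0.5 * means[b] + 0.5 * target  # fixed shrinkage weight
    return out
```

When the true means cluster, the shrunken estimates have a smaller total squared error than the naive empirical means, illustrating the "blessing of dimensionality" in the title.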
|10.05.2023||N.N.||4th Floor, Room 405/406||
|17.05.2023||Prof. Dr. Enno Mammen (Universität Heidelberg)|
|Random planted forest: A directly interpretable tree ensemble (hybrid talk)
We introduce a novel interpretable, tree-based algorithm for prediction in a regression setting, in which each tree of a classical random forest is replaced by a family of planted trees that grow simultaneously. The motivation for our algorithm is to estimate the unknown regression function from a functional ANOVA decomposition perspective, where each tree corresponds to a function within that decomposition. Accordingly, planted trees are limited in the number of interaction terms they contain. The maximal order of approximation in the ANOVA decomposition can be specified or left unlimited. If a first-order approximation is chosen, the result is an additive model. In the other extreme, if the order of approximation is not limited, the resulting model places no restrictions on the form of the regression function. The talk reports on joint work with Munir Hiabu and Joseph T. Meyer.
|24.05.2023||Dr. Jonathan Niles-Weed (New York University)|
|Optimal transport map estimation in general function spaces (online talk)
We consider the problem of estimating the optimal transport map between a (fixed) source distribution P and an unknown target distribution Q, based on samples from Q. The estimation of such optimal transport maps has become increasingly relevant in modern statistical applications, such as generative modeling. At present, estimation rates are only known in a few settings (e.g. when P and Q have densities bounded above and below and when the transport map lies in a Hölder class), which are often not reflected in practice. We present a unified methodology for obtaining rates of estimation of optimal transport maps in general function spaces. Our assumptions are significantly weaker than those appearing in the literature: we require only that the source measure P satisfies a Poincaré inequality and that the optimal map be the gradient of a smooth convex function that lies in a space whose metric entropy can be controlled. As a special case, we recover known estimation rates for bounded densities and Hölder transport maps, but also obtain nearly sharp results in many settings not covered by prior work. For example, we provide the first statistical rates of estimation when P is the normal distribution and the transport map is given by an infinite-width shallow neural network. Joint work with Divol and Pooladian.
|31.05.2023||Dr. Randolf Altmeyer (University of Cambridge)|
|Statistics for stochastic PDEs: Optimal rates and a nonparametric LAN expansion (hybrid talk)
We consider the problem of estimating the coefficients of second-order linear stochastic partial differential equations (SPDEs) from spatially discrete or locally averaged observations on a finite time horizon. For a nonparametric diffusivity we obtain the local asymptotic normality (LAN) property, opening the door to efficient estimation. The proof reveals interesting aspects of partially observed space-time random fields; it relies on stochastic filtering techniques, on the reproducing kernel Hilbert space of the observation process, and on the solution of certain Wiener-Hopf integral equations. For lower-order terms we obtain rate-optimal estimators, exhibiting different rates of convergence depending on the differential order. We further establish a general moment-based technique for the joint estimation of diffusivity and volatility parameters, covering both discrete and local measurements.
|21.06.2023||Eddie Aamari (Université Paris Diderot)|
|Minimax manifold estimation: Boundary, noise and computational constraints
In this talk, we will present the optimal rates for the Hausdorff estimation of $d$-dimensional manifolds $M$ in high ambient dimension. We will focus on the influence of a possible boundary $\partial M$ on the one hand, and on noise and computational constraints on the other. The geometric class of target manifolds that we consider unifies and extends the most prevalent $C^2$-type models: manifolds without boundary and full-dimensional domains. We will present a Voronoi-based procedure that allows one to identify points close to $\partial M$. The noise and computational constraints are addressed jointly through the statistical query (SQ) framework, in which the usual access to samples from a distribution is replaced by access to adversarially perturbed expected values of functions interactively chosen by the learner. This framework provides a natural measure of the computational complexity of estimation, enforces robustness through adversariality, and is closely related to differential privacy. After presenting the framework in detail, we will describe a purely geometric algorithm called Manifold Propagation, which reduces the problem to three local geometric routines: projection, tangent space estimation, and point detection. We will then provide constructions of these geometric routines in the SQ framework.
|28.06.2023||First Talk: Alessia Caponera (Università di Milano-Bicocca)|
|10:00 a.m.||Functional estimation of anisotropic covariance and autocovariance operators on the sphere (hybrid talk)
In this talk we present nonparametric estimators for the second-order central moments of possibly anisotropic spherical random fields, within a functional data analysis context. We consider a measurement framework where each random field among an identically distributed collection of spherical random fields is sampled at a few random directions, possibly subject to measurement error. The collection of random fields could be i.i.d. or serially dependent. Though similar setups have already been explored for random functions defined on the unit interval, the nonparametric estimators proposed in the literature often rely on local polynomials, which do not readily extend to the (product) spherical setting. We therefore formulate our estimation procedure as a variational problem involving a generalized Tikhonov regularization term. Using the machinery of reproducing kernel Hilbert spaces, we establish representer theorems that fully characterize the form of our estimators. We determine their uniform rates of convergence as the number of random fields diverges, both for the dense (increasing number of spatial samples) and sparse (bounded number of spatial samples) regimes. A preliminary exploration of climate data will also be discussed. This is based on joint works with Victor M. Panaretos, Julien Fageot, Matthieu Simeoni and Almond Stöcker.
|28.06.2023||Second Talk: Kartik Waghmare (EPFL Lausanne)|
|11:15 a.m.||Functional graphical lasso (hybrid talk)
We consider the problem of recovering the conditional independence relationships between $p$ jointly distributed Hilbertian random elements given $n$ realizations thereof. We operate in the sparse high-dimensional regime, where $n \ll p$ and no element is related to more than $d \ll p$ other elements. In this context, we propose an infinite-dimensional generalization of the graphical lasso. We prove model selection consistency under natural assumptions and extend many classical results to infinite dimensions; in particular, we require neither finite truncation nor additional structural restrictions. The plug-in nature of our method makes it applicable to any observational regime, whether sparse or dense, and indifferent to serial dependence. Importantly, our method can be understood as arising naturally from a coherent maximum likelihood philosophy.
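As a finite-dimensional, scalar-variable stand-in for intuition: nodewise lasso regressions (Meinshausen-Bühlmann neighborhood selection, a classical relative of the graphical lasso rather than the infinite-dimensional estimator of the talk) recover the edge set by lasso-regressing each variable on all the others; the tuning parameter below is an assumption:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def nodewise_edges(X, lam=0.2, n_iter=200):
    """Estimate a conditional-independence graph by lasso-regressing each
    variable on the remaining ones and symmetrizing with the AND rule."""
    n, p = X.shape
    edges = np.zeros((p, p), dtype=bool)
    for j in range(p):
        y, Z = X[:, j], np.delete(X, j, axis=1)
        beta = np.zeros(p - 1)
        col_sq = (Z ** 2).sum(axis=0)
        for _ in range(n_iter):                    # coordinate descent
            for k in range(p - 1):
                r = y - Z @ beta + Z[:, k] * beta[k]
                beta[k] = soft_threshold(Z[:, k] @ r, n * lam) / col_sq[k]
        others = np.delete(np.arange(p), j)
        edges[j, others[beta != 0]] = True         # selected neighbors of j
    return edges & edges.T                         # AND rule
```

On data generated from a Markov chain X_0 -> X_1 -> ... the procedure recovers the chain edges while leaving non-adjacent pairs unconnected.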
|4th Floor, Room 405/406||
|12.07.2023||Mathias Vetter (Universität Kiel)|
|On goodness-of-fit testing for point processes (hybrid talk)
Typical models for point processes, such as Hawkes processes or inhomogeneous Poisson processes, are often of a parametric form in which the intensity function or an additional self-exciting component is known up to an unspecified parameter. Since the seminal paper by Ogata (1978), a lot of research has been devoted to the estimation of these unknown parameters, but even in these rather standard models a consistent goodness-of-fit test has been missing. This talk aims to fill this gap. We will show how to formally set up a bootstrap procedure that allows for goodness-of-fit testing, and we will discuss how to prove consistency of the test in the (already quite involved) case of an inhomogeneous Poisson process.
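For the simplest null, a homogeneous Poisson process on [0, T], a toy version of such a parametric bootstrap might look as follows; the Kolmogorov-type statistic and every name here are illustrative assumptions, not the construction of the talk:

```python
import numpy as np

def sup_stat(points, rate, T):
    """Kolmogorov-type distance between the counting process N(t) and the
    fitted compensator rate * t, evaluated at the event times."""
    pts = np.sort(np.asarray(points))
    if pts.size == 0:
        return rate * T
    N = np.arange(1, pts.size + 1)
    return np.max(np.abs(N - rate * pts))

def bootstrap_gof(points, T, n_boot=200, seed=0):
    """Parametric bootstrap p-value for the homogeneous Poisson null:
    fit the rate by maximum likelihood, then recompute the statistic on
    samples simulated from the fitted model (re-fitting on each draw)."""
    rng = np.random.default_rng(seed)
    theta = len(points) / T                 # MLE of the constant intensity
    obs = sup_stat(points, theta, T)
    boot = []
    for _ in range(n_boot):
        sim = rng.uniform(0.0, T, rng.poisson(theta * T))
        boot.append(sup_stat(sim, len(sim) / T, T))
    return np.mean(np.array(boot) >= obs)
```

A sample whose events all fall in a small subinterval of [0, T] yields a p-value near zero, while a genuinely homogeneous sample typically does not.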
|19.07.2023||Giada Adelfio (Università degli Studi di Palermo)|
|Spatio-temporal point processes: Local second-order statistics for estimation and diagnostics (hybrid talk)
In previous papers we defined an approach for assessing the goodness-of-fit of spatio-temporal models, weighting by the inverse of the conditional intensity function. The method is based on local weighted second-order statistics and provides a quite general approach to individual diagnostics. Starting from these results, we developed an estimation procedure based on the local second-order characteristics of the weighted process, which also provides local estimates. The method does not rely on any particular model assumption on the data and can therefore be applied whatever the generating model of the (possibly complex) process is. In addition, we account for the primary role of second-order statistics in improving inference for complex intensity functions, when the likelihood may be intractable and complex interactions may not be negligible for a proper description.
last reviewed: July 13, 2023 by Christine Schneider