Research Group "Stochastic Algorithms and Nonparametric Statistics"

Seminar "Modern Methods in Applied Stochastics and Nonparametric Statistics" Winter Semester 2023/24

17.10.2023 N.N.

24.10.2023

31.10.2023 Dr. Pavel Dvurechensky (WIAS Berlin)
Cancelled due to unforeseen circumstances! Decentralized local stochastic extra-gradient for variational inequalities (hybrid talk)
We consider distributed stochastic variational inequalities (VIs) on unbounded domains with the problem data that is heterogeneous (non-IID) and distributed across many devices. We make a very general assumption on the computational network that, in particular, covers the settings of fully decentralized calculations with time-varying networks and centralized topologies commonly used in Federated Learning. Moreover, multiple local updates on the workers can be made for reducing the communication frequency between the workers. We extend the stochastic extragradient method to this very general setting and theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone (when a Minty solution exists) settings. The provided rates explicitly exhibit the dependence on network characteristics (e.g., mixing time), iteration counter, data heterogeneity, variance, number of devices, and other standard parameters. As a special case, our method and analysis apply to distributed stochastic saddle-point problems (SPP), e.g., to the training of Deep Generative Adversarial Networks (GANs) for which decentralized training has been reported to be extremely challenging. In experiments for the decentralized training of GANs we demonstrate the effectiveness of our proposed approach. Joint work with A. Beznosikov, A. Koloskova, V. Samokhin, S. Stich, A. Gasnikov.
07.11.2023 Peter K. Friz (TU & WIAS Berlin)
On (local) rough stochastic volatility and weak rates (hybrid talk)
Stocks, futures, commodities and exchange rates share very similar statistical features The self-organising mechanism through which all markets appear to settle close to such a critical point remains a mystery. From a pure engineering point of view, rough volatility has gotten us "a step closer to the perfect volatility model" (Bouchaud 2020), and this talk is devoted to a better mathematical understanding thereof. In part 1 we consider simple rough stochastic volatility specifications itentify the optimal weak rate of convergence, out of reach of classical PDE based methods, which stabilises in the "very" rough regime when $H$ is close to zero, popular in econophysics. Part 2 is devoted to general stochastic volatility (including but not restricted to a rough volatility scale) in presence of an additional leverage functions. Such local stochastic volatility models come with popular data-driven particle methods, yet well-posedness remains largely open. As remedy we propose to consider instead an Euler discretisation, trivially well-posed, and show that it achieves an eps-perfect calibration for eps Euler steps in time. This is joint work with W. Salkeld (Brown University, Part 1) and T. Wagenhofer (TU Berlin, Part 1 & 2).
14.11.2023 Leonard Schmitz (Universität Greifswald)
Two-parameter sums signatures and corresponding quasisymmetric functions (hybrid talk)
Quasisymmetric functions have recently been used in time series analysis as polynomial features that are invariant under, so-called, dynamic time warping. We extend this notion to data indexed by two parameters and thus provide warping invariants for images. We show that two-parameter quasisymmetric functions are complete in a certain sense, and provide a two-parameter quasi-shuffle identity. A compatible coproduct is based on diagonal concatenation of the input data, leading to a (weak) form of Chen's identity.
21.11.2023 N.N.

28.11.2023 Christophe Roux (ZIB)
Bounding geometric penalties in first-order Riemannian optimization (hybrid talk)
Riemannian optimization is the study of optimizing functions defined over Riemannian manifolds. This paradigm is used in cases that naturally present Riemannian constraints, which allows for exploiting the geometric structure of our problem, and for transforming it into an unconstrained one by working in the manifold. Convergence rates of Riemannian optimization algorithms often rely on geometric quantities depending on the sectional curvature and the distance between iterates and an optimizer. Numerous previous works bound the latter only by assumption, leading to an incomplete analysis and unquantified rates. In this talk, I will discuss how to remove this limitation for multiple algorithms and as a result quantify their rates of convergence. In particular, I will present results on deterministic first-order minimization algorithms such as Gradient Descent and the (Inexact) Proximal Point Algorithm as well as min-max algorithms such as the Extra Gradient Method.
05.12.2023 Dr. Pavel Dvurechensky (WIAS Berlin)
Decentralized local stochastic extra-gradient for variational inequalities (hybrid talk)
We consider distributed stochastic variational inequalities (VIs) on unbounded domains with the problem data that is heterogeneous (non-IID) and distributed across many devices. We make a very general assumption on the computational network that, in particular, covers the settings of fully decentralized calculations with time-varying networks and centralized topologies commonly used in Federated Learning. Moreover, multiple local updates on the workers can be made for reducing the communication frequency between the workers. We extend the stochastic extragradient method to this very general setting and theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone (when a Minty solution exists) settings. The provided rates explicitly exhibit the dependence on network characteristics (e.g., mixing time), iteration counter, data heterogeneity, variance, number of devices, and other standard parameters. As a special case, our method and analysis apply to distributed stochastic saddle-point problems (SPP), e.g., to the training of Deep Generative Adversarial Networks (GANs) for which decentralized training has been reported to be extremely challenging. In experiments for the decentralized training of GANs we demonstrate the effectiveness of our proposed approach. Joint work with A. Beznosikov, A. Koloskova, V. Samokhin, S. Stich, A. Gasnikov.
12.12.2023 Prof. Dr. Leonid Berlyand (Penn State University)
Enhancing accuracy in deep learning using random matrix theory (hybrid talk)
We discuss applications of random matrix theory (RMT) to the training of deep neural networks (DNNs). Our numerical results show that if the drastic reduction of DNN parameters is guided by the Marchenko-Pastur spectral approach, then the accuracy of the pruned DNN and CNN with much fewer parameters does not decrease. For fully connected DNNs the accuracy actually increases. Moreover, the deviation from the mean is smaller for the non-pruned DNNs. Furthermore, we show how this RMT can be used to remove 10-12% of parameters from state-of-the-art DNNs such as Resnet and VIT without reducing and even increasing accuracy. Finally, we provide a theoretical understanding of these results by proving the Pruning Theorem that establishes a rigorous relation between the accuracy of the pruned and non-pruned DNNs. This is joint work with E. Sandier, Y. Shmalo and L. Zhang
19.12.2023 N. N.

09.01.2024 Hannes Kern (TU Berlin)
Application of stochastic multiparameter sewing to regularity of local times associated to Gaussian sheets
In this talk, we shortly present the stochastic multiparameter sewing lemma, before focussing on the main topic: Its application to local times. We derive novel regularity estimates for local times of locally non-deterministic Gaussian sheets. These estimates are strong enough to show regularization by noise for SDEs in the plain. In this context, we make the observation that regularization effects profiting from the parameters of the underlying fields in an additive fashion usually appear due to boundary terms of the driving stochastic field.
16.01.2024 Benjamin Walker (University of Oxford)
Log neural controlled differential equations: The lie brackets make a difference
The vector field of a controlled differential equation (CDE) describes the relationship between a control path and the evolution of a solution path. Neural CDEs (NCDEs) parameterise a CDE's vector field using a neural network, treat time series data as observations from a control path, and use the solution path as their hidden state. Given a partition of time, the Log-ODE method approximates the solution of a CDE by replacing it with an autonomous ODE on each interval. Building on the approach of neural rough differential equations (NRDEs), we introduce Log-NCDEs, which are the first application of the Log-ODE method to NCDEs. On a range of multivariate time series classification benchmarks, Log-NCDEs are shown to achieve an average test set accuracy higher than both NCDEs and NRDEs, and inline with current state-of-the-art deep learning methods.
23.01.2024 Luca Pelizzari (WIAS Berlin)
Rough PDEs for local stochastic volatility models (hybrid talk)
In this talk, we introduce a novel pricing methodology in general, possibly non-Markovian local stochastic volatility (LSV) models. We observe that by conditioning the LSV dynamics on the Brownian motion that drives the volatility, one obtains a time-inhomogeneous Markov process. Using tools from rough path theory, we describe how to precisely understand the conditional LSV dynamics and reveal their Markovian nature. The latter allows us to connect the conditional dynamics to so-called rough partial differential equations (RPDEs), through a Feynman-Kac type of formula. In terms of European pricing, conditional on realizations of one Brownian motion, we can compute conditional option prices by solving the corresponding linear RPDEs, and then average over all samples to find unconditional prices. Our approach depends only minimally on the specification of the volatility, making it applicable for a wide range of classical and rough LSV models, and it establishes a PDE pricing method for non-Markovian models. This talk is based on joint work with Peter Bank, Christian Bayer and Peter Friz.
30.01.2024 N.N.
Different room and building: 3.13 at HVP 11a !
06.02.2024 Dr. Aleksei Kroshnin (WIAS Berlin)
Robust k-means in metric spaces
In this talk, we consider robust algorithms for the k-means clustering (quantization) problem where a quantizer is constructed based on an i.i.d. sample. While the well-known asymptotic result by Pollard shows that the existence of two moments is sufficient for strong consistency of an empirically optimal quantizer in a Euclidean space, non-asymptotic bounds are usually obtained under the assumption of bounded support. We discuss a robust k-means in metric spaces based on trimming (similar to the one proposed by Cuesta-Albertos et al. in 1997), and prove novel non-asymptotic bounds on the excess distortion depending on the optimal distortion rather then the second moment of the distribution. In particular, this bound decreases the gap to the lower bound obtained by Bartlett et al. (1998).
13.02.2024 Dr. Hans Kersting (Yahoo! Research)
Exploration and implicit bias due to an optimizer's stochasticity
The data sets used to train modern machine-learning models are often huge, e.g. millions of images. This makes it too expensive to compute the true gradient over all data sets. In each gradient descent (GD) step, a stochastic gradient is thus computed over a subset ("mini-batch?) of data. The resulting stochastic gradient descent (SGD) algorithm, and its variants, is the main workhorse of modern machine learning. Until recently, most machine-learning researchers would have preferred to use GD, if they could, and considered SGD only as a fast approximation to GD. But new research suggests that the stochasticity in SGD is part of the reason why SGD works so well. In this talk, we investigate multiple theories on the advantages of the noise in SGD, including better generalization in flatter minima (implicit bias) and faster escapes from difficult parts of the landscapes (such as saddle points and local minima). We highlight how correlating noise can help optimization and zoom in on the question which noise structure would be optimal for SGD.
20.02.2024
Different room and building: 3.13 at HVP 11a !
27.02.2024

05.03.2024

12.03.2024 Dr. Vaios Laschos (WIAS Berlin)
Optimal transport meets rough paths
19.03.2024 Dr. Alexandra Suvorikova (WIAS Berlin)
Bernstein-type inequalities for unbounded martingales and empirical processes


last reviewed: February 12, 2024 by Christine Schneider