• L. Starke, K. Tabelow, Th. Niendorf, A. Pohlmann, eds., Denoising for improved parametric MRI of the kidney: Protocol for nonlocal means filtering, 2216 of Methods in Molecular Biology, Humana, New York, NY., 2021, pp. 565--576, (Chapter Published), DOI 10.1007/978-1-0716-0978-1_34 .

  • P. Friz, M. Hairer, A Course on Rough Paths: With an Introduction to Regularity Structures, Universitext, Springer International Publishing, Basel, 2020, 346 pages, (Monograph Published), DOI 10.1007/978-3-030-41556-3 .

Articles in Refereed Journals

  • N. Tupitsa, P. Dvurechensky, A. Gasnikov, S. Guminov, Alternating minimization methods for strongly convex optimization, Journal of Inverse and Ill-Posed Problems, (2021), published online on 16.04.2021, DOI 10.1515/jiip-2020-0074 .
    We consider alternating minimization procedures for convex optimization problems with variable divided in many block, each block being amenable for minimization with respect to its variable with freezed other variables blocks. In the case of two blocks, we prove a linear convergence rate for alternating minimization procedure under Polyak-Łojasiewicz condition, which can be seen as a relaxation of the strong convexity assumption. Under strong convexity assumption in many-blocks setting we provide an accelerated alternating minimization procedure with linear rate depending on the square root of the condition number as opposed to condition number for the non-accelerated method.

  • P. Friz, P. Gassiat, P. Pigato , Precise asymptotics: Robust stochastic volatility models, The Annals of Applied Probability, 31 (2021), pp. 896--940, DOI 10.1214/20-AAP1608 .
    We present a new methodology to analyze large classes of (classical and rough) stochastic volatility models, with special regard to short-time and small noise formulae for option prices. Our main tool is the theory of regularity structures, which we use in the form of Bayer et al. (Math. Finance30 (2020) 782--832) In essence, we implement a Laplace method on the space of models (in the sense of Hairer), which generalizes classical works of Azencott and Ben Arous on path space and then Aida, Inahama--Kawabi on rough path space. When applied to rough volatility models, for example, in the setting of Bayer, Friz and Gatheral (Quant. Finance16 (2016) 887--904) and Forde--Zhang (SIAM J. Financial Math.8 (2017) 114--145), one obtains precise asymptotics for European options which refine known large deviation asymptotics.

  • A. Maltsi, T. Niermann, T. Streckenbach, K. Tabelow, Th. Koprucki, Numerical simulation of TEM images for In(Ga)As/GaAs quantum dots with various shapes, Optical and Quantum Electronics, 52 (2020), pp. 257/1--257/11, DOI 10.1007/s11082-020-02356-y .
    We present a mathematical model and a tool chain for the numerical simulation of TEM images of semiconductor quantum dots (QDs). This includes elasticity theory to obtain the strain profile coupled with the Darwin-Howie-Whelan equations, describing the propagation of the electron wave through the sample. We perform a simulation study on indium gallium arsenide QDs with different shapes and compare the resulting TEM images to experimental ones. This tool chain can be applied to generate a database of simulated TEM images, which is a key element of a novel concept for model-based geometry reconstruction of semiconductor QDs, involving machine learning techniques.

  • O. Butkovsky, A. Kulik, M. Scheutzow, Generalized couplings and ergodic rates for SPDEs and other Markov models, The Annals of Applied Probability, 30 (2020), pp. 1--39, DOI 10.1214/19-AAP1485 .
    We establish verifiable general sufficient conditions for exponential or subexponential ergodicity of Markov processes that may lack the strong Feller property. We apply the obtained results to show exponential ergodicity of a variety of nonlinear stochastic partial differential equations with additive forcing, including 2D stochastic Navier-Stokes equations. Our main tool is a new version of the generalized coupling method.

  • O. Butkovsky, M. Scheutzow, Couplings via comparison principle and exponential ergodicity of SPDEs in the hypoelliptic setting, Communications in Mathematical Physics, 379 (2020), pp. 1001--1034, DOI 10.1007/s00220-020-03834-w .
    We develop a general framework for studying ergodicity of order-preserving Markov semigroups. We establish natural and in a certain sense optimal conditions for existence and uniqueness of the invariant measure and exponential convergence of transition probabilities of an order-preserving Markov process. As an application, we show exponential ergodicity and exponentially fast synchronization-by-noise of the stochastic reaction?diffusion equation in the hypoelliptic setting. This refines and complements corresponding results of Hairer and Mattingly (Electron J Probab 16:658?738, 2011).

  • N. Tapia, L. Zambotti, The geometry of the space of branched rough paths, Proceedings of the London Mathematical Society. Third Series, 121 (2020), pp. 220--251, DOI 10.1112/plms.12311 .
    We construct an explicit transitive free action of a Banach space of Hölder functions on the space of branched rough paths, which yields in particular a bijection between theses two spaces. This endows the space of branched rough paths with the structure of a principal homogeneous space over a Banach space and allows to characterize its automorphisms. The construction is based on the Baker-Campbell-Hausdorff formula, on a constructive version of the Lyons-Victoir extension theorem and on the Hairer-Kelly map, which allows to describe branched rough paths in terms of anisotropic geometric rough paths.

  • A. Ivanova, P. Dvurechensky, A. Gasnikov, Composite optimization for the resource allocation problem, Optimization Methods & Software, published online on 12.02.2020, url, DOI 10.1080/10556788.2020.1712599 .
    In this paper we consider resource allocation problem stated as a convex minimization problem with linear constraints. To solve this problem, we use gradient and accelerated gradient descent applied to the dual problem and prove the convergence rate both for the primal iterates and the dual iterates. We obtain faster convergence rates than the ones known in the literature. We also provide economic interpretation for these two methods. This means that iterations of the algorithms naturally correspond to the process of price and production adjustment in order to obtain the desired production volume in the economy. Overall, we show how these actions of the economic agents lead the whole system to the equilibrium.

  • M.S. Alkusa, A. Gasnikov, D. Dvinskikh, D.A. Kovalev, F.S. Stonyakin, Accelerated methods for saddle-point problem, Computational Mathematics and Mathematical Physics, 60 (2020), pp. 1787--1809, DOI 10.1134/S0965542520110020 .
    Recently, it has been shown how, on the basis of the usual accelerated gradient method for solving problems of smooth convex optimization, accelerated methods for more complex problems (with a structure) and problems that are solved using various local information about the behavior of a function (stochastic gradient, Hessian, etc.) can be obtained. The term ?accelerated methods? here means, on the one hand, the presence of some unified and fairly general way of acceleration. On the other hand, this also means the optimality of the methods, which can often be proved rigorously. In the present work, an attempt is made to construct in the same way a theory of accelerated methods for solving smooth convex-concave saddle-point problems with a structure. The main result of this article is the obtainment of in some sense necessary and sufficient conditions under which the complexity of solving nonlinear convex-concave saddle-point problems with a structure in the number of calculations of the gradients of composites in direct variables is equal in order of magnitude to the complexity of solving bilinear problems with a structure.

  • A. Anikin, A. Gasnikov, A. Gornov, D. Kamzolov, Y. Maximov, Y. Nesterov, Efficient numerical methods to solve sparse linear equations with application to PageRank, Optimization Methods & Software, published online on 21.12.2020, url, DOI 10.1080/10556788.2020.1858297 .
    Over the last two decades, the PageRank problem has received increased interest from the academic community as an efficient tool to estimate web-page importance in information retrieval. Despite numerous developments, the design of efficient optimization algorithms for the PageRank problem is still a challenge. This paper proposes three new algorithms with a linear time complexity for solving the problem over a bounded-degree graph. The idea behind them is to set up the PageRank as a convex minimization problem over a unit simplex, and then solve it using iterative methods with small iteration complexity. Our theoretical results are supported by an extensive empirical justification using real-world and simulated data.

  • S. Athreya, O. Butkovsky, L. Mytnik, Strong existence and uniqueness for stable stochastic differential equations with distributional drift, The Annals of Probability, 48 (2020), pp. 178--210, DOI 10.1214/19-AOP1358 .

  • F. Bachoc, A. Suvorikova , D. Ginsbourger, J.-M. Loubes, V. Spokoiny, Gaussian processes with multidimensional distribution inputs via optimal transport and Hilbertian embedding, Electronic Journal of Statistics, 14 (2020), pp. 2742--2772, DOI 10.1214/20-EJS1725 .
    In this work, we propose a way to construct Gaussian processes indexed by multidimensional distributions. More precisely, we tackle the problem of defining positive definite kernels between multivariate distributions via notions of optimal transport and appealing to Hilbert space embeddings. Besides presenting a characterization of radial positive definite and strictly positive definite kernels on general Hilbert spaces, we investigate the statistical properties of our theoretical and empirical kernels, focusing in particular on consistency as well as the special case of Gaussian distributions. A wide set of applications is presented, both using simulations and implementation with real data.

  • D. Belomestny, M. Kaledin, J.G.M. Schoenmakers, Semitractability of optimal stopping problems via a weighted stochastic mesh algorithm, Mathematical Finance. An International Journal of Mathematics, Statistics and Financial Economics, 30 (2020), pp. 1591--1616, DOI 10.1111/mafi.12271 .
    In this article we propose a Weighted Stochastic Mesh (WSM) algorithm for approximating the value of discrete and continuous time optimal stopping problems. It is shown that in the discrete time case the WSM algorithm leads to semi-tractability of the corresponding optimal stopping problem in the sense that its complexity is bounded in order by $varepsilon^-4log^d+2(1/varepsilon)$ with $d$ being the dimension of the underlying Markov chain. Furthermore we study the WSM approach in the context of continuous time optimal stopping problems and derive the corresponding complexity bounds. Although we can not prove semi-tractability in this case, our bounds turn out to be the tightest ones among the complexity bounds known in the literature. We illustrate our theoretical findings by a numerical example.

  • D. Belomestny, J.G.M. Schoenmakers, V. Spokoiny, B. Zharkynbay, Optimal stopping via reinforced regression, Communications in Mathematical Sciences, 18 (2020), pp. 109--121, DOI 10.4310/CMS.2020.v18.n1.a5 .
    In this note we propose a new approach towards solving numerically optimal stopping problems via boosted regression based Monte Carlo algorithms. The main idea of the method is to boost standard linear regression algorithms in each backward induction step by adding new basis functions based on previously estimated continuation values. The proposed methodology is illustrated by several numerical examples from finance.

  • D. Belomestny, J.G.M. Schoenmakers, Optimal stopping of McKean--Vlasov diffusions via regression on particle systems, SIAM Journal on Control and Optimization, 58 (2020), pp. 529--550, DOI 10.1137/18M1195590 .
    In this note we consider the problem of using regression on interacting particles to compute conditional expectations for McKean-Vlasov SDEs. We prove general result on convergence of linear regression algorithms and establish the corresponding rates of convergence. Application to optimal stopping and variance reduction are considered.

  • I. Chevyrev, P. Friz, A. Korepanov, I. Melbourne, Superdiffusive limits for deterministic fast-slow dynamical systems, Probability Theory and Related Fields, 178 (2020), pp. 735--770, DOI 10.1007/s00440-020-00988-5 .

  • M. Coghi, J.-D. Deuschel, P. Friz, M. Maurelli, Pathwise McKean--Vlasov theory with additive noise, The Annals of Applied Probability, 30 (2020), pp. 2355--2392, DOI 10.1214/20-AAP1560 .
    We take a pathwise approach to classical McKean-Vlasov stochastic differential equations with additive noise, as e.g. exposed in Sznitmann [34]. Our study was prompted by some concrete problems in battery modelling [19], and also by recent progress on rough-pathwise McKean-Vlasov theory, notably Cass--Lyons [9], and then Bailleul, Catellier and Delarue [4]. Such a “pathwise McKean-Vlasov theory” can be traced back to Tanaka [36]. This paper can be seen as an attempt to advertize the ideas, power and simplicity of the pathwise appproach, not so easily extracted from [4, 9, 36]. As novel applications we discuss mean field convergence without a priori independence and exchangeability assumption; common noise and reflecting boundaries. Last not least, we generalize Dawson--Gärtner large deviations to a non-Brownian noise setting.

  • J. Diehl, E.-F. Kurusch, N. Tapia, Time-warping invariants of multidimensional time series, Acta Applicandae Mathematicae. An International Survey Journal on Applying Mathematics and Mathematical Applications, published online on 14.05.2020, DOI 10.1007/s10440-020-00333-x .
    In data science, one is often confronted with a time series representing measurements of some quantity of interest. Usually, in a first step, features of the time series need to be extracted. These are numerical quantities that aim to succinctly describe the data and to dampen the influence of noise. In some applications, these features are also required to satisfy some invariance properties. In this paper, we concentrate on time-warping invariants.We show that these correspond to a certain family of iterated sums of the increments of the time series, known as quasisymmetric functions in the mathematics literature. We present these invariant features in an algebraic framework, and we develop some of their basic properties.

  • D. Kamzolov, P. Dvurechensky, A. Gasnikov, Universal intermediate gradient method for convex problems with inexact oracle, Optimization Methods & Software, published online on 17.01.2020, url, DOI 10.1080/10556788.2019.1711079 .

  • L.-Ch. Lin, Y. Chen, G. Pan, V. Spokoiny, Efficient and positive semidefinite pre-averaging realized covariance estimator, Statistica Sinica, 31 (2021), pp. 1--22 (published online on 23.11.2020), DOI 10.5705/ss.202017.0489 .

  • S. Lu, P. Mathé, S. Pereverzyev, Randomized matrix approximation to enhance regularized projection schemes in inverse problems, Inverse Problems. An International Journal on the Theory and Practice of Inverse Problems, Inverse Methods and Computerized Inversion of Data, 36 (2020), pp. 085013/1-- 085013/20, DOI 10.1088/1361-6420/ab9c44 .
    The authors consider a randomized solution to ill-posed operator equations in Hilbert spaces. In contrast to statistical inverse problems, where randomness appears in the noise, here randomness arises in the low-rank matrix approximation of the forward operator, which results in using a Monte Carlo method to solve the inverse problems. In particular, this approach follows the paradigm of the study N. Halko et al 2011 SIAM Rev. 53 217?288, and hence regularization is performed based on the low-rank matrix approximation. Error bounds for the mean error are obtained which take into account solution smoothness and the inherent noise level. Based on the structure of the error decomposition the authors propose a novel algorithm which guarantees (on the mean) a prescribed error tolerance. Numerical simulations confirm the theoretical findings.

  • Y. Nesterov , A. Gasnikov, S. Guminov, P. Dvurechensky, Primal-dual accelerated gradient methods with small-dimensional relaxation oracle, Optimization Methods & Software, published online on 02.03.2020, url, DOI 10.1080/10556788.2020.1731747 .

  • Y.Y. Park, J. Polzehl, S. Chatterjee, A. Brechmann, M. Fiecas, Semiparametric modeling of time-varying activation and connectivity in task-based fMRI data, Computational Statistics & Data Analysis, 150 (2020), pp. 107006/1--107006/14, DOI 10.1016/j.csda.2020.107006 .
    In functional magnetic resonance imaging (fMRI), there is a rise in evidence that time-varying functional connectivity, or dynamic functional connectivity (dFC), which measures changes in the synchronization of brain activity, provides additional information on brain networks not captured by time-invariant (i.e., static) functional connectivity. While there have been many developments for statistical models of dFC in resting-state fMRI, there remains a gap in the literature on how to simultaneously model both dFC and time-varying activation when the study participants are undergoing experimental tasks designed to probe at a cognitive process of interest. A method is proposed to estimate dFC between two regions of interest (ROIs) in task-based fMRI where the activation effects are also allowed to vary over time. The proposed method, called TVAAC (time-varying activation and connectivity), uses penalized splines to model both time-varying activation effects and time-varying functional connectivity and uses the bootstrap for statistical inference. Simulation studies show that TVAAC can estimate both static and time-varying activation and functional connectivity, while ignoring time-varying activation effects would lead to poor estimation of dFC. An empirical illustration is provided by applying TVAAC to analyze two subjects from an event-related fMRI learning experiment.

  • N. Puchkin, V. Spokoiny, An adaptive multiclass nearest neighbor classifier, ESAIM. Probability and Statistics, 24 (2020), pp. 69--99, DOI 10.1051/ps/2019021 .

  • A. Rastogi, G. Blanchard, P. Mathé, Convergence analysis of Tikhonov regularization for non-linear statistical inverse problems, Electronic Journal of Statistics, 14 (2020), pp. 2798--2841, DOI 10.1214/20-EJS1735 .
    We study a non-linear statistical inverse problem, where we observe the noisy image of a quantity through a non-linear operator at some random design points. We consider the widely used Tikhonov regularization (or method of regularization) approach to estimate the quantity for the non-linear ill-posed inverse problem. The estimator is defined as the minimizer of a Tikhonov functional, which is the sum of a data misfit term and a quadratic penalty term. We develop a theoretical analysis for the minimizer of the Tikhonov regularization scheme using the concept of reproducing kernel Hilbert spaces. We discuss optimal rates of convergence for the proposed scheme, uniformly over classes of admissible solutions, defined through appropriate source conditions.

  • D. Dvinskikh, S.S. Omelchenko, A. Gasnikov, A.I. Tyurin, Accelerated gradient sliding for minimizing a sum of functions, Doklady Mathematics. Maik Nauka/Interperiodica Publishing, Moscow. English. Translation of the Mathematics Section of: Doklady Akademii Nauk. (Formerly: Russian Academy of Sciences. Doklady. Mathematics)., 101 (2020), pp. 244--246, DOI 10.1134/S1064562420030084 .
    We propose a new way of justifying the accelerated gradient sliding of G. Lan, which allows one to extend the sliding technique to a combination of an accelerated gradient method with an accelerated variance reduction method. New optimal estimates for the solution of the problem of minimizing a sum of smooth strongly convex functions with a smooth regularizer are obtained.

  • D. Dvinskikh, A.I. Tyurin, A. Gasnikov, S.S. Omelchenko, Accelerated and unaccelerated stochastic gradient descent in model generality, Mathematical Notes, 108 (2020), pp. 511--522, DOI 10.1134/S0001434620090230 .
    A new method for deriving estimates of the rate of convergence of optimal methods for solving problems of smooth (strongly) convex stochastic optimization is described. The method is based on the results of stochastic optimization derived from results on the convergence of optimal methods under the conditions of inexact gradients with small noises of nonrandom nature. In contrast to earlier results, all estimates in the present paper are obtained in model generality.

  • CH. Bayer, M. Redmann, J.G.M. Schoenmakers, Dynamic programming for optimal stopping via pseudo-regression, Quantitative Finance, published online on 01.09.2020, url, DOI 10.1080/14697688.2020.1780299 .
    We introduce new variants of classical regression-based algorithms for optimal stopping problems based on computation of regression coefficients by Monte Carlo approximation of the corresponding L2 inner products instead of the least-squares error functional. Coupled with new proposals for simulation of the underlying samples, we call the approach "pseudo regression". We show that the approach leads to asymptotically smaller errors, as well as less computational cost. The analysis is justified by numerical examples.

  • CH. Bayer, D. Belomestny, M. Redmann, S. Riedel, J.G.M. Schoenmakers, Solving linear parabolic rough partial differential equations, Journal of Mathematical Analysis and Applications, 490 (2020), pp. 124236/1--124236/45, DOI 10.1016/j.jmaa.2020.124236 .
    We study linear rough partial differential equations in the setting of [Friz and Hairer, Springer, 2014, Chapter 12]. More precisely, we consider a linear parabolic partial differential equation driven by a deterministic rough path W of Hölder regularity α with ⅓ < α ≤ ½ . Based on a stochastic representation of the solution of the rough partial differential equation, we propose a regression Monte Carlo algorithm for spatio-temporal approximation of the solution. We provide a full convergence analysis of the proposed approximation method which essentially relies on the new bounds for the higher order derivatives of the solution in space. Finally, a comprehensive simulation study showing the applicability of the proposed algorithm is presented.

  • CH. Bayer, Ch.B. Hammouda, R. Tempone, Hierarchical adaptive sparse grids and quasi-Monte Carlo for option pricing under the rough Bergomi model, Quantitative Finance, published online on 20.04.2020, url, DOI 10.1080/14697688.2020.1744700 .
    The rough Bergomi (rBergomi) model, introduced recently in Bayer et al. [Pricing under rough volatility. Quant. Finance, 2016, 16(6), 887?904], is a promising rough volatility model in quantitative finance. It is a parsimonious model depending on only three parameters, and yet remarkably fits empirical implied volatility surfaces. In the absence of analytical European option pricing methods for the model, and due to the non-Markovian nature of the fractional driver, the prevalent option is to use the Monte Carlo (MC) simulation for pricing. Despite recent advances in the MC method in this context, pricing under the rBergomi model is still a time-consuming task. To overcome this issue, we have designed a novel, hierarchical approach, based on: (i) adaptive sparse grids quadrature (ASGQ), and (ii) quasi-Monte Carlo (QMC). Both techniques are coupled with a Brownian bridge construction and a Richardson extrapolation on the weak error. By uncovering the available regularity, our hierarchical methods demonstrate substantial computational gains with respect to the standard MC method. They reach a sufficiently small relative error tolerance in the price estimates across different parameter constellations, even for very small values of the Hurst parameter. Our work opens a new research direction in this field, i.e. to investigate the performance of methods other than Monte Carlo for pricing and calibrating under the rBergomi model.

  • CH. Bayer, R.F. Tempone , S. Wolfers, Pricing American options by exercise rate optimization, Quantitative Finance, published online on 07.07.2020, url, DOI 10.1080/14697688.2020.1750678 .
    We present a novel method for the numerical pricing of American options based on Monte Carlo simulation and the optimization of exercise strategies. Previous solutions to this problem either explicitly or implicitly determine so-called optimal exercise regions, which consist of points in time and space at which a given option is exercised. In contrast, our method determines the exercise rates of randomized exercise strategies. We show that the supremum of the corresponding stochastic optimization problem provides the correct option price. By integrating analytically over the random exercise decision, we obtain an objective function that is differentiable with respect to perturbations of the exercise rate even for finitely many sample paths. The global optimum of this function can be approached gradually when starting from a constant exercise rate. Numerical experiments on vanilla put options in the multivariate Black-Scholes model and a preliminary theoretical analysis underline the efficiency of our method, both with respect to the number of time-discretization steps and the required number of degrees of freedom in the parametrization of the exercise rates. Finally, we demonstrate the flexibility of our method through numerical experiments on max call options in the classical Black-Scholes model, and vanilla put options in both the Heston model and the non-Markovian rough Bergomi model.

  • P. Dvurechensky, E. Gorbunov, A. Gasnikov, An accelerated directional derivative method for smooth stochastic convex optimization, European Journal of Operational Research, 290 (2021), pp. 601--621 (published online on 20.08.2020), DOI 10.1016/j.ejor.2020.08.027 .
    We consider smooth stochastic convex optimization problems in the context of algorithms which are based on directional derivatives of the objective function. This context can be considered as an intermediate one between derivative-free optimization and gradient-based optimization. We assume that at any given point and for any given direction, a stochastic approximation for the directional derivative of the objective function at this point and in this direction is available with some additive noise. The noise is assumed to be of an unknown nature, but bounded in the absolute value. We underline that we consider directional derivatives in any direction, as opposed to coordinate descent methods which use only derivatives in coordinate directions. For this setting, we propose a non-accelerated and an accelerated directional derivative method and provide their complexity bounds. Despite that our algorithms do not use gradient information, our non-accelerated algorithm has a complexity bound which is, up to a factor logarithmic in problem dimension, similar to the complexity bound of gradient-based algorithms. Our accelerated algorithm has a complexity bound which coincides with the complexity bound of the accelerated gradient-based algorithm up to a factor of square root of the problem dimension, whereas for existing directional derivative methods this factor is of the order of problem dimension. We also extend these results to strongly convex problems. Finally, we consider derivative-free optimization as a particular case of directional derivative optimization with noise in the directional derivative and obtain complexity bounds for non-accelerated and accelerated derivative-free methods. Complexity bounds for these algorithms inherit the gain in the dimension dependent factors from our directional derivative methods.

  • P. Friz, T. Nilssen, W. Stannat , Existence, uniqueness and stability of semi-linear rough partial differential equations, Journal of Differential Equations, 268 (2020), pp. 1686--1721, DOI 10.1016/j.jde.2019.09.033 .

  • P. Mathé, M.T. Nair, B. Hofmann, Regularization of linear ill-posed problems involving multiplication operators, Applicable Analysis. An International Journal, published online 28.04.2020, url, DOI 10.1080/00036811.2020.1758308 .
    We study regularization of ill-posed equations involving multiplication operators when the multiplier function is positive almost everywhere and zero is an accumulation point of the range of this function. Such equations naturally arise from equations based on non-compact self-adjoint operators in Hilbert space, after applying unitary transformations arising out of the spectral theorem. For classical regularization theory, when noisy observations are given and the noise is deterministic and bounded, then non-compactness of the ill-posed equations is a minor issue. However, for statistical ill-posed equations with non-compact operators less is known if the data are blurred by white noise. We develop a theory for spectral regularization with emphasis on this case. In this context, we highlight several aspects, in particular, we discuss the intrinsic degree of ill-posedness in terms of rearrangements of the multiplier function. Moreover, we address the required modifications of classical regularization schemes in order to be used for non-compact statistical problems, and we also introduce the concept of the effective ill-posedness of the operator equation under white noise. This study is concluded with prototypical examples for such equations, as these are deconvolution equations and certain final value problems in evolution equations.

  • J. Polzehl, K. Papafitsoros, K. Tabelow, Patch-wise adaptive weights smoothing in R, Journal of Statistical Software, 95 (2020), pp. 1--27, DOI 10.18637/jss.v095.i06 .
    Image reconstruction from noisy data has a long history of methodological development and is based on a variety of ideas. In this paper we introduce a new method called patch-wise adaptive smoothing, that extends the Propagation-Separation approach by using comparisons of local patches of image intensities to define local adaptive weighting schemes for an improved balance of reduced variability and bias in the reconstruction result. We present the implementation of the new method in an R package aws and demonstrate its properties on a number of examples in comparison with other state-of-the art image reconstruction methods.

Contributions to Collected Editions

  • D. Dvinskikh, D. Tiapkin, Improved complexity bounds in Wasserstein barycenter problem, in: Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, 130 of Proceedings of Machine Learning Research, Journal of Machine Learning Research, 2021, pp. 1738--1746.

  • V. Dvurechensky, Data-driven confidence bands for distributed nonparametric regression, in: Proceedings of Thirty Third Conference on Learning Theory, J. Abernethy, S. Agarwal , eds., 125 of Proceedings of Machine Learning Research, PMLR, 2020, pp. 300--322.

  • N. Tupitsa, A. Gasnikov, P. Dvurechensky, S. Guminov, Strongly convex optimization for the dual formulation of optimal transport, in: Mathematical Optimization Theory and Operations Research, A. Kononov, M. Khachay, V.A. Kalyagin, P. Pardalos, eds., 1275 of Theoretical Computer Science and General Issues, Springer International Publishing AG, Cham, 2020, pp. 192--204, DOI 10.1007/978-3-030-58657-7_17 .
    In this paper we experimentally check a hypothesis, that dual problem to discrete entropy regularized optimal transport problem possesses strong convexity on a certain compact set. We present a numerical estimation technique of parameter of strong convexity and show that such an estimate increases the performance of an accelerated alternating minimization algorithm for strongly convex functions applied to the considered problem.

  • N. Tupitsa , P. Dvurechensky, A. Gasnikov, C.A. Uribe, Multimarginal optimal transport by accelerated alternating minimization, in: 2020 59th IEEE Conference on Decision and Control (CDC), IEEE, 2020, pp. 6132--6137, DOI 10.1109/CDC42340.2020.9304010 .

  • D. Dvinskikh, A. Ogaltsov, A. Gasnikov, P. Dvurechensky, V. Spokoiny, On the line-search gradient methods for stochastic optimization, in: Proceedings of the 21th IFAC World Congress, R. Findeisen, S. Hirche, K. Janschek, M. Mönnigmann, eds., 53 of IFAC PapersOnLine, Elsevier, 2020, pp. 1715--1720, DOI 10.1016/j.ifacol.2020.12.2284 .
    In this paper we propose several adaptive gradient methods for stochastic optimization. Our methods are based on Armijo-type line search and they simultaneously adapt to the unknown Lipschitz constant of the gradient and variance of the stochastic approximation for the gradient. We consider an accelerated gradient descent for convex problems and gradient descent for non-convex problems. In the experiments we demonstrate superiority of our methods to existing adaptive methods, e.g. AdaGrad and Adam.

  • P. Dvurechensky, A. Gasnikov, E. Nurminski, F. Stonyakin, Advances in low-memory subgradient optimization, in: Numerical Nonsmooth Optimization, A.M. Bagirov, M. Gaudioso, N. Karmitsa, M.M. Mäkelä, S. Taheri, eds., Springer International Publishing, Cham, 2020, pp. 19--59, DOI 10.1007/978-3-030-34910-3_2 .

  • P. Dvurechensky, A. Gasnikov, S. Omelchenko, A. Tiurin, A stable alternative to Sinkhorn's algorithm for regularized optimal transport, in: Mathematical Optimization Theory and Operations Research. MOTOR 2020, A. Kononov, M. Khachay, V.A. Kalyagin, P. Pardalos, eds., Lecture Notes in Computer Science, Springer, Cham, 2020, pp. 406--423, DOI 10.1007/978-3-030-49988-4_28 .

  • P. Dvurechensky, P. Ostroukhov, K. Safin, S. Shtern, M. Staudigl, Self-concordant analysis of Frank--Wolfe algorithms, in: Proceedings of the 37th International Conference on Machine Learning, H. Daumé Iii, A. Singh, eds., 119 of Proceedings of Machine Learning Research (online), 2020, pp. 2814--2824.
    Projection-free optimization via different variants of the Frank-Wolfe (FW), a.k.a. Conditional Gradient method has become one of the cornerstones in optimization for machine learning since in many cases the linear minimization oracle is much cheaper to implement than projections and some sparsity needs to be preserved. In a number of applications, e.g. Poisson inverse problems or quantum state tomography, the loss is given by a self-concordant (SC) function having unbounded curvature, implying absence of theoretical guarantees for the existing FW methods. We use the theory of SC functions to provide a new adaptive step size for FW methods and prove global convergence rate O(1/k) after k iterations. If the problem admits a stronger local linear minimization oracle, we construct a novel FW method with linear convergence rate for SC functions.

Preprints, Reports, Technical Reports

  • A. Sadiev, A. Beznosikov, P. Dvurechensky, A. Gasnikov, Zeroth-order algorithms for smooth saddle-point problems, Preprint no. 2827, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2827 .
    Abstract, PDF (564 kByte)
    Saddle-point problems have recently gained an increased attention from the machine learning community, mainly due to applications in training Generative Adversarial Networks using stochastic gradients. At the same time, in some applications only a zeroth-order oracle is available. In this paper, we propose several algorithms to solve stochastic smooth (strongly) convex-concave saddle- point problems using zeroth-order oracles, and estimate their convergence rate and its dependence on the dimension n of the variable. In particular, our analysis shows that in the case when the feasible set is a direct product of two simplices, our convergence rate for the stochastic term is only by a log n factor worse than for the first-order methods. We also consider a mixed setup and develop 1/2th-order methods which use zeroth-order oracle for the minimization part and first-order oracle for the maximization part. Finally, we demonstrate the practical performance of our zeroth-order and 1/2th-order methods on practical problems.

  • CH. Bayer, M. Eigel, L. Sallandt, P. Trunschke, Pricing high-dimensional Bermudan options with hierarchical tensor formats, Preprint no. 2821, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2821 .
    Abstract, PDF (321 kByte)
    An efficient compression technique based on hierarchical tensors for popular option pricing methods is presented. It is shown that the “curse of dimensionality" can be alleviated for the computation of Bermudan option prices with the Monte Carlo least-squares approach as well as the dual martingale method, both using high-dimensional tensorized polynomial expansions. This discretization allows for a simple and computationally cheap evaluation of conditional expectations. Complexity estimates are provided as well as a description of the optimization procedures in the tensor train format. Numerical experiments illustrate the favourable accuracy of the proposed methods. The dynamical programming method yields results comparable to recent Neural Network based methods.

  • P.O. Petr Ostroukhov, R.K. Rinat Kamalov, P. Dvurechensky, A. Gasnikov, Tensor methods for strongly convex strongly concave saddle point problems and strongly monotone variational inequalities, Preprint no. 2820, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2820 .
    Abstract, PDF (302 kByte)
    In this paper we propose three tensor methods for strongly-convex-strongly-concave saddle point problems (SPP). The first method is based on the assumption of higher-order smoothness (the derivative of the order higher than 2 is Lipschitz-continuous) and achieves linear convergence rate. Under additional assumptions of first and second order smoothness of the objective we connect the first method with a locally superlinear converging algorithm in the literature and develop a second method with global convergence and local superlinear convergence. The third method is a modified version of the second method, but with the focus on making the gradient of the objective small. Since we treat SPP as a particular case of variational inequalities, we also propose two methods for strongly monotone variational inequalities with the same complexity as the described above.

  • A. Agafonov, D. Kamzolov, P. Dvurechensky, A. Gasnikov, Inexact tensor methods and their application to stochastic convex optimization, Preprint no. 2818, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2818 .
    Abstract, PDF (333 kByte)
    We propose a general non-accelerated tensor method under inexact information on higher- order derivatives, analyze its convergence rate, and provide sufficient conditions for this method to have similar complexity as the exact tensor method. As a corollary, we propose the first stochastic tensor method for convex optimization and obtain sufficient mini-batch sizes for each derivative.

  • V. Matyukhin, S. Kabanikhin, M. Shishlenin, N. Novikov, A. Vasin, A. Gasnikov, Convex optimization with inexact gradients in Hilbert space and applications to elliptic inverse problems, Preprint no. 2815, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2815 .
    Abstract, PDF (1346 kByte)
    In this paper we propose the gradient descent type methods to solve convex optimization problems in Hilbert space. We apply it to solve ill-posed Cauchy problem for Poisson equation and make a comparative analysis with Landweber iteration and steepest descent method. The theoretical novelty of the paper consists in the developing of new stopping rule for accelerated gradient methods with inexact gradient (additive noise). Note that up to the moment of stopping the method “doesn't feel the noise”. But after this moment the noise start to accumulate and the quality of the solution becomes worse for further iterations.

  • P. Friz, P. Hager, N. Tapia, Unified signature cumulants and generalized Magnus expansions, Preprint no. 2814, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2814 .
    Abstract, PDF (388 kByte)
    The signature of a path can be described as its full non-commutative exponential. Following T. Lyons we regard its expectation, the expected signature, as path space analogue of the classical moment generating function. The logarithm thereof, taken in the tensor algebra, defines the signature cumulant. We establish a universal functional relation in a general semimartingale context. Our work exhibits the importance of Magnus expansions in the algorithmic problem of computing expected signature cumulants, and further offers a far-reaching generalization of recent results on characteristic exponents dubbed diamond and cumulant expansions; with motivation ranging from financial mathematics to statistical physics. From an affine process perspective, the functional relation may be interpreted as infinite-dimensional, non-commutative (“Hausdorff") variation of Riccati's equation. Many examples are given.

  • N. Yudin, A. Gasnikov, Flexible modification of Gauss--Newton method and its stochastic extension, Preprint no. 2813, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2813 .
    Abstract, PDF (8261 kByte)
    This work presents a novel version of recently developed Gauss--Newton method for solving systems of nonlinear equations, based on upper bound of solution residual and quadratic regularization ideas. We obtained for such method global convergence bounds and under natural non-degeneracy assumptions we present local quadratic convergence results. We developed stochastic optimization algorithms for presented Gauss--Newton method and justified sub-linear and linear convergence rates for these algorithms using weak growth condition (WGC) and Polyak--Lojasiewicz (PL) inequality. We show that Gauss--Newton method in stochastic setting can effectively find solution under WGC and PL condition matching convergence rate of the deterministic optimization method. The suggested method unifies most practically used Gauss--Newton method modifications and can easily interpolate between them providing flexible and convenient method easily implementable using standard techniques of convex optimization.

  • A. Vasin, A. Gasnikov, V. Spokoiny, Stopping rules for accelerated gradient methods with additive noise in gradient, Preprint no. 2812, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2812 .
    Abstract, PDF (1129 kByte)
    In this article, we investigate an accelerated first-order method, namely, the method of similar triangles, which is optimal in the class of convex (strongly convex) problems with a Lipschitz gradient. The paper considers a model of additive noise in a gradient and a Euclidean prox- structure for not necessarily bounded sets. Convergence estimates are obtained in the case of strong convexity and its absence, and a stopping criterion is proposed for not strongly convex problems.

  • D. Belomestny, J.G.M. Schoenmakers, From optimal martingales to randomized dual optimal stopping, Preprint no. 2810, WIAS, Berlin, 2021.
    Abstract, PDF (571 kByte)
    In this article we study and classify optimal martingales in the dual formulation of optimal stopping problems. In this respect we distinguish between weakly optimal and surely optimal martingales. It is shown that the family of weakly optimal and surely optimal martingales may be quite large. On the other hand it is shown that the Doob-martingale, that is, the martingale part of the Snell envelope, is in a certain sense the most robust surely optimal martingale under random perturbations. This new insight leads to a novel randomized dual martingale minimization algorithm that does`nt require nested simulation. As a main feature, in a possibly large family of optimal martingales the algorithm efficiently selects a martingale that is as close as possible to the Doob martingale. As a result, one obtains the dual upper bound for the optimal stopping problem with low variance.

  • A. Neumann, N. Peitek, A. Brechmann, K. Tabelow, Th. Dickhaus, Utilizing anatomical information for signal detection in functional magnetic resonance imaging, Preprint no. 2806, WIAS, Berlin, 2021, DOI 10.20347/WIAS.PREPRINT.2806 .
    Abstract, PDF (2995 kByte)
    We are considering the statistical analysis of functional magnetic resonance imaging (fMRI) data. As demonstrated in previous work, grouping voxels into regions (of interest) and carrying out a multiple test for signal detection on the basis of these regions typically leads to a higher sensitivity when compared with voxel-wise multiple testing approaches. In the case of a multi-subject study, we propose to define the regions for each subject separately based on their individual brain anatomy, represented, e.g., by so-called Aparc labels. The aggregation of the subject-specific evidence for the presence of signals in the different regions is then performed by means of a combination function for p-values. We apply the proposed methodology to real fMRI data and demonstrate that our approach can perform comparably to a two-stage approach for which two independent experiments are needed, one for defining the regions and one for actual signal detection.

  • F. Besold, V. Spokoiny, Adaptive manifold clustering, Preprint no. 2800, WIAS, Berlin, 2020.
    Abstract, PDF (2013 kByte)
    Clustering methods seek to partition data such that elements are more similar to elements in the same cluster than to elements in different clusters. The main challenge in this task is the lack of a unified definition of a cluster, especially for high dimensional data. Different methods and approaches have been proposed to address this problem. This paper continues the study originated by [6] where a novel approach to adaptive nonparametric clustering called Adaptive Weights Clustering (AWC) was offered. The method allows analyzing high-dimensional data with an unknown number of unbalanced clusters of arbitrary shape under very weak modeling as-sumptions. The procedure demonstrates a state-of-the-art performance and is very efficient even for large data dimension D. However, the theoretical study in [6] is very limited and did not re-ally address the question of efficiency. This paper makes a significant step in understanding the remarkable performance of the AWC procedure, particularly in high dimension. The approach is based on combining the ideas of adaptive clustering and manifold learning. The manifold hypoth-esis means that high dimensional data can be well approximated by a d-dimensional manifold for small d helping to overcome the curse of dimensionality problem and to get sharp bounds on the cluster separation which only depend on the intrinsic dimension d. We also address the problem of parameter tuning. Our general theoretical results are illustrated by some numerical experiments.

  • J. Diehl, R. Preiss, M. Ruddy, N. Tapia, The moving frame method for iterated-integrals: Orthogonal invariants, Preprint no. 2796, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2796 .
    Abstract, PDF (286 kByte)
    We explore the algebraic properties of a generalized version of the iterated-sums signature, inspired by previous work of F. Kiraly and H. Oberhauser. In particular, we show how to recover the character property of the associated linear map over the tensor algebra by considering a deformed quasi-shuffle product of words on the latter. We introduce three non-linear transformations on iterated-sums signatures, close in spirit to Machine Learning applications, and show some of their properties.

  • J. Diehl, K. Ebrahimi-Fard, N. Tapia, Generalized iterated-sums signatures, Preprint no. 2795, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2795 .
    Abstract, PDF (152 kByte)
    We explore the algebraic properties of a generalized version of the iterated-sums signature, inspired by previous work of F. Király and H. Oberhauser. In particular, we show how to recover the character property of the associated linear map over the tensor algebra by considering a deformed quasi-shuffle product of words on the latter. We introduce three non-linear transformations on iterated-sums signatures, close in spirit to Machine Learning applications, and show some of their properties.

  • CH. Bayer, D. Belomestny, P. Hager, P. Pigato, J.G.M. Schoenmakers, V. Spokoiny, Reinforced optimal control, Preprint no. 2792, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2792 .
    Abstract, PDF (591 kByte)
    Least squares Monte Carlo methods are a popular numerical approximation method for solving stochastic control problems. Based on dynamic programming, their key feature is the approximation of the conditional expectation of future rewards by linear least squares regression. Hence, the choice of basis functions is crucial for the accuracy of the method. Earlier work by some of us [Belomestny, Schoenmakers, Spokoiny, Zharkynbay, Commun. Math. Sci., 18(1):109?121, 2020] proposes to reinforce the basis functions in the case of optimal stopping problems by already computed value functions for later times, thereby considerably improving the accuracy with limited additional computational cost. We extend the reinforced regression method to a general class of stochastic control problems, while considerably improving the method?s efficiency, as demonstrated by substantial numerical examples as well as theoretical analysis.

  • CH. Bayer, P. Hager, S. Riedel, J.G.M. Schoenmakers, Optimal stopping with signatures, Preprint no. 2790, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2790 .
    Abstract, PDF (407 kByte)
    We propose a new method for solving optimal stopping problems (such as American option pricing in finance) under minimal assumptions on the underlying stochastic process. We consider classic and randomized stopping times represented by linear functionals of the associated rough path signature, and prove that maximizing over the class of signature stopping times, in fact, solves the original optimal stopping problem. Using the algebraic properties of the signature, we can then recast the problem as a (deterministic) optimization problem depending only on the (truncated) expected signature. The only assumption on the process is that it is a continuous (geometric) random rough path. Hence, the theory encompasses processes such as fractional Brownian motion which fail to be either semi-martingales or Markov processes.

  • A. Kroshnin, V. Spokoiny, A. Suvorikova, Statistical inference for Bures--Wasserstein barycenters, Preprint no. 2788, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2788 .
    Abstract, PDF (1745 kByte)
    In this work we introduce the concept of Bures--Wasserstein barycenter $Q_*$, that is essentially a Fréchet mean of some distribution $P$ supported on a subspace of positive semi-definite $d$-dimensional Hermitian operators $H_+(d)$. We allow a barycenter to be constrained to some affine subspace of $H_+(d)$, and we provide conditions ensuring its existence and uniqueness. We also investigate convergence and concentration properties of an empirical counterpart of $Q_*$ in both Frobenius norm and Bures--Wasserstein distance, and explain, how the obtained results are connected to optimal transportation theory and can be applied to statistical inference in quantum mechanics.

  • R. Krawchenko, C.A. Uribe, A. Gasnikov, P. Dvurechensky, Distributed optimization with quantization for computing Wasserstein barycenters, Preprint no. 2782, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2782 .
    Abstract, PDF (946 kByte)
    We study the problem of the decentralized computation of entropy-regularized semi-discrete Wasserstein barycenters over a network. Building upon recent primal-dual approaches, we propose a sampling gradient quantization scheme that allows efficient communication and computation of approximate barycenters where the factor distributions are stored distributedly on arbitrary networks. The communication and algorithmic complexity of the proposed algorithm are shown, with explicit dependency on the size of the support, the number of distributions, and the desired accuracy. Numerical results validate our algorithmic analysis.

  • J. Diehl, K. Ebrahimi-Fard, N. Tapia, Tropical time series, iterated-sum signatures and quasisymmetric functions, Preprint no. 2760, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2760 .
    Abstract, PDF (236 kByte)
    Driven by the need for principled extraction of features from time series, we introduce the iterated-sums signature over any commutative semiring. The case of the tropical semiring is a central, and our motivating, example, as it leads to features of (real-valued) time series that are not easily available using existing signature-type objects.

  • CH. Bayer, F. Harang, P. Pigato, Log-modulated rough stochastic volatility models, Preprint no. 2752, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2752 .
    Abstract, PDF (595 kByte)
    We propose a new class of rough stochastic volatility models obtained by modulating the power-law kernel defining the fractional Brownian motion (fBm) by a logarithmic term, such that the kernel retains square integrability even in the limit case of vanishing Hurst index H. The so-obtained log-modulated fractional Brownian motion (log-fBm) is a continuous Gaussian process even for H = 0. As a consequence, the resulting super-rough stochastic volatility models can be analysed over the whole range of Hurst indices between 0 and 1/2, including H = 0, without the need of further normalization. We obtain the usual power law explosion of the skew as maturity T goes to 0, modulated by a logarithmic term, so no flattening of the skew occurs as H goes to 0.

  • CH. Bayer, J. Qiu, Y. Yao, Pricing options under rough volatility with backward SPDEs, Preprint no. 2745, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2745 .
    Abstract, PDF (553 kByte)
    In this paper, we study the option pricing problems for rough volatility models. As the framework is non-Markovian, the value function for a European option is not deterministic; rather, it is random and satisfies a backward stochastic partial differential equation (BSPDE). The existence and uniqueness of weak solutions is proved for general nonlinear BSPDEs with unbounded random leading coefficients whose connections with certain forward-backward stochastic differential equations are derived as well. These BSPDEs are then used to approximate American option prices. A deep learning-based method is also investigated for the numerical approximations to such BSPDEs and associated non-Markovian pricing problems. Finally, the examples of rough Bergomi type are numerically computed for both European and American options.

  • J. Diehl, K. Ebrahimi-Fard, N. Tapia, Iterated-sums signature, quasi-symmetric functions and time series analysis, Preprint no. 2736, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2736 .
    Abstract, PDF (261 kByte)
    We survey and extend results on a recently defined character on the quasi-shuffle algebra. This character, termed iterated-sums signature, appears in the context of time series analysis and originates from a problem in dynamic time warping. Algebraically, it relates to (multidimensional) quasisymmetric functions as well as (deformed) quasi-shuffle algebras.

  • S. Riedel, Semi-implicit Taylor schemes for stiff rough differential equations, Preprint no. 2734, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2734 .
    Abstract, PDF (538 kByte)
    We study a class of semi-implicit Taylor-type numerical methods that are easy to implement and designed to solve multidimensional stochastic differential equations driven by a general rough noise, e.g. a fractional Brownian motion. In the multiplicative noise case, the equation is understood as a rough differential equation in the sense of T. Lyons. We focus on equations for which the drift coefficient may be unbounded and satisfies a one-sided Lipschitz condition only. We prove well-posedness of the methods, provide a full analysis, and deduce their convergence rate. Numerical experiments show that our schemes are particularly useful in the case of stiff rough stochastic differential equations driven by a fractional Brownian motion.

  • CH. Bayer, P. Friz, N. Tapia, Stability of deep neural networks via discrete rough paths, Preprint no. 2732, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2732 .
    Abstract, PDF (365 kByte)
    Using rough path techniques, we provide a priori estimates for the output of Deep Residual Neural Networks. In particular we derive stability bounds in terms of the total p-variation of trained weights for any p ≥ 1.

  • V. Avanesov, Data-driven confidence bands for distributed nonparametric regression, Preprint no. 2729, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2729 .
    Abstract, PDF (316 kByte)
    Gaussian Process Regression and Kernel Ridge Regression are popular nonparametric regression approaches. Unfortunately, they suffer from high computational complexity rendering them inapplicable to the modern massive datasets. To that end a number of approximations have been suggested, some of them allowing for a distributed implementation. One of them is the divide and conquer approach, splitting the data into a number of partitions, obtaining the local estimates and finally averaging them. In this paper we suggest a novel computationally efficient fully data-driven algorithm, quantifying uncertainty of this method, yielding frequentist $L_2$-confidence bands. We rigorously demonstrate validity of the algorithm. Another contribution of the paper is a minimax-optimal high-probability bound for the averaged estimator, complementing and generalizing the known risk bounds.

  • R.J.A. Laeven, J.G.M. Schoenmakers, N.F.F. Schweizer, M. Stadje, Robust multiple stopping -- A path-wise duality approach, Preprint no. 2728, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2728 .
    Abstract, PDF (576 kByte)
    In this paper we develop a solution method for general optimal stopping problems. Our general setting allows for multiple exercise rights, i.e., optimal multiple stopping, for a robust evaluation that accounts for model uncertainty, and for general reward processes driven by multi-dimensional jump-diffusions. Our approach relies on first establishing robust martingale dual representation results for the multiple stopping problem which satisfy appealing path-wise optimality (almost sure) properties. Next, we exploit these theoretical results to develop upper and lower bounds which, as we formally show, not only converge to the true solution asymptotically, but also constitute genuine upper and lower bounds. We illustrate the applicability of our general approach in a few examples and analyze the impact of model uncertainty on optimal multiple stopping strategies.

  • A. Ivanova, A. Gasnikov, P. Dvurechensky, D. Dvinskikh, A. Tyurin, E. Vorontsova, D. Pasechnyuk, Oracle complexity separation in convex optimization, Preprint no. 2711, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2711 .
    Abstract, PDF (424 kByte)
    Ubiquitous in machine learning regularized empirical risk minimization problems are often composed of several blocks which can be treated using different types of oracles, e.g., full gradient, stochastic gradient or coordinate derivative. Optimal oracle complexity is known and achievable separately for the full gradient case, the stochastic gradient case, etc. We propose a generic framework to combine optimal algorithms for different types of oracles in order to achieve separate optimal oracle complexity for each block, i.e. for each block the corresponding oracle is called the optimal number of times for a given accuracy. As a particular example, we demonstrate that for a combination of a full gradient oracle and either a stochastic gradient oracle or a coordinate descent oracle our approach leads to the optimal number of oracle calls separately for the full gradient part and the stochastic/coordinate descent part.

  • D. Kamzolov, A. Gasnikov, P. Dvurechensky, On the optimal combination of tensor optimization methods, Preprint no. 2710, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2710 .
    Abstract, PDF (284 kByte)
    We consider the minimization problem of a sum of a number of functions having Lipshitz p -th order derivatives with different Lipschitz constants. In this case, to accelerate optimization, we propose a general framework allowing to obtain near-optimal oracle complexity for each function in the sum separately, meaning, in particular, that the oracle for a function with lower Lipschitz constant is called a smaller number of times. As a building block, we extend the current theory of tensor methods and show how to generalize near-optimal tensor methods to work with inexact tensor step. Further, we investigate the situation when the functions in the sum have Lipschitz derivatives of a different order. For this situation, we propose a generic way to separate the oracle complexity between the parts of the sum. Our method is not optimal, which leads to an open problem of the optimal combination of oracles of a different order.

  • F. Stonyakin, A. Gasnikov, A. Tyurin, D. Pasechnyuk, A. Agafonov, P. Dvurechensky, D. Dvinskikh, S. Artamonov, V. Piskunova, Inexact relative smoothness and strong convexity for optimization and variational inequalities by inexact model, Preprint no. 2709, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2709 .
    Abstract, PDF (463 kByte)
    In this paper we propose a general algorithmic framework for first-order methods in optimization in a broad sense, including minimization problems, saddle-point problems and variational inequalities. This framework allows to obtain many known methods as a special case, the list including accelerated gradient method, composite optimization methods, level-set methods, Bregman proximal methods. The idea of the framework is based on constructing an inexact model of the main problem component, i.e. objective function in optimization or operator in variational inequalities. Besides reproducing known results, our framework allows to construct new methods, which we illustrate by constructing a universal conditional gradient method and universal method for variational inequalities with composite structure. These method works for smooth and non-smooth problems with optimal complexity without a priori knowledge of the problem smoothness. As a particular case of our general framework, we introduce relative smoothness for operators and propose an algorithm for VIs with such operator. We also generalize our framework for relatively strongly convex objectives and strongly monotone variational inequalities.

  • M. Redmann, S. Riedel, Runge--Kutta methods for rough differential equations, Preprint no. 2708, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2708 .
    Abstract, PDF (393 kByte)
    We study Runge-Kutta methods for rough differential equations which can be used to calculate solutions to stochastic differential equations driven by processes that are rougher than a Brownian motion. We use a Taylor series representation (B-series) for both the numerical scheme and the solution of the rough differential equation in order to determine conditions that guarantee the desired order of the local error for the underlying Runge-Kutta method. Subsequently, we prove the order of the global error given the local rate. In addition, we simplify the numerical approximation by introducing a Runge-Kutta scheme that is based on the increments of the driver of the rough differential equation. This simplified method can be easily implemented and is computational cheap since it is derivative-free. We provide a full characterization of this implementable Runge-Kutta method meaning that we provide necessary and sufficient algebraic conditions for an optimal order of convergence in case that the driver, e.g., is a fractional Brownian motion with Hurst index 1/4 < H ≤ 1/2. We conclude this paper by conducting numerical experiments verifying the theoretical rate of convergence.

  • M. Redmann, Ch. Bayer, P. Goya, Low-dimensional approximations of high-dimensional asset price models, Preprint no. 2706, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2706 .
    Abstract, PDF (363 kByte)
    We consider high-dimensional asset price models that are reduced in their dimension in order to reduce the complexity of the problem or the effect of the curse of dimensionality in the context of option pricing. We apply model order reduction (MOR) to obtain a reduced system. MOR has been previously studied for asymptotically stable controlled stochastic systems with zero initial conditions. However, stochastic differential equations modeling price processes are uncontrolled, have non-zero initial states and are often unstable. Therefore, we extend MOR schemes and combine ideas of techniques known for deterministic systems. This leads to a method providing a good pathwise approximation. After explaining the reduction procedure, the error of the approximation is analyzed and the performance of the algorithm is shown conducting several numerical experiments. Within the numerics section, the benefit of the algorithm in the context of option pricing is pointed out.

  • M. Ghani Varzaneh, S. Riedel, A dynamical theory for singular stochastic delay differential equations II: Nonlinear equations and invariant manifolds, Preprint no. 2701, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2701 .
    Abstract, PDF (383 kByte)
    Building on results obtained in [GVRS], we prove Local Stable and Unstable Manifold Theorems for nonlinear, singular stochastic delay differential equations. The main tools are rough paths theory and a semi-invertible Multiplicative Ergodic Theorem for cocycles acting on measurable fields of Banach spaces obtained in [GVR].

  • CH. Bayer, D. Belomestny, P. Hager, P. Pigato, J.G.M. Schoenmakers, Randomized optimal stopping algorithms and their convergence analysis, Preprint no. 2697, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2697 .
    Abstract, PDF (367 kByte)
    In this paper we study randomized optimal stopping problems and consider corresponding forward and backward Monte Carlo based optimization algorithms. In particular we prove the convergence of the proposed algorithms and derive the corresponding convergence rates.

  • C. Bellinger, A. Djurdjevac, P. Friz, N. Tapia, Transport and continuity equations with (very) rough noise, Preprint no. 2696, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2696 .
    Abstract, PDF (320 kByte)
    Existence and uniqueness for rough flows, transport and continuity equations driven by general geometric rough paths are established.

  • S. Guminov, P. Dvurechensky, A. Gasnikov, On accelerated alternating minimization, Preprint no. 2695, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2695 .
    Abstract, PDF (343 kByte)
    Alternating minimization (AM) optimization algorithms have been known for a long time and are of importance in machine learning problems, among which we are mostly motivated by approximating optimal transport distances. AM algorithms assume that the decision variable is divided into several blocks and minimization in each block can be done explicitly or cheaply with high accuracy. The ubiquitous Sinkhorn's algorithm can be seen as an alternating minimization algorithm for the dual to the entropy-regularized optimal transport problem. We introduce an accelerated alternating minimization method with a $1/k^2$ convergence rate, where $k$ is the iteration counter. This improves over known bound $1/k$ for general AM methods and for the Sinkhorn's algorithm. Moreover, our algorithm converges faster than gradient-type methods in practice as it is free of the choice of the step-size and is adaptive to the local smoothness of the problem. We show that the proposed method is primal-dual, meaning that if we apply it to a dual problem, we can reconstruct the solution of the primal problem with the same convergence rate. We apply our method to the entropy regularized optimal transport problem and show experimentally, that it outperforms Sinkhorn's algorithm.

  • P. Dvurechensky, A. Gasnikov, P. Ostroukhov, A.C. Uribe, A. Ivanova, Near-optimal tensor methods for minimizing gradient norm, Preprint no. 2694, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2694 .
    Abstract, PDF (423 kByte)
    Motivated by convex problems with linear constraints and, in particular, by entropy-regularized optimal transport, we consider the problem of finding approximate stationary points, i.e. points with the norm of the objective gradient less than small error, of convex functions with Lipschitz p-th order derivatives. Lower complexity bounds for this problem were recently proposed in [Grapiglia and Nesterov, arXiv:1907.07053]. However, the methods presented in the same paper do not have optimal complexity bounds. We propose two optimal up to logarithmic factors methods with complexity bounds with respect to the initial objective residual and the distance between the starting point and solution respectively

  • P. Dvurechensky, M. Staudigl, C.A. Uribe , Generalized self-concordant Hessian-barrier algorithms, Preprint no. 2693, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2693 .
    Abstract, PDF (647 kByte)
    Many problems in statistical learning, imaging, and computer vision involve the optimization of a non-convex objective function with singularities at the boundary of the feasible set. For such challenging instances, we develop a new interior-point technique building on the Hessian-barrier algorithm recently introduced in Bomze, Mertikopoulos, Schachinger and Staudigl, [SIAM J. Opt. 2019 29(3), pp. 2100-2127], where the Riemannian metric is induced by a generalized selfconcordant function. This class of functions is sufficiently general to include most of the commonly used barrier functions in the literature of interior point methods. We prove global convergence to an approximate stationary point of the method, and in cases where the feasible set admits an easily computable self-concordant barrier, we verify worst-case optimal iteration complexity of the method. Applications in non-convex statistical estimation and Lp-minimization are discussed to given the efficiency of the method.

  • E. Gorbunov, D. Dvinskikh, A. Gasnikov, Optimal decentralized distributed algorithms for stochastic convex optimization, Preprint no. 2691, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2691 .
    Abstract, PDF (616 kByte)
    We consider stochastic convex optimization problems with affine constraints and develop several methods using either primal or dual approach to solve it. In the primal case we use special penalization technique to make the initial problem more convenient for using optimization methods. We propose algorithms to solve it based on Similar Triangles Method with Inexact Proximal Step for the convex smooth and strongly convex smooth objective functions and methods based on Gradient Sliding algorithm to solve the same problems in the non-smooth case. We prove the convergence guarantees in smooth convex case with deterministic first-order oracle. We propose and analyze three novel methods to handle stochastic convex optimization problems with affine constraints: SPDSTM, R-RRMA-AC-SA and SSTM_sc. All methods use stochastic dual oracle. SPDSTM is the stochastic primal-dual modification of STM and it is applied for the dual problem when the primal functional is strongly convex and Lipschitz continuous on some ball. R-RRMA-AC-SA is an accelerated stochastic method based on the restarts of RRMA-AC-SA and SSTM_sc is just stochastic STM for strongly convex problems. Both methods are applied to the dual problem when the primal functional is strongly convex, smooth and Lipschitz continuous on some ball and use stochastic dual first-order oracle. We develop convergence analysis for these methods for the unbiased and biased oracles respectively. Finally, we apply all aforementioned results and approaches to solve decentralized distributed optimization problem and discuss optimality of the obtained results in terms of communication rounds and number of oracle calls per node.

  • F. Stonyakin, D. Dvinskikh, P. Dvurechensky, A. Kroshnin, O. Kuznetsova, A. Agafonov, A. Gasnikov, A. Tyurin, C.A. Uribe, D. Pasechnyuk, S. Artamonov, Gradient methods for problems with inexact model of the objective, Preprint no. 2688, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2688 .
    Abstract, PDF (785 kByte)
    We consider optimization methods for convex minimization problems under inexact information on the objective function. We introduce inexact model of the objective, which as a particular cases includes inexact oracle [19] and relative smoothness condition [43]. We analyze gradient method which uses this inexact model and obtain convergence rates for convex and strongly convex problems. To show potential applications of our general framework we consider three particular problems. The first one is clustering by electorial model introduced in [49]. The second one is approximating optimal transport distance, for which we propose a Proximal Sinkhorn algorithm. The third one is devoted to approximating optimal transport barycenter and we propose a Proximal Iterative Bregman Projections algorithm. We also illustrate the practical performance of our algorithms by numerical experiments.

  • V. Avanesov, Nonparametric change point detection in regression, Preprint no. 2687, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2687 .
    Abstract, PDF (329 kByte)
    This paper considers the prominent problem of change-point detection in regression. The study suggests a novel testing procedure featuring a fully data-driven calibration scheme. The method is essentially a black box, requiring no tuning from the practitioner. The approach is investigated from both theoretical and practical points of view. The theoretical study demonstrates proper control of first-type error rate under H0 and power approaching 1 under H1. The experiments conducted on synthetic data fully support the theoretical claims. In conclusion, the method is applied to financial data, where it detects sensible change-points. Techniques for change-point localization are also suggested and investigated

  • V. Avanesov, How to gamble with non-stationary X-armed bandits and have no regrets, Preprint no. 2686, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2686 .
    Abstract, PDF (287 kByte)
    In X-armed bandit problem an agent sequentially interacts with environment which yields a reward based on the vector input the agent provides. The agent's goal is to maximise the sum of these rewards across some number of time steps. The problem and its variations have been a subject of numerous studies, suggesting sub-linear and sometimes optimal strategies. The given paper introduces a new variation of the problem. We consider an environment, which can abruptly change its behaviour an unknown number of times. To that end we propose a novel strategy and prove it attains sub-linear cumulative regret. Moreover, the obtained regret bound matches the best known bound for GP-UCB for a stationary case, and approaches the minimax lower bound in case of highly smooth relation between an action and the corresponding reward. The theoretical result is supported by experimental study.

  • F. Stonyakin, A. Gasnikov, A. Tyurin, D. Pasechnyuk, A. Agafonov, P. Dvurechensky, D. Dvinskikh, V. Piskunova, Inexact model: A framework for optimization and variational inequalities, Preprint no. 2679, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2679 .
    Abstract, PDF (414 kByte)
    In this paper we propose a general algorithmic framework for first-order methods in optimization in a broad sense, including minimization problems, saddle-point problems and variational inequalities. This framework allows to obtain many known methods as a special case, the list including accelerated gradient method, composite optimization methods, level-set methods, proximal methods. The idea of the framework is based on constructing an inexact model of the main problem component, i.e. objective function in optimization or operator in variational inequalities. Besides reproducing known results, our framework allows to construct new methods, which we illustrate by constructing a universal method for variational inequalities with composite structure. This method works for smooth and non-smooth problems with optimal complexity without a priori knowledge of the problem smoothness. We also generalize our framework for strongly convex objectives and strongly monotone variational inequalities.

  • K. Ebrahimi-Fard, F. Patras, N. Tapia, L. Zambotti, Wick polynomials in non-commutative probability: A group-theoretical approach, Preprint no. 2677, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2677 .
    Abstract, PDF (292 kByte)
    Wick polynomials and Wick products are studied in the context of non-commutative probability theory. It is shown that free, boolean and conditionally free Wick polynomials can be defined and related through the action of the group of characters over a particular Hopf algebra. These results generalize our previous developments of a Hopf algebraic approach to cumulants and Wick products in classical probability theory.

  • P. Dvurechensky, A. Gasnikov, E. Nurminski, F. Stonyakin, Advances in low-memory subgradient optimization, Preprint no. 2676, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2676 .
    Abstract, PDF (413 kByte)
    One of the main goals in the development of non-smooth optimization is to cope with high dimensional problems by decomposition, duality or Lagrangian relaxation which greatly reduces the number of variables at the cost of worsening differentiability of objective or constraints. Small or medium dimensionality of resulting non-smooth problems allows to use bundle-type algorithms to achieve higher rates of convergence and obtain higher accuracy, which of course came at the cost of additional memory requirements, typically of the order of n2, where n is the number of variables of non-smooth problem. However with the rapid development of more and more sophisticated models in industry, economy, finance, et all such memory requirements are becoming too hard to satisfy. It raised the interest in subgradient-based low-memory algorithms and later developments in this area significantly improved over their early variants still preserving O(n) memory requirements. To review these developments this chapter is devoted to the black-box subgradient algorithms with the minimal requirements for the storage of auxiliary results, which are necessary to execute these algorithms. To provide historical perspective this survey starts with the original result of N.Z. Shor which opened this field with the application to the classical transportation problem. The theoretical complexity bounds for smooth and non-smooth convex and quasi-convex optimization problems are briefly exposed in what follows to introduce to the relevant fundamentals of non-smooth optimization. Special attention in this section is given to the adaptive step-size policy which aims to attain lowest complexity bounds. Unfortunately the non-differentiability of objective function in convex optimization essentially slows down the theoretical low bounds for the rate of convergence in subgradient optimization compared to the smooth case but there are different modern techniques that allow to solve non-smooth convex optimization problems faster then dictate lower complexity bounds. In this work the particular attention is given to Nesterov smoothing technique, Nesterov Universal approach, and Legendre (saddle point) representation approach. The new results on Universal Mirror Prox algorithms represent the original parts of the survey. To demonstrate application of non-smooth convex optimization algorithms for solution of huge-scale extremal problems we consider convex optimization problems with non-smooth functional constraints and propose two adaptive Mirror Descent methods. The first method is of primal-dual variety and proved to be optimal in terms of lower oracle bounds for the class of Lipschitz-continuous convex objective and constraints. The advantages of application of this method to sparse Truss Topology Design problem are discussed in certain details. The second method can be applied for solution of convex and quasi-convex optimization problems and is optimal in a sense of complexity bounds. The conclusion part of the survey contains the important references that characterize recent developments of non-smooth convex optimization.

Talks, Poster

  • O. Butkovsky, Skew fractional Brownian motion: Going beyond the Catellier-Gubinelli setting (online talk), 14th Oxford-Berlin Young Researchers Meeting on Applied Stochastic Analysis (Online Event), February 10 - 12, 2021, University of Oxford, Mathematical Institute, UK, February 11, 2021.

  • N. Tapia, Approximation of controlled rough paths (online talk), 14th Oxford-Berlin Young Researchers Meeting on Applied Stochastic Analysis (Online Event), February 10 - 12, 2021, University of Oxford, Mathematical Institute, UK, February 10, 2021.

  • N. Tapia, Unified signature cumulants and generalized magnus expansions, Analyse stochastique trajectorielle et applications Pathwise Stochastic Analysis and Applications (Online Event), March 8 - 12, 2021, C.I.R.M., France, March 11, 2021.

  • D. Dvinskikh, D. Tiapkin, Improved complexity bounds in Wasserstein barycenter problem, The 24th International Conference on Artificial Intelligence and Statistics (Online Event), April 13 - 15, 2021.

  • A. Suvorikova, Optimal transport in machine learning (onlne talk), HDI Lab Seminar, HSE University, Faculty of Computer Science (Online Event), Moskau, Russian Federation, April 29, 2021.

  • CH. Bayer, A pricing BSPDE for rough volatility (online talk), MATH4UQ Seminar (Online Event), RWTH Aachen University, Mathematics for Uncertainty Quantification, April 6, 2021.

  • P. Dvurechensky, Accelerated gradient methods and their applications to Wasserstein barycenter problem, TheXIII international scientific conference and young scientist school``Theory and Numerics of Inverse and Ill-posed Problems", April 12 - 22, 2021, Mathematical Center in Akademgorodok, Novosibirsk, Russian Federation, April 14, 2021.

  • P. Friz, Rough stochastic differential equations (online talk), Analyse stochastique trajectorielle et applications Pathwise Stochastic Analysis and Applications (Online Event), March 8 - 12, 2021, C.I.R.M., France, March 8, 2021.

  • K. Papafitsoros, Optimization with learning-informed differential equation constraints and its applications (online talk), Seminar Modern Methods in Applied Stochastics and Nonparametric Statistics, WIAS Berlin, March 16, 2021.

  • V. Spokoiny, Bayesian inference in Bernoulli model with application to ranking from pairwise comparison (online talk), Data Seminar, Université Grenoble Alpes, Laboratoire Jean Kuntzmann, France, January 21, 2021.

  • K. Tabelow, MaRDI - The mathematical research data Initiative within the NFDI (online talk), SFB 1294 Colloquium (Online Event), Universität Potsdam, Institut für Mathematik, April 16, 2021.

  • V. Avanesov, Data-driven confidence bands for distributed nonparametric regression (online talk), The 33rd Annual Conference on Learning Theory (COLT 2020) (Online Event), July 9 - 12, 2020, Graz, Austria, July 10, 2020.

  • O. Butkovsky, Exponential ergodicity of order-preserving SPDEs in the hypoelliptic setting via new coupling techniques (online talk), Bernoulli-IMS One World Symposium 2020 (Online Event), August 24 - 28, 2020, August 25, 2020.

  • O. Butkovsky, New coupling techniques for exponential ergodicity of SPDEs in the hypoelliptic and effectively elliptic settings (online talk), Probability Seminar, Heriot-Watt University, Edinburgh, UK, October 21, 2020.

  • O. Butkovsky, Regularization by noise for PDEs - a stochastic sewing approach (online talk), FOR 2402 Seminar (Online Event), Technische Universtität Berlin, Institut für Mathematik, November 12, 2020.

  • O. Butkovsky, Regularization by noise for SDEs and related systems: A tale of two approaches, Eighth Bielefeld-SNU joint Workshop in Mathematics, February 24 - 26, 2020, Universität Bielefeld, Fakultät für Mathematik, February 24, 2020.

  • O. Butkovsky, Skew stochastic heat equation and regularization by noise for PDEs (online talk), Bernoulli-IMS One World Symposium 2020 (Online Event), August 24 - 28, 2020, August 27, 2020.

  • S. Riedel, Optimal stopping: a signature approach (online talk), 13th Annual ERC Berlin-Oxford Young Researchers Meeting on Applied Stochastic Analysis, WIAS Berlin, June 9, 2020.

  • S. Riedel, Runge--Kutta methods for rough differential equations (online talk), The DNA Seminar (spring 2020), Norwegian University of Science and Technology, Department of Mathematical Sciences, Trondheim, Norway, June 24, 2020.

  • N. Tapia, Free Wick polynomials (online talk), Arbeitsgruppenseminar Analysis, Universität Potsdam, Institut für Mathematik, April 24, 2020.

  • N. Tapia, Higher order iterated-sums signatures (online talk), DataSig Seminar, University of Oxford, Mathematical Institute, UK, April 2, 2020.

  • N. Tapia, Transport and continuity equations with rough noise (online talk), DNA Seminar, Norwegian University of Science and Technology, Department of Mathematical Sciences, Trondheim, Norway, April 22, 2020.

  • N. Tapia, Transport equations with low regularity rough noise, Young researchers between geometry and stochastic analysis, February 12 - 14, 2020, University of Bergen, Norway, February 13, 2020.

  • N. Tapia , Signatures in shape analysis (online talk), Geometry of curves in time series and shape analysis (Online Event), August 11 - 14, 2020, Max-Planck-Institut für Mathematik in den Naturwissenschaften (MiS), Leipzig, August 14, 2020.

  • A. Kroshnin, A. Suvorikova, V. Spokoiny, Statistical inference on Bures-Wasserstein space: From theory to practice, Math of Machine Learning 2020, Sotschi, Russian Federation, February 19 - 22, 2020.

  • D. Dvinskikh, A. Gasnikov, SA vs SAA for population Wasserstein barycenter calculation, Math of Machine Learning 2020, Sotschi, Russian Federation, February 19 - 22, 2020.

  • D. Dvinskikh, A. Gasnikov, Two approaches for population Wasserstein barycenter problem: Stochastic averaging versus sample average approximation (online talks), XII Summer School on Operations Research, Data and Decision Making, May 19 - 21, 2020, Faculty of Informatics, Mathematics and Computer Science, Nizhny Novgorod, Russian Federation, May 20, 2020.

  • A. Suvorikova, Change point detection in hight-dimensional data (online talk), Joint Aramco-HSE Reserach Seminar, Higher School of Economics, Faculty of Computer Science, Moscow, Russian Federation, April 15, 2020.

  • A. Suvorikova, Optimal transport and its application in bioinformatics, Second HSE-Yandex autumn school on generative models, November 24 - 27, 2020, National Research University Higher School of Economics, Moskau, Russian Federation, November 27, 2020.

  • A. Suvorikova, Shape-based domain Adaptation, Statistical Seminar of HDI Lab, Higher School of Economics, Faculty of Computer Science, Moscow, Russian Federation, March 2, 2020.

  • A. Suvorikova, Shape-based domain adaptation via optimal transportation (online talk), Machine Learning Online Seminar, Max-Planck-Institut für Mathematik in den Naturwissenschaften (MiS), Leipzig, April 1, 2020.

  • A. Suvorikova, User-friendly optimal transport (online talk), Chromatine Seminar (Online Event), Skolkovo Institute of Science and Technology (Skoltech), Moskau, Russian Federation, August 11, 2020.

  • CH. Bayer, Pricing american options by exercise rate optimization, Research Seminar on Insurance Mathematics and Stochastic Finance, Eidgenössische Technische Hochschule Zürich, Switzerland, January 9, 2020.

  • CH. Bayer, Pricing american options by exercise rate optimization, Mathrisk- INRIA/ LPSM Paris-Diderot Seminar, Inria Paris Research Centre, Mathrisk Research Team, France, February 6, 2020.

  • CH. Bayer, Pricing american options by exercise rate optimization, Lunch at the Lab, University of Calgary, Department of Mathematics and Statistics, Canada, March 3, 2020.

  • CH. Bayer, Rough volatility, Summer School 2020, September 14 - 17, 2020, International Research Training Group (IRTG) 2544, Döllnsee, September 15, 2020.

  • P. Dvurechensky, Distributed optimization for Wasserstein barycenters, Workshop on PDE Constrained Optimization under Uncertainty and Mean Field Games, January 28 - 29, 2020, WIAS Berlin, January 29, 2020.

  • P. Dvurechensky, Distributed optimization for Wasserstein barycenters (online talk), INFORMS Telecommunication and Network Analytics Conference 2020 (Online Event), October 20 - 21, 2020, INFORMS Telecommunications & Networks Analytics, USA, October 21, 2020.

  • P. Friz, Diamonds, Cumulants, Signature cumulants and Magnus expansion (online talk), Geometry of curves in time series and shape analysis (Online Event), August 11 - 14, 2020, Max-Planck-Institut für Mathematik in den Naturwissenschaften (MiS), Leipzig, August 14, 2020.

  • TH. Koprucki, K. Tabelow, T. Streckenbach, T. Niermann, A. Maltsi, Model-based geometry reconstruction of TEM images (online poster session), MATH+ Day 2020 (Online Event), November 6, 2020.

  • P. Mathé, Bayesian inverse problems with non-commuting operators, University of Edinburgh, School of Mathematics, UK, February 14, 2020.

  • V. Spokoiny, Advanced Statistical Methods, February 11 - March 3, 2020, Higher School of Economics, Faculty of Computer Science, Moskau, Russian Federation.

  • V. Spokoiny, Bayes inference for non-linear inverse problems, Statistics meets Machine Learning, January 26 - 31, 2020, Mathematisches Forschungsinstitut Oberwolfach (MFO), January 28, 2020.

  • K. Tabelow, MaRDI: Mathematical research data initiative, Leibniz NFDI-Konferenz (Online Event), August 19, 2020.

External Preprints

  • M. Coghi, W. Dreyer, P. Gajewski, C. Guhlke, P. Friz, M. Maurelli, A McKean--Vlasov SDE and particle system with interactionfrom reflecting boundaries, Preprint no. 2102.12315v1/1--2102.12315v1/37, Cornell University Library,, 2021.

  • A. Agafonov, P. Dvurechensky, G. Scutari, A. Gasnikov, D. Kamzolov, A. Lukashevich, A. Daneshmand, An accelerated second-order method for distributed stochastic optimization, Preprint no. arXiv:2103.14392, Cornell University Library,, 2021.
    We consider distributed stochastic optimization problems that are solved with master/workers computation architecture. Statistical arguments allow to exploit statistical similarity and approximate this problem by a finite-sum problem, for which we propose an inexact accelerated cubic-regularized Newton's method that achieves lower communication complexity bound for this setting and improves upon existing upper bound. We further exploit this algorithm to obtain convergence rate bounds for the original stochastic optimization problem and compare our bounds with the existing bounds in several regimes when the goal is to minimize the number of communication rounds and increase the parallelization by increasing the number of workers.

  • A. Daneshmand, G. Scutari, P. Dvurechensky, A. Gasnikov, Newton method over networks is fast up to the statistical precision, Preprint no. arXiv:2102.06780, Cornell University, 2021.

  • E. Gladin, A. Sadiev, A. Gasnikov, P. Dvurechensky, A. Beznosikov, M. Alkousa, Solving smooth min-min and min-max problems by mixed oracle algorithms, Preprint no. arXiv:2103.00434, Cornell University, 2021.
    In this paper, we consider two types of problems that have some similarity in their structure, namely, min-min problems and min-max saddle-point problems. Our approach is based on considering the outer minimization problem as a minimization problem with inexact oracle. This inexact oracle is calculated via inexact solution of the inner problem, which is either minimization or a maximization problem. Our main assumptions are that the problem is smooth and the available oracle is mixed: it is only possible to evaluate the gradient w.r.t. the outer block of variables which corresponds to the outer minimization problem, whereas for the inner problem only zeroth-order oracle is available. To solve the inner problem we use accelerated gradient-free method with zeroth-order oracle. To solve the outer problem we use either inexact variant of Vaydya's cutting-plane method or a variant of accelerated gradient method. As a result, we propose a framework that leads to non-asymptotic complexity bounds for both min-min and min-max problems. Moreover, we estimate separately the number of first- and zeroth-order oracle calls which are sufficient to reach any desired accuracy.

  • V. Novitskii, A. Gasnikov, Improved exploiting higher order smoothness in derivative-free optimization and continuous bandit, Preprint no. arXiv:2101.03821, Cornell University, 2021.

  • A. Rogozin, A. Beznosikov, D. Dvinskikh, D. Kovalev, P. Dvurechensky, A. Gasnikov, Decentralized distributed optimization for saddle point problems, Preprint no. arXiv:2102.07758, Cornell University, 2021.

  • A. Rogozin, M. Bochko, P. Dvurechensky, A. Gasnikov, V. Lukoshkin, An accelerated method for decentralized distributed stochastic optimization over time-varying graphs, Preprint no. arXiv:2103.15598, Cornell University Library,, 2021.
    We consider a distributed stochastic optimization problem that is solved by a decentralized network of agents with only local communication between neighboring agents. The goal of the whole system is to minimize a global objective function given as a sum of local objectives held by each agent. Each local objective is defined as an expectation of a convex smooth random function and the agent is allowed to sample stochastic gradients for this function. For this setting we propose the first accelerated (in the sense of Nesterov's acceleration) method that simultaneously attains optimal up to a logarithmic factor communication and oracle complexity bounds for smooth strongly convex distributed stochastic optimization. We also consider the case when the communication graph is allowed to vary with time and obtain complexity bounds for our algorithm, which are the first upper complexity bounds for this setting in the literature.

  • V. Tominin , Y. Tominin , E. Borodich , D. Kovalev, A. Gasnikov, P. Dvurechensky, On accelerated methods for saddle-point problems with composite structure, Preprint no. arXiv:2103.09344, Cornell University, 2021.
    We consider strongly-convex-strongly-concave saddle-point problems with general non-bilinear objective and different condition numbers with respect to the primal and the dual variables. First, we consider such problems with smooth composite terms, one of which having finite-sum structure. For this setting we propose a variance reduction algorithm with complexity estimates superior to the existing bounds in the literature. Second, we consider finite-sum saddle-point problems with composite terms and propose several algorithms depending on the properties of the composite terms. When the composite terms are smooth we obtain better complexity bounds than the ones in the literature, including the bounds of a recently proposed nearly-optimal algorithms which do not consider the composite structure of the problem. If the composite terms are prox-friendly, we propose a variance reduction algorithm that, on the one hand, is accelerated compared to existing variance reduction algorithms and, on the other hand, provides in the composite setting similar complexity bounds to the nearly-optimal algorithm which is designed for non-composite setting. Besides that, our algorithms allow to separate the complexity bounds, i.e. estimate, for each part of the objective separately, the number of oracle calls that is sufficient to achieve a given accuracy. This is important since different parts can have different arithmetic complexity of the oracle, and it is desired to call expensive oracles less often than cheap oracles. The key thing to all these results is our general framework for saddle-point problems, which may be of independent interest. This framework, in turn is based on our proposed Accelerated Meta-Algorithm for composite optimization with probabilistic inexact oracles and probabilistic inexactness in the proximal mapping, which may be of independent interest as well.

  • P. Dvurechensky, D. Kamzolov, A. Lukashevich, S. Lee, E. Ordentlich, C.A. Uribe, A. Gasnikov, Hyperfast second-order local solvers for efficient statistically preconditioned distributed optimization, Preprint no. arXiv:2102.08246, Cornell University, 2021.

  • P. Dvurechensky, M. Staudigl, S. Shtern, First-order methods for convex optimization, Preprint no. arXiv:2101.00935, Cornell University, 2021.
    First-order methods for solving convex optimization problems have been at the forefront of mathematical optimization in the last 20 years. The rapid development of this important class of algorithms is motivated by the success stories reported in various applications, including most importantly machine learning, signal processing, imaging and control theory. First-order methods have the potential to provide low accuracy solutions at low computational complexity which makes them an attractive set of tools in large-scale optimization problems. In this survey we cover a number of key developments in gradient-based optimization methods. This includes non-Euclidean extensions of the classical proximal gradient method, and its accelerated versions. Additionally we survey recent developments within the class of projection-free methods, and proximal versions of primal-dual schemes. We give complete proofs for various key results, and highlight the unifying aspects of several optimization algorithms.

  • O. Butkovsky, K. Dareiotis , M. Gerencsér, Approximation of SDEs --- A stochastic sewing approach, Preprint no. arXiv:1909.07961, Cornell University, 2020.

  • M. Danilova, P. Dvurechensky, A. Gasnikov, E. Gorbunov, S. Guminov, D. Kamzolov, I. Shibaev, Recent theoretical advances in non-convex optimization, Preprint no. arXiv:2012.06188, Cornell University, 2020.

  • S. Athreya, O. Butkovsky, K. , L. Mytnik, Well-posedness of stochastic heat equation with distributional drift and skew stochastic heat equation, Preprint no. arXiv:2011.13498, Cornell University, 2020.

  • C. Bellingeri, P. Friz, M. Gerencsér, Singular paths spaces and applications, Preprint no. arXiv:2003.03352, Cornell University, 2020.

  • A. Beznosikov, V. Samokhin, A. Gasnikov, Local SGD for saddle-point problems, Preprint no. arXiv:2010.13112, Cornell University, 2020.

  • E. Gorbunov, A. Rogozin, A. Beznosikov, D. Dvinskikh, A. Gasnikov, Recent theoretical advances in decentralized distributed convex optimization, Preprint no. arXiv:2011.13259, Cornell University, 2020.
    In the last few years, the theory of decentralized distributed convex optimization has made significant progress. The lower bounds on communications rounds and oracle calls have appeared, as well as methods that reach both of these bounds. In this paper, we focus on how these results can be explained based on optimal algorithms for the non-distributed setup. In particular, we provide our recent results that have not been published yet, and that could be found in details only in arXiv preprints.

  • N. Puchkin, A. Timofeev, V. Spokoiny, Manifold-based time series forecasting, Preprint no. arXiv:2012.08244, Cornell University, 2020.
    Prediction for high dimensional time series is a challenging task due to the curse of dimensionality problem. Classical parametric models like ARIMA or VAR require strong modeling assumptions and time stationarity and are often overparametrized. This paper offers a new flexible approach using recent ideas of manifold learning. The considered model includes linear models such as the central subspace model and ARIMA as particular cases. The proposed procedure combines manifold denoising techniques with a simple nonparametric prediction by local averaging. The resulting procedure demonstrates a very reasonable performance for real-life econometric time series. We also provide a theoretical justification of the manifold estimation procedure.

  • N. Puchkin, V. Spokoiny, E. Stepanov, D. Trevisan, Reconstruction of manifold embeddings into Euclidean spaces via intrinsic distances, Preprint no. arXiv:2012.13770, Cornell University, 2020.
    We consider one of the classical manifold learning problems, that of reconstructing up to an almost isometry an embedding of a compact connected Riemannian manifold in a Euclidean space given the information on intrinsic distances between points from its almost dense subset. It will be shown that the most popular methods in data science to deal with such a problem, the classical Multidimensional scaling (MDS) and the Maximum variance unfolding (MVU) actually miss the point and may provide results very far from an isometry (and even may give no biLipshitz embedding). We will then provide an easy variational formulation of this problem which leads to an algorithm always providing an almost isometric imbedding with given controlled small distortion of original distances.

  • A. Rastogi, P. Mathé, Inverse learning in Hilbert scales, Preprint no. arXiv:2002.10208, Cornell University, 2020.
    We study the linear ill-posed inverse problem with noisy data in the statistical learning setting. Approximate reconstructions from random noisy data are sought with general regularization schemes in Hilbert scale. We discuss the rates of convergence for the regularized solution under the prior assumptions and a certain link condition. We express the error in terms of certain distance functions. For regression functions with smoothness given in terms of source conditions the error bound can then be explicitly established.

  • A. Sadiev, A. Beznosikov, P. Dvurechensky, A. Gasnikov, Zeroth-order algorithms for smooth saddle-point problems, Preprint no. arXiv:2009.09908, Cornell University, 2020.
    In recent years, the importance of saddle-point problems in machine learning has increased. This is due to the popularity of GANs. In this paper, we solve stochastic smooth (strongly) convex-concave saddle-point problems using zeroth-order oracles. Theoretical analysis shows that in the case when the optimization set is a simplex, we lose only logn times in the stochastic convergence term. The paper also provides an approach to solving saddle-point problems, when the oracle for one of the variables has zero order, and for the second - first order. Subsequently, we implement zeroth-order and 1/2th-order methods to solve practical problems.

  • I. Shibaev, P. Dvurechensky, A. Gasnikov, Zeroth-order methods for noisy Hölder-gradient functions, Preprint no. arXiv:2006.11857, Cornell University, 2020.

  • D. Tiapkin, A. Gasnikov, P. Dvurechensky, Stochastic saddle-point optimization for Wasserstein barycenters, Preprint no. arXiv:2006.06763, Cornell University, 2020.
    We study the computation of non-regularized Wasserstein barycenters of probability measures supported on the finite set. The first result gives a stochastic optimization algorithm for the discrete distribution over the probability measures which is comparable with the current best algorithms. The second result extends the previous one to the arbitrary distribution using kernel methods. Moreover, this new algorithm has a total complexity better than the Stochastic Averaging approach via the Sinkhorn algorithm in many cases.

  • N. Tupitsa, P. Dvurechensky, A. Gasnikov, C.A. Uribe , Multimarginal optimal transport by accelerated alternating minimization, Preprint no. arXiv:2004.02294, Cornell University Library,, 2020.
    We consider a multimarginal optimal transport, which includes as a particular case the Wasserstein barycenter problem. In this problem one has to find an optimal coupling between m probability measures, which amounts to finding a tensor of the order m. We propose an accelerated method based on accelerated alternating minimization and estimate its complexity to find the approximate solution to the problem. We use entropic regularization with sufficiently small regularization parameter and apply accelerated alternating minimization to the dual problem. A novel primal-dual analysis is used to reconstruct the approximately optimal coupling tensor. Our algorithm exhibits a better computational complexity than the state-of-the-art methods for some regimes of the problem parameters.

  • D. Dvinskikh, A. Gasnikov, Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems, Preprint no. arXiv:1904.09015, Cornell University, 2020.
    We introduce primal and dual stochastic gradient oracle methods for decentralized convex optimization problems. Both for primal and dual oracles, the proposed methods are optimal in terms of the number of communication steps. However, for all classes of the objective, the optimality in terms of the number of oracle calls per node takes place only up to a logarithmic factor and the notion of smoothness. By using mini-batching technique, we show that the proposed methods with stochastic oracle can be additionally parallelized at each node. The considered algorithms can be applied to many data science problems and inverse problems.

  • D. Dvinskikh, A. Ogaltsov, A. Gasnikov, P. Dvurechensky, A. Tyurin, V. Spokoiny, Adaptive gradient descent for convex and non-convex stochastic optimization, Preprint no. arXiv:1911.08380, Cornell University, 2020.

  • D. Dvinskikh, D. Tiapkin, Improved complexity bounds in Wasserstein barycenter problem, Preprint no. arXiv:2010.04677, Cornell University, 2020.

  • D. Dvinskikh, Stochastic approximation versus sample average approximation for population Wasserstein barycenter calculation, Preprint no. arXiv:2001.07697, Cornell University, 2020.

  • P. Dvurechensky, K. Safin, S. Shtern, M. Staudigl, Generalized self-concordant analysis of Frank--Wolfe algorithms, Preprint no. arXiv:2010.01009, Cornell University, 2020.
    Projection-free optimization via different variants of the Frank-Wolfe (FW) method has become one of the cornerstones in large scale optimization for machine learning and computational statistics. Numerous applications within these fields involve the minimization of functions with self-concordance like properties. Such generalized self-concordant (GSC) functions do not necessarily feature a Lipschitz continuous gradient, nor are they strongly convex. Indeed, in a number of applications, e.g. inverse covariance estimation or distance-weighted discrimination problems in support vector machines, the loss is given by a GSC function having unbounded curvature, implying absence of theoretical guarantees for the existing FW methods. This paper closes this apparent gap in the literature by developing provably convergent FW algorithms with standard O(1/k) convergence rate guarantees. If the problem formulation allows the efficient construction of a local linear minimization oracle, we develop a FW method with linear convergence rate.

  • P. Dvurechensky, S. Shtern, M. Staudigl, P. Ostroukhov, K. Safin, Self-concordant analysis of Frank--Wolfe algorithms, Preprint no. arXiv:2002.04320, Cornell University, 2020.
    Projection-free optimization via different variants of the Frank-Wolfe (FW) method has become one of the cornerstones in optimization for machine learning since in many cases the linear minimization oracle is much cheaper to implement than projections and some sparsity needs to be preserved. In a number of applications, e.g. Poisson inverse problems or quantum state tomography, the loss is given by a self-concordant (SC) function having unbounded curvature, implying absence of theoretical guarantees for the existing FW methods. We use the theory of SC functions to provide a new adaptive step size for FW methods and prove global convergence rate O(1/k), k being the iteration counter. If the problem can be represented by a local linear minimization oracle, we are the first to propose a FW method with linear convergence rate without assuming neither strong convexity nor a Lipschitz continuous gradient.

  • P. Friz, J. Gatheral, R. Radoičić, Forests, cumulants, martingales, Preprint no. arXiv:2002.01448, Cornell University, 2020.
    This work is concerned with forest and cumulant type expansions of general random variables on a filtered probability spaces. We establish a "broken exponential martingale" expansion that generalizes and unifies the exponentiation result of Alòs, Gatheral, and Radoičić and the cumulant recursion formula of Lacoin, Rhodes, and Vargas. Specifically, we exhibit the two previous results as lower dimensional projections of the same generalized forest expansion, subsequently related by forest reordering. Our approach also leads to sharp integrability conditions for validity of the cumulant formula, as required by many of our examples, including iterated stochastic integrals, Lévy area, Bessel processes, KPZ with smooth noise, Wiener-Itô chaos and "rough" stochastic (forward) variance models.

  • P. Friz, P. Pigato, J. Seibel, The step stochastic volatility model (SSVM), Preprint no. May 7, Available at SSRN's eLibrary: url or url, 2020.
    Stochastic Volatility Models (SVMs) are ubiquitous in quantitative finance. But is there a Markovian SVM capable of producing extreme (T^(-1/2)) short-dated implied volatility skew? We here propose a modification of a given SVM "backbone", Heston for instance, to achieve just this - without adding jumps or non-Markovian "rough" fractional volatility dynamics. This is achieved via non-smooth leverage function, such as a step function. The resulting Step Stochastic Volatility Model (SSVM) is thus a parametric example of local stochastic volatility model (LSVM). From an IT perspective, SSVM amounts to trivial modifications in the code of existing SVM implementations. From a QF perspective, SSVM offers new flexibility in smile modelling and towards assessing model risk. For comparison, we then exhibit the market-induced leverage function for LSVM, calibrated with the particle method.