Publications

Monographs

  • P. Friz, M. Hairer, A Course on Rough Paths: With an Introduction to Regularity Structures, Universitext, Springer International Publishing, Basel, 2020, 346 pages, DOI 10.1007/978-3-030-41556-3 .

Articles in Refereed Journals

  • A. Maltsi, T. Niermann, T. Streckenbach, K. Tabelow, Th. Koprucki, Numerical simulation of TEM images for In(Ga)As/GaAs quantum dots with various shapes, Optical and Quantum Electronics, 52 (2020), pp. 257/1--257/11 (published online on 05.05.2020), DOI 10.1007/s11082-020-02356-y .
    Abstract
    We present a mathematical model and a tool chain for the numerical simulation of TEM images of semiconductor quantum dots (QDs). This includes elasticity theory to obtain the strain profile coupled with the Darwin-Howie-Whelan equations, describing the propagation of the electron wave through the sample. We perform a simulation study on indium gallium arsenide QDs with different shapes and compare the resulting TEM images to experimental ones. This tool chain can be applied to generate a database of simulated TEM images, which is a key element of a novel concept for model-based geometry reconstruction of semiconductor QDs, involving machine learning techniques.

  • O. Butkovsky, A. Kulik, M. Scheutzow, Generalized couplings and ergodic rates for SPDEs and other Markov models, The Annals of Applied Probability, 30 (2020), pp. 1--39, DOI 10.1214/19-AAP1485 .
    Abstract
    We establish verifiable general sufficient conditions for exponential or subexponential ergodicity of Markov processes that may lack the strong Feller property. We apply the obtained results to show exponential ergodicity of a variety of nonlinear stochastic partial differential equations with additive forcing, including 2D stochastic Navier-Stokes equations. Our main tool is a new version of the generalized coupling method.

  • O. Butkovsky, M. Scheutzow, Couplings via comparison principle and exponential ergodicity of SPDEs in the hypoelliptic setting, Communications in Mathematical Physics, 379 (2020), pp. 1001--1034, DOI 10.1007/s00220-020-03834-w .
    Abstract
    We develop a general framework for studying ergodicity of order-preserving Markov semigroups. We establish natural and in a certain sense optimal conditions for existence and uniqueness of the invariant measure and exponential convergence of transition probabilities of an order-preserving Markov process. As an application, we show exponential ergodicity and exponentially fast synchronization-by-noise of the stochastic reaction-diffusion equation in the hypoelliptic setting. This refines and complements corresponding results of Hairer and Mattingly (Electron J Probab 16:658--738, 2011).

  • N. Tapia, L. Zambotti, The geometry of the space of branched rough paths, Proceedings of the London Mathematical Society. Third Series, 121 (2020), pp. 220--251, DOI 10.1112/plms.12311 .
    Abstract
    We construct an explicit transitive free action of a Banach space of Hölder functions on the space of branched rough paths, which yields in particular a bijection between these two spaces. This endows the space of branched rough paths with the structure of a principal homogeneous space over a Banach space and allows us to characterize its automorphisms. The construction is based on the Baker-Campbell-Hausdorff formula, on a constructive version of the Lyons-Victoir extension theorem and on the Hairer-Kelly map, which allows us to describe branched rough paths in terms of anisotropic geometric rough paths.

  • A. Ivanova, P. Dvurechensky, A. Gasnikov, Composite optimization for the resource allocation problem, Optimization Methods & Software, published online on 12.02.2020, DOI 10.1080/10556788.2020.1712599 .
    Abstract
    In this paper, we consider the resource allocation problem stated as a convex minimization problem with linear constraints. To solve this problem, we use gradient and accelerated gradient descent applied to the dual problem and prove convergence rates for both the primal and the dual iterates, obtaining faster rates than those known in the literature. We also provide an economic interpretation for the two methods: iterations of the algorithms naturally correspond to the process of price and production adjustment aimed at reaching the desired production volume in the economy. Overall, we show how these actions of the economic agents lead the whole system to the equilibrium.
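    For orientation, a minimal Python sketch of the price/production adjustment loop described above, on a toy problem with made-up logarithmic utilities and linear resource constraints (an illustration of plain dual gradient ascent, not the paper's accelerated methods or code):

    import numpy as np

    rng = np.random.default_rng(0)
    n_agents, n_resources = 5, 3
    A = rng.uniform(0.5, 1.5, (n_resources, n_agents))  # resource use per unit produced
    b = np.full(n_resources, 4.0)                       # available resource volumes
    a = rng.uniform(1.0, 2.0, n_agents)                 # utilities u_k(x) = a_k * log(1 + x)

    def production(prices):
        # each agent k maximizes a_k * log(1 + x_k) - (A^T prices)_k * x_k over x_k >= 0
        cost = A.T @ prices
        return np.maximum(a / np.maximum(cost, 1e-12) - 1.0, 0.0)

    prices = np.ones(n_resources)
    for _ in range(5000):
        x = production(prices)                               # production adjustment
        prices = np.maximum(prices + 0.05 * (A @ x - b), 0)  # price adjustment (dual step)
    print("production:", x.round(3), "prices:", prices.round(3))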

  • M.S. Alkousa, A. Gasnikov, D. Dvinskikh, D.A. Kovalev, F.S. Stonyakin, Accelerated methods for saddle-point problem, Computational Mathematics and Mathematical Physics, 60 (2020), pp. 1787--1809, DOI 10.1134/S0965542520110020 .
    Abstract
    Recently, it has been shown how, on the basis of the usual accelerated gradient method for solving problems of smooth convex optimization, one can obtain accelerated methods for more complex problems (with a structure) and for problems that are solved using various local information about the behavior of a function (stochastic gradient, Hessian, etc.). The term "accelerated methods" here means, on the one hand, the presence of some unified and fairly general way of acceleration. On the other hand, it also means the optimality of the methods, which can often be proved rigorously. In the present work, an attempt is made to construct in the same way a theory of accelerated methods for solving smooth convex-concave saddle-point problems with a structure. The main result of this article is a set of, in some sense, necessary and sufficient conditions under which the complexity of solving structured nonlinear convex-concave saddle-point problems, measured by the number of evaluations of the gradients of the composites in the primal variables, is equal in order of magnitude to the complexity of solving structured bilinear problems.

  • A. Anikin, A. Gasnikov, A. Gornov, D. Kamzolov, Y. Maximov, Y. Nesterov, Efficient numerical methods to solve sparse linear equations with application to PageRank, Optimization Methods & Software, (2020), published online on 21.12.2020, DOI 10.1080/10556788.2020.1858297 .
    Abstract
    Over the last two decades, the PageRank problem has received increased interest from the academic community as an efficient tool to estimate web-page importance in information retrieval. Despite numerous developments, the design of efficient optimization algorithms for the PageRank problem is still a challenge. This paper proposes three new algorithms with a linear time complexity for solving the problem over a bounded-degree graph. The idea behind them is to set up the PageRank as a convex minimization problem over a unit simplex, and then solve it using iterative methods with small iteration complexity. Our theoretical results are supported by an extensive empirical justification using real-world and simulated data.
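    As an illustration of this formulation (a toy sketch, not one of the paper's three algorithms), PageRank can be cast as minimizing $0.5\|x - P^T x\|^2$ over the unit simplex and solved, e.g., by Frank-Wolfe, whose linear minimization oracle over the simplex reduces to a single coordinate search:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 50
    M = rng.random((n, n)) * (rng.random((n, n)) < 0.1)  # sparse random link weights
    M = M + (M.sum(axis=1, keepdims=True) == 0) / n      # dangling nodes -> uniform row
    M /= M.sum(axis=1, keepdims=True)                    # row-stochastic
    P = 0.85 * M + 0.15 / n                              # damped transition matrix

    def grad(x):
        r = x - P.T @ x                  # gradient of f(x) = 0.5 * ||x - P^T x||^2
        return r - P @ r

    x = np.full(n, 1.0 / n)              # start at the simplex barycenter
    for k in range(1, 5001):
        s = np.zeros(n)
        s[np.argmin(grad(x))] = 1.0      # LMO over the simplex: best vertex
        x += 2.0 / (k + 2) * (s - x)     # standard Frank-Wolfe step size
    print("stationarity residual:", np.linalg.norm(x - P.T @ x))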

  • S. Athreya, O. Butkovsky, L. Mytnik, Strong existence and uniqueness for stable stochastic differential equations with distributional drift, The Annals of Probability, 48 (2020), pp. 178--210, DOI 10.1214/19-AOP1358 .

  • F. Bachoc, A. Suvorikova, D. Ginsbourger, J.-M. Loubes, V. Spokoiny, Gaussian processes with multidimensional distribution inputs via optimal transport and Hilbertian embedding, Electronic Journal of Statistics, 14 (2020), pp. 2742--2772, DOI 10.1214/20-EJS1725 .
    Abstract
    In this work, we propose a way to construct Gaussian processes indexed by multidimensional distributions. More precisely, we tackle the problem of defining positive definite kernels between multivariate distributions via notions of optimal transport and appealing to Hilbert space embeddings. Besides presenting a characterization of radial positive definite and strictly positive definite kernels on general Hilbert spaces, we investigate the statistical properties of our theoretical and empirical kernels, focusing in particular on consistency as well as the special case of Gaussian distributions. A wide set of applications is presented, both using simulations and implementation with real data.

  • D. Belomestny, M. Kaledin, J.G.M. Schoenmakers, Semitractability of optimal stopping problems via a weighted stochastic mesh algorithm, Mathematical Finance, 30 (2020), pp. 1591--1616, DOI 10.1111/mafi.12271 .
    Abstract
    In this article we propose a Weighted Stochastic Mesh (WSM) algorithm for approximating the value of discrete- and continuous-time optimal stopping problems. It is shown that in the discrete-time case the WSM algorithm leads to semi-tractability of the corresponding optimal stopping problem, in the sense that its complexity is bounded in order by $\varepsilon^{-4}\log^{d+2}(1/\varepsilon)$, with $d$ being the dimension of the underlying Markov chain. Furthermore, we study the WSM approach in the context of continuous-time optimal stopping problems and derive the corresponding complexity bounds. Although we cannot prove semi-tractability in this case, our bounds turn out to be the tightest among the complexity bounds known in the literature. We illustrate our theoretical findings by a numerical example.

  • D. Belomestny, J.G.M. Schoenmakers, V. Spokoiny, B. Zharkynbay, Optimal stopping via reinforced regression, Communications in Mathematical Sciences, 18 (2020), pp. 109--121, DOI 10.4310/CMS.2020.v18.n1.a5 .
    Abstract
    In this note we propose a new approach to solving optimal stopping problems numerically via boosted-regression-based Monte Carlo algorithms. The main idea of the method is to boost standard linear regression algorithms in each backward induction step by adding new basis functions based on previously estimated continuation values. The proposed methodology is illustrated by several numerical examples from finance.
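    A toy Python sketch of the boosting idea (illustrative model, payoff and basis, not the paper's implementation): at each backward induction step, the regression basis is augmented by the continuation-value estimate fitted at the later step:

    import numpy as np

    rng = np.random.default_rng(5)
    M, T = 50_000, 10                                    # paths, exercise dates
    r, sigma, S0, K = 0.05, 0.2, 100.0, 100.0
    dt = 1.0 / T
    disc = np.exp(-r * dt)
    Z = rng.standard_normal((T, M))
    S = S0 * np.exp(np.cumsum((r - sigma**2 / 2) * dt + sigma * np.sqrt(dt) * Z, axis=0))
    payoff = lambda s: np.maximum(K - s, 0.0)            # Bermudan put

    cont_fn = None                                       # continuation estimate from the later step
    V = payoff(S[-1])
    for t in range(T - 2, -1, -1):
        def feats(s, f=cont_fn):                         # basis 1, s, s^2, reinforced by f
            X = np.vander(s / S0, 3, increasing=True)
            return X if f is None else np.column_stack([X, f(s)])
        coef, *_ = np.linalg.lstsq(feats(S[t]), disc * V, rcond=None)
        cont_fn = lambda s, c=coef, fe=feats: fe(s) @ c  # becomes a basis function next step
        V = np.where(payoff(S[t]) > cont_fn(S[t]), payoff(S[t]), disc * V)
    print("price estimate:", disc * V.mean())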

  • D. Belomestny, J.G.M. Schoenmakers, Optimal stopping of McKean--Vlasov diffusions via regression on particle systems, SIAM Journal on Control and Optimization, 58 (2020), pp. 529--550, DOI 10.1137/18M1195590 .
    Abstract
    In this note we consider the problem of using regression on interacting particles to compute conditional expectations for McKean-Vlasov SDEs. We prove a general result on the convergence of linear regression algorithms and establish the corresponding rates of convergence. Applications to optimal stopping and variance reduction are considered.

  • I. Chevyrev, P. Friz, A. Korepanov, I. Melbourne, Superdiffusive limits for deterministic fast-slow dynamical systems, Probability Theory and Related Fields, 178 (2020), pp. 735--770, DOI 10.1007/s00440-020-00988-5 .

  • M. Coghi, J.-D. Deuschel, P. Friz, M. Maurelli, Pathwise McKean--Vlasov theory with additive noise, The Annals of Applied Probability, 30 (2020), pp. 2355--2392, DOI 10.1214/20-AAP1560 .
    Abstract
    We take a pathwise approach to classical McKean-Vlasov stochastic differential equations with additive noise, as exposed, e.g., in Sznitman [34]. Our study was prompted by some concrete problems in battery modelling [19], and also by recent progress on rough-pathwise McKean-Vlasov theory, notably Cass--Lyons [9], and then Bailleul, Catellier and Delarue [4]. Such a “pathwise McKean-Vlasov theory” can be traced back to Tanaka [36]. This paper can be seen as an attempt to advertise the ideas, power and simplicity of the pathwise approach, not so easily extracted from [4, 9, 36]. As novel applications we discuss mean field convergence without a priori independence and exchangeability assumptions, common noise, and reflecting boundaries. Last but not least, we generalize Dawson--Gärtner large deviations to a non-Brownian noise setting.

  • J. Diehl, K. Ebrahimi-Fard, N. Tapia, Time-warping invariants of multidimensional time series, Acta Applicandae Mathematicae, (2020), published online on 14.05.2020, DOI 10.1007/s10440-020-00333-x .
    Abstract
    In data science, one is often confronted with a time series representing measurements of some quantity of interest. Usually, in a first step, features of the time series need to be extracted. These are numerical quantities that aim to succinctly describe the data and to dampen the influence of noise. In some applications, these features are also required to satisfy some invariance properties. In this paper, we concentrate on time-warping invariants. We show that these correspond to a certain family of iterated sums of the increments of the time series, known as quasisymmetric functions in the mathematics literature. We present these invariant features in an algebraic framework, and we develop some of their basic properties.
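    To make the invariance concrete: repeating values of a series ("time warping" in the above sense) inserts zero increments and therefore leaves iterated sums of increments unchanged. A minimal toy check (not from the paper):

    import numpy as np

    def iterated_sum_features(x):
        dx = np.diff(x)
        n = len(dx)
        return (dx.sum(),                              # sum_i dx_i
                (dx ** 2).sum(),                       # sum_i dx_i^2
                sum(dx[i] * dx[j]                      # sum_{i<j} dx_i dx_j
                    for i in range(n) for j in range(i + 1, n)))

    x = np.array([0.0, 1.0, 0.5, 2.0])
    x_warped = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.5, 2.0, 2.0])  # stammered copy
    print(iterated_sum_features(x))
    print(iterated_sum_features(x_warped))             # identical features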

  • D. Kamzolov, P. Dvurechensky, A. Gasnikov, Universal intermediate gradient method for convex problems with inexact oracle, Optimization Methods & Software, published online on 17.01.2020, DOI 10.1080/10556788.2019.1711079 .

  • L.-Ch. Lin, Y. Chen, G. Pan, V. Spokoiny, Efficient and positive semidefinite pre-averaging realized covariance estimator, Statistica Sinica, 31 (2020), published online on 23.11.2020, DOI 10.5705/ss.202017.0489 .

  • S. Lu, P. Mathé, S. Pereverzyev, Randomized matrix approximation to enhance regularized projection schemes in inverse problems, Inverse Problems, 36 (2020), pp. 085013/1--085013/20, DOI 10.1088/1361-6420/ab9c44 .
    Abstract
    The authors consider a randomized solution to ill-posed operator equations in Hilbert spaces. In contrast to statistical inverse problems, where randomness appears in the noise, here randomness arises in the low-rank matrix approximation of the forward operator, which results in using a Monte Carlo method to solve the inverse problems. In particular, this approach follows the paradigm of the study N. Halko et al., SIAM Rev. 53 (2011), pp. 217--288, and hence regularization is performed based on the low-rank matrix approximation. Error bounds for the mean error are obtained which take into account solution smoothness and the inherent noise level. Based on the structure of the error decomposition the authors propose a novel algorithm which guarantees (on the mean) a prescribed error tolerance. Numerical simulations confirm the theoretical findings.

  • Y. Nesterov, A. Gasnikov, S. Guminov, P. Dvurechensky, Primal-dual accelerated gradient methods with small-dimensional relaxation oracle, Optimization Methods & Software, published online on 02.03.2020, DOI 10.1080/10556788.2020.1731747 .

  • Y.Y. Park, J. Polzehl, S. Chatterjee, A. Brechmann, M. Fiecas, Semiparametric modeling of time-varying activation and connectivity in task-based fMRI data, Computational Statistics & Data Analysis, 150 (2020), pp. 107006/1--107006/14, DOI 10.1016/j.csda.2020.107006 .
    Abstract
    In functional magnetic resonance imaging (fMRI), there is growing evidence that time-varying functional connectivity, or dynamic functional connectivity (dFC), which measures changes in the synchronization of brain activity, provides additional information on brain networks not captured by time-invariant (i.e., static) functional connectivity. While there have been many developments for statistical models of dFC in resting-state fMRI, there remains a gap in the literature on how to simultaneously model both dFC and time-varying activation when the study participants are undergoing experimental tasks designed to probe a cognitive process of interest. A method is proposed to estimate dFC between two regions of interest (ROIs) in task-based fMRI where the activation effects are also allowed to vary over time. The proposed method, called TVAAC (time-varying activation and connectivity), uses penalized splines to model both time-varying activation effects and time-varying functional connectivity and uses the bootstrap for statistical inference. Simulation studies show that TVAAC can estimate both static and time-varying activation and functional connectivity, while ignoring time-varying activation effects would lead to poor estimation of dFC. An empirical illustration is provided by applying TVAAC to analyze two subjects from an event-related fMRI learning experiment.

  • N. Puchkin, V. Spokoiny, An adaptive multiclass nearest neighbor classifier, ESAIM. Probability and Statistics, 24 (2020), pp. 69--99, DOI 10.1051/ps/2019021 .

  • A. Rastogi, G. Blanchard, P. Mathé, Convergence analysis of Tikhonov regularization for non-linear statistical inverse problems, Electronic Journal of Statistics, 14 (2020), pp. 2798--2841, DOI 10.1214/20-EJS1735 .
    Abstract
    We study a non-linear statistical inverse problem, where we observe the noisy image of a quantity through a non-linear operator at some random design points. We consider the widely used Tikhonov regularization (or method of regularization) approach to estimate the quantity for the non-linear ill-posed inverse problem. The estimator is defined as the minimizer of a Tikhonov functional, which is the sum of a data misfit term and a quadratic penalty term. We develop a theoretical analysis for the minimizer of the Tikhonov regularization scheme using the concept of reproducing kernel Hilbert spaces. We discuss optimal rates of convergence for the proposed scheme, uniformly over classes of admissible solutions, defined through appropriate source conditions.

  • D. Dvinskikh, S.S. Omelchenko, A. Gasnikov, A.I. Tyurin, Accelerated gradient sliding for minimizing a sum of functions, Doklady Mathematics, 101 (2020), pp. 244--246, DOI 10.1134/S1064562420030084 .
    Abstract
    We propose a new way of justifying the accelerated gradient sliding of G. Lan, which allows one to extend the sliding technique to a combination of an accelerated gradient method with an accelerated variance reduction method. New optimal estimates for the solution of the problem of minimizing a sum of smooth strongly convex functions with a smooth regularizer are obtained.

  • D. Dvinskikh, A.I. Tyurin, A. Gasnikov, S.S. Omelchenko, Accelerated and unaccelerated stochastic gradient descent in model generality, Mathematical Notes, 108 (2020), pp. 511--522, DOI 10.1134/S0001434620090230 .
    Abstract
    A new method for deriving estimates of the rate of convergence of optimal methods for solving problems of smooth (strongly) convex stochastic optimization is described. The method is based on stochastic-optimization results that are derived from results on the convergence of optimal methods under inexact gradients with small noise of nonrandom nature. In contrast to earlier results, all estimates in the present paper are obtained in model generality.

  • CH. Bayer, M. Redmann, J.G.M. Schoenmakers, Dynamic programming for optimal stopping via pseudo-regression, Quantitative Finance, (2020), published online 01.09.2020, DOI 10.1080/14697688.2020.1780299 .
    Abstract
    We introduce new variants of classical regression-based algorithms for optimal stopping problems, based on the computation of regression coefficients by Monte Carlo approximation of the corresponding L2 inner products instead of the least-squares error functional. Coupled with new proposals for simulation of the underlying samples, we call the approach "pseudo regression". We show that the approach leads to asymptotically smaller errors, as well as lower computational cost. The analysis is justified by numerical examples.
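    A hedged one-dimensional illustration of the idea (toy basis, target and design law, not the paper's estimators): when the law of the design is known, the Gram matrix of L2 inner products can be computed in closed form, and only the right-hand side is approximated by Monte Carlo:

    import numpy as np

    rng = np.random.default_rng(2)
    N = 10_000
    X = rng.uniform(-1, 1, N)                             # design with known law U[-1, 1]
    Y = np.sin(2 * X) + 0.1 * rng.standard_normal(N)
    phi = lambda x: np.vander(x, 4, increasing=True)      # basis 1, x, x^2, x^3

    # exact Gram matrix G_ij = E[phi_i(X) phi_j(X)] for X ~ U[-1, 1]
    G = np.array([[0.0 if (i + j) % 2 else 1.0 / (i + j + 1)
                   for j in range(4)] for i in range(4)])
    b = phi(X).T @ Y / N                                  # Monte Carlo estimate of E[phi_i(X) Y]
    c_pseudo = np.linalg.solve(G, b)                      # "pseudo regression" coefficients

    c_ls, *_ = np.linalg.lstsq(phi(X), Y, rcond=None)     # classical least squares
    print(c_pseudo.round(3), c_ls.round(3))               # nearly identical here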

  • CH. Bayer, D. Belomestny, M. Redmann, S. Riedel, J.G.M. Schoenmakers, Solving linear parabolic rough partial differential equations, Journal of Mathematical Analysis and Applications, 490 (2020), 124236, DOI 10.1016/j.jmaa.2020.124236 .
    Abstract
    We study linear rough partial differential equations in the setting of [Friz and Hairer, Springer, 2014, Chapter 12]. More precisely, we consider a linear parabolic partial differential equation driven by a deterministic rough path W of Hölder regularity α with ⅓ < α ≤ ½ . Based on a stochastic representation of the solution of the rough partial differential equation, we propose a regression Monte Carlo algorithm for spatio-temporal approximation of the solution. We provide a full convergence analysis of the proposed approximation method which essentially relies on the new bounds for the higher order derivatives of the solution in space. Finally, a comprehensive simulation study showing the applicability of the proposed algorithm is presented.

  • CH. Bayer, Ch.B. Hammouda, R. Tempone, Hierarchical adaptive sparse grids and quasi-Monte Carlo for option pricing under the rough Bergomi model, Quantitative Finance, published online on 20.04.2020, DOI 10.1080/14697688.2020.1744700 .
    Abstract
    The rough Bergomi (rBergomi) model, introduced recently in Bayer et al. [Pricing under rough volatility. Quant. Finance, 2016, 16(6), 887--904], is a promising rough volatility model in quantitative finance. It is a parsimonious model depending on only three parameters, and yet remarkably fits empirical implied volatility surfaces. In the absence of analytical European option pricing methods for the model, and due to the non-Markovian nature of the fractional driver, the prevalent approach is to use Monte Carlo (MC) simulation for pricing. Despite recent advances in the MC method in this context, pricing under the rBergomi model is still a time-consuming task. To overcome this issue, we have designed a novel, hierarchical approach, based on: (i) adaptive sparse grids quadrature (ASGQ), and (ii) quasi-Monte Carlo (QMC). Both techniques are coupled with a Brownian bridge construction and a Richardson extrapolation on the weak error. By uncovering the available regularity, our hierarchical methods demonstrate substantial computational gains with respect to the standard MC method. They reach a sufficiently small relative error tolerance in the price estimates across different parameter constellations, even for very small values of the Hurst parameter. Our work opens a new research direction in this field, i.e. to investigate the performance of methods other than Monte Carlo for pricing and calibrating under the rBergomi model.

  • CH. Bayer, R.F. Tempone, S. Wolfers, Pricing American options by exercise rate optimization, Quantitative Finance, published online on 07.07.2020, DOI 10.1080/14697688.2020.1750678 .
    Abstract
    We present a novel method for the numerical pricing of American options based on Monte Carlo simulation and the optimization of exercise strategies. Previous solutions to this problem either explicitly or implicitly determine so-called optimal exercise regions, which consist of points in time and space at which a given option is exercised. In contrast, our method determines the exercise rates of randomized exercise strategies. We show that the supremum of the corresponding stochastic optimization problem provides the correct option price. By integrating analytically over the random exercise decision, we obtain an objective function that is differentiable with respect to perturbations of the exercise rate even for finitely many sample paths. The global optimum of this function can be approached gradually when starting from a constant exercise rate. Numerical experiments on vanilla put options in the multivariate Black-Scholes model and a preliminary theoretical analysis underline the efficiency of our method, both with respect to the number of time-discretization steps and the required number of degrees of freedom in the parametrization of the exercise rates. Finally, we demonstrate the flexibility of our method through numerical experiments on max call options in the classical Black-Scholes model, and vanilla put options in both the Heston model and the non-Markovian rough Bergomi model.

  • P. Dvurechensky, E. Gorbunov, A. Gasnikov, An accelerated directional derivative method for smooth stochastic convex optimization, European Journal of Operational Research, 290 (2020), published online 20.08.2020, DOI 10.1016/j.ejor.2020.08.027 .
    Abstract
    We consider smooth stochastic convex optimization problems in the context of algorithms which are based on directional derivatives of the objective function. This context can be considered as intermediate between derivative-free optimization and gradient-based optimization. We assume that at any given point and for any given direction, a stochastic approximation for the directional derivative of the objective function at this point and in this direction is available with some additive noise. The noise is assumed to be of unknown nature, but bounded in absolute value. We underline that we consider directional derivatives in any direction, as opposed to coordinate descent methods which use only derivatives in coordinate directions. For this setting, we propose a non-accelerated and an accelerated directional derivative method and provide their complexity bounds. Even though our algorithms do not use gradient information, our non-accelerated algorithm has a complexity bound which is, up to a factor logarithmic in the problem dimension, similar to the complexity bound of gradient-based algorithms. Our accelerated algorithm has a complexity bound which coincides with the complexity bound of the accelerated gradient-based algorithm up to a factor of the square root of the problem dimension, whereas for existing directional derivative methods this factor is of the order of the problem dimension. We also extend these results to strongly convex problems. Finally, we consider derivative-free optimization as a particular case of directional derivative optimization with noise in the directional derivative and obtain complexity bounds for non-accelerated and accelerated derivative-free methods. Complexity bounds for these algorithms inherit the gain in the dimension-dependent factors from our directional derivative methods.
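    A minimal sketch of the oracle model on a toy quadratic (illustrative step-size rule and noise, not the paper's non-accelerated or accelerated methods): the only information used per iteration is a noisy directional derivative along a random direction:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 20
    B = rng.standard_normal((n, n))
    A = B @ B.T / n + np.eye(n)                # f(x) = 0.5 * x^T A x, smooth and convex
    L = np.linalg.eigvalsh(A).max()            # Lipschitz constant of the gradient

    def noisy_dir_derivative(x, e, noise=1e-8):
        return (A @ x) @ e + noise * rng.uniform(-1, 1)   # additive bounded noise

    x = rng.standard_normal(n)
    for _ in range(20_000):
        e = rng.standard_normal(n)
        e /= np.linalg.norm(e)                            # random direction on the sphere
        x -= (1.0 / L) * noisy_dir_derivative(x, e) * e   # descend along the direction
    print("f(x) =", 0.5 * x @ A @ x)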

  • P. Friz, T. Nilssen, W. Stannat, Existence, uniqueness and stability of semi-linear rough partial differential equations, Journal of Differential Equations, 268 (2020), pp. 1686--1721, DOI 10.1016/j.jde.2019.09.033 .

  • P. Mathé, M.T. Nair, B. Hofmann, Regularization of linear ill-posed problems involving multiplication operators, Applicable Analysis, (2020), published online 28.04.2020, DOI 10.1080/00036811.2020.1758308 .
    Abstract
    We study regularization of ill-posed equations involving multiplication operators when the multiplier function is positive almost everywhere and zero is an accumulation point of the range of this function. Such equations naturally arise from equations based on non-compact self-adjoint operators in Hilbert space, after applying unitary transformations arising out of the spectral theorem. For classical regularization theory, when noisy observations are given and the noise is deterministic and bounded, then non-compactness of the ill-posed equations is a minor issue. However, for statistical ill-posed equations with non-compact operators less is known if the data are blurred by white noise. We develop a theory for spectral regularization with emphasis on this case. In this context, we highlight several aspects, in particular, we discuss the intrinsic degree of ill-posedness in terms of rearrangements of the multiplier function. Moreover, we address the required modifications of classical regularization schemes in order to be used for non-compact statistical problems, and we also introduce the concept of the effective ill-posedness of the operator equation under white noise. This study is concluded with prototypical examples for such equations, as these are deconvolution equations and certain final value problems in evolution equations.

  • J. Polzehl, K. Papafitsoros, K. Tabelow, Patch-wise adaptive weights smoothing in R, Journal of Statistical Software, 95 (2020), pp. 1--27, DOI 10.18637/jss.v095.i06 .
    Abstract
    Image reconstruction from noisy data has a long history of methodological development and is based on a variety of ideas. In this paper we introduce a new method called patch-wise adaptive smoothing, which extends the Propagation-Separation approach by using comparisons of local patches of image intensities to define local adaptive weighting schemes for an improved balance of reduced variability and bias in the reconstruction result. We present the implementation of the new method in the R package aws and demonstrate its properties on a number of examples in comparison with other state-of-the-art image reconstruction methods.

Contributions to Collected Editions

  • N. Tupitsa, A. Gasnikov, P. Dvurechensky, S. Guminov, Strongly convex optimization for the dual formulation of optimal transport, in: Mathematical optimization theory and operations research, A. Kononov, M. Khachay, V.A. Kalyagin, P. Pardalos, eds., 1275 of Theoretical Computer Science and General Issues, Springer International Publishing AG, Cham, 2020, pp. 192--204, DOI 10.1007/978-3-030-58657-7_17 .
    Abstract
    In this paper we experimentally check the hypothesis that the dual problem of the discrete entropy-regularized optimal transport problem possesses strong convexity on a certain compact set. We present a numerical technique for estimating the parameter of strong convexity and show that such an estimate increases the performance of an accelerated alternating minimization algorithm for strongly convex functions applied to the considered problem.

  • D. Dvinskikh, E. Gorbunov, A. Gasnikov, P. Dvurechensky, C.A. Uribe, On primal and dual approaches for distributed stochastic convex optimization over networks, in: 2019 IEEE 58th Conference on Decision and Control (CDC), IEEE Xplore, 2020, pp. 7435--7440, DOI 10.1109/CDC40024.2019 .
    Abstract
    We introduce a primal-dual stochastic gradient oracle method for distributed convex optimization problems over networks. We show that the proposed method is optimal in terms of communication steps. Additionally, we propose a new analysis method for the rate of convergence in terms of duality gap and probability of large deviations. This analysis is based on a new technique that allows us to bound the distance between the iteration sequence and the optimal point. By the proper choice of batch size, we can guarantee that this distance equals (up to a constant) the distance between the starting point and the solution.

  • P. Dvurechensky, A. Gasnikov, E. Nurminski, F. Stonyakin, Advances in low-memory subgradient optimization, in: Numerical Nonsmooth Optimization, A.M. Bagirov, M. Gaudioso, N. Karmitsa, M.M. Mäkelä, S. Taheri, eds., Springer International Publishing, Cham, 2019, pp. 19--59, DOI 10.1007/978-3-030-34910-3_2 .

  • P. Dvurechensky, A. Gasnikov, S. Omelchenko, A. Tiurin, A stable alternative to Sinkhorn's algorithm for regularized optimal transport, in: Mathematical optimization theory and operations research, A. Kononov, M. Khachay, V.A. Kalyagin, P. Pardalos, eds., Theoretical Computer Science and General Issues, Springer International Publishing, Basel, 2020, pp. 406--423, DOI 10.1007/978-3-030-49988-4 .

  • P. Dvurechensky, P. Ostroukhov, K. Safin, S. Shtern, M. Staudigl, Self-concordant analysis of Frank--Wolfe algorithms, in: Proceedings of the 37th International Conference on Machine Learning, H. Daumé III, A. Singh, eds., 119 of Proceedings of Machine Learning Research (online), 2020, pp. 2814--2824.
    Abstract
    Projection-free optimization via variants of the Frank-Wolfe (FW) method, a.k.a. the Conditional Gradient method, has become one of the cornerstones of optimization for machine learning, since in many cases the linear minimization oracle is much cheaper to implement than projections, and some sparsity needs to be preserved. In a number of applications, e.g. Poisson inverse problems or quantum state tomography, the loss is given by a self-concordant (SC) function having unbounded curvature, implying the absence of theoretical guarantees for the existing FW methods. We use the theory of SC functions to provide a new adaptive step size for FW methods and prove a global convergence rate O(1/k) after k iterations. If the problem admits a stronger local linear minimization oracle, we construct a novel FW method with a linear convergence rate for SC functions.

Preprints, Reports, Technical Reports

  • F. Besold, V. Spokoiny, Adaptive manifold clustering, Preprint no. 2800, WIAS, Berlin, 2020.
    Abstract
    Clustering methods seek to partition data such that elements are more similar to elements in the same cluster than to elements in different clusters. The main challenge in this task is the lack of a unified definition of a cluster, especially for high-dimensional data. Different methods and approaches have been proposed to address this problem. This paper continues the study originated in [6], where a novel approach to adaptive nonparametric clustering called Adaptive Weights Clustering (AWC) was offered. The method allows analyzing high-dimensional data with an unknown number of unbalanced clusters of arbitrary shape under very weak modeling assumptions. The procedure demonstrates a state-of-the-art performance and is very efficient even for large data dimension D. However, the theoretical study in [6] is very limited and did not really address the question of efficiency. This paper makes a significant step in understanding the remarkable performance of the AWC procedure, particularly in high dimension. The approach is based on combining the ideas of adaptive clustering and manifold learning. The manifold hypothesis means that high-dimensional data can be well approximated by a d-dimensional manifold for small d, helping to overcome the curse of dimensionality and to get sharp bounds on the cluster separation which only depend on the intrinsic dimension d. We also address the problem of parameter tuning. Our general theoretical results are illustrated by some numerical experiments.

  • J. Diehl, R. Preiss, M. Ruddy, N. Tapia, The moving frame method for iterated-integrals: Orthogonal invariants, Preprint no. 2796, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2796 .

  • J. Diehl, K. Ebrahimi-Fard, N. Tapia, Generalized iterated-sums signatures, Preprint no. 2795, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2795 .
    Abstract
    We explore the algebraic properties of a generalized version of the iterated-sums signature, inspired by previous work of F. Király and H. Oberhauser. In particular, we show how to recover the character property of the associated linear map over the tensor algebra by considering a deformed quasi-shuffle product of words on the latter. We introduce three non-linear transformations on iterated-sums signatures, close in spirit to Machine Learning applications, and show some of their properties.

  • CH. Bayer, D. Belomestny, P. Hager, P. Pigato, J.G.M. Schoenmakers, V. Spokoiny, Reinforced optimal control, Preprint no. 2792, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2792 .
    Abstract
    Least squares Monte Carlo methods are a popular numerical approximation method for solving stochastic control problems. Based on dynamic programming, their key feature is the approximation of the conditional expectation of future rewards by linear least squares regression. Hence, the choice of basis functions is crucial for the accuracy of the method. Earlier work by some of us [Belomestny, Schoenmakers, Spokoiny, Zharkynbay, Commun. Math. Sci., 18(1):109--121, 2020] proposes to reinforce the basis functions in the case of optimal stopping problems by already computed value functions for later times, thereby considerably improving the accuracy with limited additional computational cost. We extend the reinforced regression method to a general class of stochastic control problems, while considerably improving the method's efficiency, as demonstrated by substantial numerical examples as well as theoretical analysis.

  • CH. Bayer, P. Hager, S. Riedel, J.G.M. Schoenmakers, Optimal stopping with signatures, Preprint no. 2790, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2790 .
    Abstract
    We propose a new method for solving optimal stopping problems (such as American option pricing in finance) under minimal assumptions on the underlying stochastic process. We consider classic and randomized stopping times represented by linear functionals of the associated rough path signature, and prove that maximizing over the class of signature stopping times, in fact, solves the original optimal stopping problem. Using the algebraic properties of the signature, we can then recast the problem as a (deterministic) optimization problem depending only on the (truncated) expected signature. The only assumption on the process is that it is a continuous (geometric) random rough path. Hence, the theory encompasses processes such as fractional Brownian motion which fail to be either semi-martingales or Markov processes.

  • A. Kroshnin, V. Spokoiny, A. Suvorikova, Statistical inference for Bures--Wasserstein barycenters, Preprint no. 2788, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2788 .
    Abstract
    In this work we introduce the concept of Bures--Wasserstein barycenter $Q_*$, that is essentially a Fréchet mean of some distribution $P$ supported on a subspace of positive semi-definite $d$-dimensional Hermitian operators $H_+(d)$. We allow a barycenter to be constrained to some affine subspace of $H_+(d)$, and we provide conditions ensuring its existence and uniqueness. We also investigate convergence and concentration properties of an empirical counterpart of $Q_*$ in both Frobenius norm and Bures--Wasserstein distance, and explain how the obtained results are connected to optimal transportation theory and can be applied to statistical inference in quantum mechanics.
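    For reference (standard background, not quoted from the preprint): for $A, B \in H_+(d)$, the Bures--Wasserstein distance is given by
    $d_{BW}^2(A, B) = \mathrm{tr}(A) + \mathrm{tr}(B) - 2\,\mathrm{tr}\big((A^{1/2} B A^{1/2})^{1/2}\big),$
    and it coincides with the 2-Wasserstein distance between the centered Gaussian measures $N(0, A)$ and $N(0, B)$.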

  • R. Krawchenko, C.A. Uribe, A. Gasnikov, P. Dvurechensky, Distributed optimization with quantization for computing Wasserstein barycenters, Preprint no. 2782, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2782 .
    Abstract
    We study the problem of the decentralized computation of entropy-regularized semi-discrete Wasserstein barycenters over a network. Building upon recent primal-dual approaches, we propose a sampling gradient quantization scheme that allows efficient communication and computation of approximate barycenters where the factor distributions are stored distributedly on arbitrary networks. The communication and algorithmic complexity of the proposed algorithm are derived, with explicit dependence on the size of the support, the number of distributions, and the desired accuracy. Numerical results validate our algorithmic analysis.

  • J. Diehl, K. Ebrahimi-Fard, N. Tapia, Tropical time series, iterated-sum signatures and quasisymmetric functions, Preprint no. 2760, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2760 .
    Abstract
    Driven by the need for principled extraction of features from time series, we introduce the iterated-sums signature over any commutative semiring. The case of the tropical semiring is a central, and our motivating, example, as it leads to features of (real-valued) time series that are not easily available using existing signature-type objects.

  • CH. Bayer, F. Harang, P. Pigato, Log-modulated rough stochastic volatility models, Preprint no. 2752, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2752 .
    Abstract
    We propose a new class of rough stochastic volatility models obtained by modulating the power-law kernel defining the fractional Brownian motion (fBm) by a logarithmic term, such that the kernel retains square integrability even in the limit case of vanishing Hurst index H. The so-obtained log-modulated fractional Brownian motion (log-fBm) is a continuous Gaussian process even for H = 0. As a consequence, the resulting super-rough stochastic volatility models can be analysed over the whole range of Hurst indices between 0 and 1/2, including H = 0, without the need of further normalization. We obtain the usual power law explosion of the skew as maturity T goes to 0, modulated by a logarithmic term, so no flattening of the skew occurs as H goes to 0.

  • CH. Bayer, J. Qiu, Y. Yao, Pricing options under rough volatility with backward SPDEs, Preprint no. 2745, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2745 .
    Abstract
    In this paper, we study option pricing problems for rough volatility models. As the framework is non-Markovian, the value function for a European option is not deterministic; rather, it is random and satisfies a backward stochastic partial differential equation (BSPDE). The existence and uniqueness of weak solutions are proved for general nonlinear BSPDEs with unbounded random leading coefficients, whose connections with certain forward-backward stochastic differential equations are derived as well. These BSPDEs are then used to approximate American option prices. A deep learning-based method is also investigated for the numerical approximations to such BSPDEs and associated non-Markovian pricing problems. Finally, examples of rough Bergomi type are numerically computed for both European and American options.

  • J. Diehl, K. Ebrahimi-Fard, N. Tapia, Iterated-sums signature, quasi-symmetric functions and time series analysis, Preprint no. 2736, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2736 .
    Abstract
    We survey and extend results on a recently defined character on the quasi-shuffle algebra. This character, termed iterated-sums signature, appears in the context of time series analysis and originates from a problem in dynamic time warping. Algebraically, it relates to (multidimensional) quasisymmetric functions as well as (deformed) quasi-shuffle algebras.

  • S. Riedel, Semi-implicit Taylor schemes for stiff rough differential equations, Preprint no. 2734, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2734 .
    Abstract
    We study a class of semi-implicit Taylor-type numerical methods that are easy to implement and designed to solve multidimensional stochastic differential equations driven by a general rough noise, e.g. a fractional Brownian motion. In the multiplicative noise case, the equation is understood as a rough differential equation in the sense of T. Lyons. We focus on equations for which the drift coefficient may be unbounded and satisfies a one-sided Lipschitz condition only. We prove well-posedness of the methods, provide a full analysis, and deduce their convergence rate. Numerical experiments show that our schemes are particularly useful in the case of stiff rough stochastic differential equations driven by a fractional Brownian motion.

  • CH. Bayer, P. Friz, N. Tapia, Stability of deep neural networks via discrete rough paths, Preprint no. 2732, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2732 .
    Abstract
    Using rough path techniques, we provide a priori estimates for the output of Deep Residual Neural Networks. In particular we derive stability bounds in terms of the total p-variation of trained weights for any p ≥ 1.
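    For context (standard definition, not quoted from the preprint), the total $p$-variation appearing in these bounds is
    $\|x\|_{p\text{-var}} = \big( \sup_{\mathcal{P}} \sum_{[s,t] \in \mathcal{P}} |x_t - x_s|^p \big)^{1/p},$
    with the supremum taken over all finite partitions $\mathcal{P}$ of the parameter interval; here the role of the path $x$ is played by the sequence of trained layer weights.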

  • V. Avanesov, Data-driven confidence bands for distributed nonparametric regression, Preprint no. 2729, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2729 .
    Abstract
    Gaussian Process Regression and Kernel Ridge Regression are popular nonparametric regression approaches. Unfortunately, they suffer from high computational complexity, rendering them inapplicable to modern massive datasets. To that end, a number of approximations have been suggested, some of them allowing for a distributed implementation. One of them is the divide-and-conquer approach, splitting the data into a number of partitions, obtaining the local estimates and finally averaging them. In this paper we suggest a novel computationally efficient, fully data-driven algorithm quantifying the uncertainty of this method, yielding frequentist $L_2$-confidence bands. We rigorously demonstrate the validity of the algorithm. Another contribution of the paper is a minimax-optimal high-probability bound for the averaged estimator, complementing and generalizing the known risk bounds.

  • R.J.A. Laeven, J.G.M. Schoenmakers, N.F.F. Schweizer, M. Stadje, Robust multiple stopping -- A path-wise duality approach, Preprint no. 2728, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2728 .
    Abstract
    In this paper we develop a solution method for general optimal stopping problems. Our general setting allows for multiple exercise rights, i.e., optimal multiple stopping, for a robust evaluation that accounts for model uncertainty, and for general reward processes driven by multi-dimensional jump-diffusions. Our approach relies on first establishing robust martingale dual representation results for the multiple stopping problem which satisfy appealing path-wise optimality (almost sure) properties. Next, we exploit these theoretical results to develop upper and lower bounds which, as we formally show, not only converge to the true solution asymptotically, but also constitute genuine upper and lower bounds. We illustrate the applicability of our general approach in a few examples and analyze the impact of model uncertainty on optimal multiple stopping strategies.

  • A. Ivanova, A. Gasnikov, P. Dvurechensky, D. Dvinskikh, A. Tyurin, E. Vorontsova, D. Pasechnyuk, Oracle complexity separation in convex optimization, Preprint no. 2711, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2711 .
    Abstract
    Regularized empirical risk minimization problems, ubiquitous in machine learning, are often composed of several blocks which can be treated using different types of oracles, e.g., full gradient, stochastic gradient or coordinate derivative. Optimal oracle complexity is known and achievable separately for the full gradient case, the stochastic gradient case, etc. We propose a generic framework to combine optimal algorithms for different types of oracles in order to achieve separate optimal oracle complexity for each block, i.e., for each block the corresponding oracle is called the optimal number of times for a given accuracy. As a particular example, we demonstrate that for a combination of a full gradient oracle and either a stochastic gradient oracle or a coordinate descent oracle our approach leads to the optimal number of oracle calls separately for the full gradient part and the stochastic/coordinate descent part.

  • D. Kamzolov, A. Gasnikov, P. Dvurechensky, On the optimal combination of tensor optimization methods, Preprint no. 2710, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2710 .
    Abstract
    We consider the problem of minimizing a sum of functions having Lipschitz p-th order derivatives with different Lipschitz constants. In this case, to accelerate optimization, we propose a general framework allowing us to obtain near-optimal oracle complexity for each function in the sum separately, meaning, in particular, that the oracle for a function with a lower Lipschitz constant is called a smaller number of times. As a building block, we extend the current theory of tensor methods and show how to generalize near-optimal tensor methods to work with an inexact tensor step. Further, we investigate the situation when the functions in the sum have Lipschitz derivatives of different orders. For this situation, we propose a generic way to separate the oracle complexity between the parts of the sum. Our method is not optimal, which leads to an open problem of the optimal combination of oracles of different orders.

  • F. Stonyakin, A. Gasnikov, A. Tyurin, D. Pasechnyuk, A. Agafonov, P. Dvurechensky, D. Dvinskikh, S. Artamonov, V. Piskunova, Inexact relative smoothness and strong convexity for optimization and variational inequalities by inexact model, Preprint no. 2709, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2709 .
    Abstract
    In this paper we propose a general algorithmic framework for first-order methods in optimization in a broad sense, including minimization problems, saddle-point problems and variational inequalities. This framework allows us to obtain many known methods as special cases, the list including the accelerated gradient method, composite optimization methods, level-set methods, and Bregman proximal methods. The idea of the framework is based on constructing an inexact model of the main problem component, i.e., the objective function in optimization or the operator in variational inequalities. Besides reproducing known results, our framework allows us to construct new methods, which we illustrate by constructing a universal conditional gradient method and a universal method for variational inequalities with composite structure. These methods work for smooth and non-smooth problems with optimal complexity without a priori knowledge of the problem smoothness. As a particular case of our general framework, we introduce relative smoothness for operators and propose an algorithm for VIs with such an operator. We also generalize our framework for relatively strongly convex objectives and strongly monotone variational inequalities.

  • M. Redmann, S. Riedel, Runge--Kutta methods for rough differential equations, Preprint no. 2708, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2708 .
    Abstract
    We study Runge-Kutta methods for rough differential equations which can be used to calculate solutions to stochastic differential equations driven by processes that are rougher than a Brownian motion. We use a Taylor series representation (B-series) for both the numerical scheme and the solution of the rough differential equation in order to determine conditions that guarantee the desired order of the local error for the underlying Runge-Kutta method. Subsequently, we prove the order of the global error given the local rate. In addition, we simplify the numerical approximation by introducing a Runge-Kutta scheme that is based on the increments of the driver of the rough differential equation. This simplified method can be easily implemented and is computationally cheap since it is derivative-free. We provide a full characterization of this implementable Runge-Kutta method, meaning that we provide necessary and sufficient algebraic conditions for an optimal order of convergence in case the driver is, e.g., a fractional Brownian motion with Hurst index 1/4 < H ≤ 1/2. We conclude this paper by conducting numerical experiments verifying the theoretical rate of convergence.

  • M. Redmann, CH. Bayer, P. Goyal, Low-dimensional approximations of high-dimensional asset price models, Preprint no. 2706, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2706 .
    Abstract
    We consider high-dimensional asset price models that are reduced in their dimension in order to reduce the complexity of the problem or the effect of the curse of dimensionality in the context of option pricing. We apply model order reduction (MOR) to obtain a reduced system. MOR has been previously studied for asymptotically stable controlled stochastic systems with zero initial conditions. However, stochastic differential equations modeling price processes are uncontrolled, have non-zero initial states and are often unstable. Therefore, we extend MOR schemes and combine ideas of techniques known for deterministic systems. This leads to a method providing a good pathwise approximation. After explaining the reduction procedure, the error of the approximation is analyzed and the performance of the algorithm is shown conducting several numerical experiments. Within the numerics section, the benefit of the algorithm in the context of option pricing is pointed out.

  • M. Ghani Varzaneh, S. Riedel, A dynamical theory for singular stochastic delay differential equations II: Nonlinear equations and invariant manifolds, Preprint no. 2701, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2701 .
    Abstract
    Building on results obtained in [GVRS], we prove Local Stable and Unstable Manifold Theorems for nonlinear, singular stochastic delay differential equations. The main tools are rough paths theory and a semi-invertible Multiplicative Ergodic Theorem for cocycles acting on measurable fields of Banach spaces obtained in [GVR].

  • CH. Bayer, D. Belomestny, P. Hager, P. Pigato, J.G.M. Schoenmakers, Randomized optimal stopping algorithms and their convergence analysis, Preprint no. 2697, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2697 .
    Abstract
    In this paper we study randomized optimal stopping problems and consider corresponding forward and backward Monte Carlo based optimization algorithms. In particular we prove the convergence of the proposed algorithms and derive the corresponding convergence rates.

  • C. Bellingeri, A. Djurdjevac, P. Friz, N. Tapia, Transport and continuity equations with (very) rough noise, Preprint no. 2696, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2696 .
    Abstract
    Existence and uniqueness for rough flows, transport and continuity equations driven by general geometric rough paths are established.

  • S. Guminov, P. Dvurechensky, A. Gasnikov, On accelerated alternating minimization, Preprint no. 2695, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2695 .
    Abstract
    Alternating minimization (AM) optimization algorithms have been known for a long time and are of importance in machine learning problems, among which we are mostly motivated by approximating optimal transport distances. AM algorithms assume that the decision variable is divided into several blocks and that minimization in each block can be done explicitly or cheaply with high accuracy. The ubiquitous Sinkhorn's algorithm can be seen as an alternating minimization algorithm for the dual of the entropy-regularized optimal transport problem. We introduce an accelerated alternating minimization method with a $1/k^2$ convergence rate, where $k$ is the iteration counter. This improves over the known bound $1/k$ for general AM methods and for Sinkhorn's algorithm. Moreover, our algorithm converges faster than gradient-type methods in practice, as it is free of the choice of the step size and is adaptive to the local smoothness of the problem. We show that the proposed method is primal-dual, meaning that if we apply it to a dual problem, we can reconstruct the solution of the primal problem with the same convergence rate. We apply our method to the entropy-regularized optimal transport problem and show experimentally that it outperforms Sinkhorn's algorithm.
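    For concreteness, the baseline mentioned above (a toy Python sketch of classical Sinkhorn as exact alternating minimization over the two dual blocks, not the accelerated scheme proposed in the preprint):

    import numpy as np

    rng = np.random.default_rng(4)
    n, gamma = 8, 0.1
    C = rng.random((n, n))                    # cost matrix
    p = np.full(n, 1.0 / n)                   # source marginal
    q = np.full(n, 1.0 / n)                   # target marginal
    K = np.exp(-C / gamma)                    # Gibbs kernel

    u, v = np.ones(n), np.ones(n)
    for _ in range(500):
        u = p / (K @ v)                       # exact minimization in the first dual block
        v = q / (K.T @ u)                     # exact minimization in the second dual block

    plan = u[:, None] * K * v[None, :]
    print("marginal errors:", abs(plan.sum(1) - p).max(), abs(plan.sum(0) - q).max())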

  • P. Dvurechensky, A. Gasnikov, P. Ostroukhov, C.A. Uribe, A. Ivanova, Near-optimal tensor methods for minimizing gradient norm, Preprint no. 2694, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2694 .
    Abstract
    Motivated by convex problems with linear constraints and, in particular, by entropy-regularized optimal transport, we consider the problem of finding approximate stationary points, i.e., points where the norm of the objective gradient is smaller than a given error, of convex functions with Lipschitz p-th order derivatives. Lower complexity bounds for this problem were recently proposed in [Grapiglia and Nesterov, arXiv:1907.07053]. However, the methods presented in the same paper do not have optimal complexity bounds. We propose two methods, optimal up to logarithmic factors, with complexity bounds with respect to the initial objective residual and the distance between the starting point and the solution, respectively.

  • P. Dvurechensky, M. Staudigl, C.A. Uribe, Generalized self-concordant Hessian-barrier algorithms, Preprint no. 2693, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2693 .
    Abstract
    Many problems in statistical learning, imaging, and computer vision involve the optimization of a non-convex objective function with singularities at the boundary of the feasible set. For such challenging instances, we develop a new interior-point technique building on the Hessian-barrier algorithm recently introduced in Bomze, Mertikopoulos, Schachinger and Staudigl [SIAM J. Optim. 2019, 29(3), pp. 2100--2127], where the Riemannian metric is induced by a generalized self-concordant function. This class of functions is sufficiently general to include most of the commonly used barrier functions in the literature of interior point methods. We prove global convergence to an approximate stationary point of the method, and in cases where the feasible set admits an easily computable self-concordant barrier, we verify worst-case optimal iteration complexity of the method. Applications in non-convex statistical estimation and Lp-minimization are discussed to demonstrate the efficiency of the method.

  • N. Tupitsa, P. Dvurechensky, A. Gasnikov, S. Guminov, Alternating minimization methods for strongly convex optimization, Preprint no. 2692, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2692 .
    Abstract, PDF (387 kByte)
    We consider alternating minimization procedures for convex optimization problems whose decision variable is divided into several blocks, each block being amenable to minimization with respect to its own variables while the other blocks are kept fixed. In the case of two blocks, we prove a linear convergence rate for the alternating minimization procedure under the Polyak-Łojasiewicz condition, which can be seen as a relaxation of the strong convexity assumption. Under the strong convexity assumption in the many-blocks setting, we provide an accelerated alternating minimization procedure with a linear rate depending on the square root of the condition number, as opposed to the condition number itself for the non-accelerated method.
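    For reference, the Polyak-Łojasiewicz condition in its standard form (our notation, not quoted from the preprint) requires, for some $\mu > 0$,
      $$\frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr) \quad \text{for all } x,$$
    which yields linear convergence $f(x_{k+1}) - f^* \le (1 - \mu/L)\,(f(x_k) - f^*)$ for gradient-type steps on $L$-smooth $f$, without requiring convexity.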

  • E. Gorbunov, D. Dvinskikh, A. Gasnikov, Optimal decentralized distributed algorithms for stochastic convex optimization, Preprint no. 2691, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2691 .
    Abstract, PDF (616 kByte)
    We consider stochastic convex optimization problems with affine constraints and develop several methods, using either a primal or a dual approach, to solve them. In the primal case we use a special penalization technique to make the initial problem more amenable to optimization methods. We propose algorithms based on the Similar Triangles Method with Inexact Proximal Step for convex smooth and strongly convex smooth objective functions, and methods based on the Gradient Sliding algorithm to solve the same problems in the non-smooth case. We prove convergence guarantees in the smooth convex case with a deterministic first-order oracle. We propose and analyze three novel methods to handle stochastic convex optimization problems with affine constraints: SPDSTM, R-RRMA-AC-SA and SSTM_sc. All three use a stochastic dual oracle. SPDSTM is the stochastic primal-dual modification of STM; it is applied to the dual problem when the primal functional is strongly convex and Lipschitz continuous on some ball. R-RRMA-AC-SA is an accelerated stochastic method based on restarts of RRMA-AC-SA, and SSTM_sc is simply stochastic STM for strongly convex problems. Both are applied to the dual problem when the primal functional is strongly convex, smooth and Lipschitz continuous on some ball, and use a stochastic dual first-order oracle. We develop convergence analyses for these methods for unbiased and biased oracles, respectively. Finally, we apply all of the aforementioned results and approaches to the decentralized distributed optimization problem and discuss the optimality of the obtained results in terms of communication rounds and the number of oracle calls per node.

  • F. Stonyakin, D. Dvinskikh, P. Dvurechensky, A. Kroshnin, O. Kuznetsova, A. Agafonov, A. Gasnikov, A. Tyurin, C.A. Uribe, D. Pasechnyuk, S. Artamonov, Gradient methods for problems with inexact model of the objective, Preprint no. 2688, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2688 .
    Abstract, PDF (785 kByte)
    We consider optimization methods for convex minimization problems under inexact information on the objective function. We introduce an inexact model of the objective, which includes as particular cases the inexact oracle [19] and the relative smoothness condition [43]. We analyze a gradient method which uses this inexact model and obtain convergence rates for convex and strongly convex problems. To show potential applications of our general framework we consider three particular problems. The first is clustering by the electoral model introduced in [49]. The second is approximating the optimal transport distance, for which we propose a Proximal Sinkhorn algorithm. The third is approximating the optimal transport barycenter, for which we propose a Proximal Iterative Bregman Projections algorithm. We also illustrate the practical performance of our algorithms by numerical experiments.
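    For orientation, the inexact oracle special case (in the standard form of Devolder, Glineur and Nesterov; our notation, not quoted from the preprint) asks for a pair $(f_\delta(x), g_\delta(x))$ satisfying, for all $y$,
      $$f_\delta(x) + \langle g_\delta(x), y - x\rangle \;\le\; f(y) \;\le\; f_\delta(x) + \langle g_\delta(x), y - x\rangle + \frac{L}{2}\|y - x\|^2 + \delta;$$
    the inexact model framework replaces the linear term $\langle g_\delta(x), y - x\rangle$ by a more general function $\psi(y, x)$.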

  • V. Avanesov, Nonparametric change point detection in regression, Preprint no. 2687, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2687 .
    Abstract, PDF (329 kByte)
    This paper considers the prominent problem of change-point detection in regression. The study suggests a novel testing procedure featuring a fully data-driven calibration scheme. The method is essentially a black box, requiring no tuning from the practitioner. The approach is investigated from both theoretical and practical points of view. The theoretical study demonstrates proper control of the type I error rate under H0 and power approaching 1 under H1. Experiments conducted on synthetic data fully support the theoretical claims. Finally, the method is applied to financial data, where it detects sensible change-points. Techniques for change-point localization are also suggested and investigated.

  • V. Avanesov, How to gamble with non-stationary X-armed bandits and have no regrets, Preprint no. 2686, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2686 .
    Abstract, PDF (287 kByte)
    In the X-armed bandit problem an agent sequentially interacts with an environment which yields a reward based on the vector input the agent provides. The agent's goal is to maximise the sum of these rewards across some number of time steps. The problem and its variations have been the subject of numerous studies, which suggest strategies with sub-linear and sometimes optimal regret. This paper introduces a new variation of the problem: we consider an environment which can abruptly change its behaviour an unknown number of times. We propose a novel strategy and prove that it attains sub-linear cumulative regret. Moreover, the obtained regret bound matches the best known bound for GP-UCB in the stationary case, and approaches the minimax lower bound in the case of a highly smooth relation between an action and the corresponding reward. The theoretical results are supported by an experimental study.

  • F. Stonyakin, A. Gasnikov, A. Tyurin, D. Pasechnyuk, A. Agafonov, P. Dvurechensky, D. Dvinskikh, V. Piskunova, Inexact model: A framework for optimization and variational inequalities, Preprint no. 2679, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2679 .
    Abstract, PDF (414 kByte)
    In this paper we propose a general algorithmic framework for first-order methods in optimization in a broad sense, including minimization problems, saddle-point problems and variational inequalities. This framework allows us to obtain many known methods as special cases, including the accelerated gradient method, composite optimization methods, level-set methods and proximal methods. The idea of the framework is to construct an inexact model of the main problem component, i.e., the objective function in optimization or the operator in variational inequalities. Besides reproducing known results, our framework allows us to construct new methods, which we illustrate by constructing a universal method for variational inequalities with composite structure. This method works for smooth and non-smooth problems with optimal complexity and without a priori knowledge of the problem's smoothness. We also generalize our framework to strongly convex objectives and strongly monotone variational inequalities.

  • K. Ebrahimi-Fard, F. Patras, N. Tapia, L. Zambotti, Wick polynomials in non-commutative probability: A group-theoretical approach, Preprint no. 2677, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2677 .
    Abstract, PDF (292 kByte)
    Wick polynomials and Wick products are studied in the context of non-commutative probability theory. It is shown that free, Boolean and conditionally free Wick polynomials can be defined and related through the action of the group of characters over a particular Hopf algebra. These results generalize our previous developments of a Hopf-algebraic approach to cumulants and Wick products in classical probability theory.
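    For orientation, in the classical (commutative) setting the Wick polynomials $W_n$ of a random variable $X$ can be defined through the generating series (a standard formulation, not quoted from the preprint)
      $$\frac{e^{tx}}{\mathbb{E}\left[e^{tX}\right]} \;=\; \sum_{n \ge 0} \frac{t^n}{n!}\, W_n(x),$$
    so that $\mathbb{E}[W_n(X)] = 0$ for $n \ge 1$; for standard Gaussian $X$ these are the Hermite polynomials. The preprint develops the free, Boolean and conditionally free analogues of this construction.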

  • P. Dvurechensky, A. Gasnikov, E. Nurminski, F. Stonyakin, Advances in low-memory subgradient optimization, Preprint no. 2676, WIAS, Berlin, 2020, DOI 10.20347/WIAS.PREPRINT.2676 .
    Abstract, PDF (413 kByte)
    One of the main goals in the development of non-smooth optimization is to cope with high-dimensional problems by decomposition, duality or Lagrangian relaxation, which greatly reduces the number of variables at the cost of worsening the differentiability of the objective or constraints. The small or medium dimensionality of the resulting non-smooth problems allows bundle-type algorithms to achieve higher rates of convergence and higher accuracy, which of course comes at the cost of additional memory requirements, typically of the order of $n^2$, where $n$ is the number of variables of the non-smooth problem. However, with the rapid development of more and more sophisticated models in industry, economics, finance, etc., such memory requirements are becoming too hard to satisfy. This raised interest in subgradient-based low-memory algorithms, and later developments in this area significantly improved over the early variants while still preserving $O(n)$ memory requirements. To review these developments, this chapter is devoted to black-box subgradient algorithms with minimal requirements for the storage of auxiliary results. To provide historical perspective, the survey starts with the original result of N.Z. Shor, which opened this field with an application to the classical transportation problem. Theoretical complexity bounds for smooth and non-smooth convex and quasi-convex optimization problems are then briefly reviewed to introduce the relevant fundamentals of non-smooth optimization. Special attention in this section is given to adaptive step-size policies, which aim to attain the lowest complexity bounds. Unfortunately, the non-differentiability of the objective function in convex optimization essentially worsens the theoretical lower bounds on the rate of convergence of subgradient methods compared to the smooth case, but there are modern techniques that allow non-smooth convex optimization problems to be solved faster than these lower complexity bounds dictate. Particular attention is given to Nesterov's smoothing technique, Nesterov's universal approach, and the Legendre (saddle-point) representation approach. The new results on universal Mirror Prox algorithms represent the original part of the survey. To demonstrate the application of non-smooth convex optimization algorithms to huge-scale extremal problems, we consider convex optimization problems with non-smooth functional constraints and propose two adaptive Mirror Descent methods. The first method is of primal-dual variety and is proved to be optimal in terms of lower oracle bounds for the class of Lipschitz-continuous convex objectives and constraints. The advantages of applying this method to the sparse Truss Topology Design problem are discussed in some detail. The second method can be applied to convex and quasi-convex optimization problems and is optimal in the sense of complexity bounds. The concluding part of the survey contains important references characterizing recent developments in non-smooth convex optimization.
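    As a minimal illustration of the black-box, $O(n)$-memory setting discussed above (a generic textbook sketch, not one of the chapter's methods), here is a subgradient method with the Polyak step size; it assumes the optimal value f_star is known, which is one classical way to make the step size adaptive.

      import numpy as np

      def subgradient_method(f, subgrad, x0, f_star, n_iter=1000):
          # Stores only the current iterate and the best point found: O(n) memory.
          x = x0.copy()
          x_best, f_best = x.copy(), f(x)
          for _ in range(n_iter):
              g = subgrad(x)                               # any subgradient at x
              step = (f(x) - f_star) / (g @ g + 1e-12)     # Polyak step size
              x = x - step * g
              if f(x) < f_best:
                  x_best, f_best = x.copy(), f(x)
          return x_best

      # toy usage: minimize the non-smooth f(x) = ||x||_1, minimizer at 0
      x = subgradient_method(lambda x: np.abs(x).sum(), np.sign,
                             np.ones(5), f_star=0.0)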

Talks, Posters

  • V. Avanesov, Data-driven confidence bands for distributed nonparametric regression (online talk), The 33rd Annual Conference on Learning Theory (COLT 2020) (Online Event), July 9 - 12, 2020, Graz, Austria, July 10, 2020.

  • O. Butkovsky, Exponential ergodicity of order-preserving SPDEs in the hypoelliptic setting via new coupling techniques (online talk), Bernoulli-IMS One World Symposium 2020 (Online Event), August 24 - 28, 2020, August 25, 2020.

  • O. Butkovsky, New coupling techniques for exponential ergodicity of SPDEs in the hypoelliptic and effectively elliptic settings (online talk), Probability Seminar, Heriot-Watt University, Edinburgh, UK, October 21, 2020.

  • O. Butkovsky, Regularization by noise for PDEs - a stochastic sewing approach (online talk), FOR 2402 Seminar (Online Event), Technische Universität Berlin, Institut für Mathematik, November 12, 2020.

  • O. Butkovsky, Regularization by noise for SDEs and related systems: A tale of two approaches, Eighth Bielefeld-SNU joint Workshop in Mathematics, February 24 - 26, 2020, Universität Bielefeld, Fakultät für Mathematik, February 24, 2020.

  • O. Butkovsky, Skew stochastic heat equation and regularization by noise for PDEs (online talk), Bernoulli-IMS One World Symposium 2020 (Online Event), August 24 - 28, 2020, August 27, 2020.

  • S. Riedel, Optimal stopping: a signature approach (online talk), 13th Annual ERC Berlin-Oxford Young Researchers Meeting on Applied Stochastic Analysis, WIAS Berlin, June 9, 2020.

  • S. Riedel, Runge--Kutta methods for rough differential equations (online talk), The DNA Seminar (spring 2020), Norwegian University of Science and Technology, Department of Mathematical Sciences, Trondheim, Norway, June 24, 2020.

  • N. Tapia, Free Wick polynomials (online talk), Arbeitsgruppenseminar Analysis, Universität Potsdam, Institut für Mathematik, April 24, 2020.

  • N. Tapia, Higher order iterated-sums signatures (online talk), DataSig Seminar, University of Oxford, Mathematical Institute, UK, April 2, 2020.

  • N. Tapia, Transport and continuity equations with rough noise (online talk), DNA Seminar, Norwegian University of Science and Technology, Department of Mathematical Sciences, Trondheim, Norway, April 22, 2020.

  • N. Tapia, Transport equations with low regularity rough noise, Young researchers between geometry and stochastic analysis, February 12 - 14, 2020, University of Bergen, Norway, February 13, 2020.

  • N. Tapia, Signatures in shape analysis (online talk), Geometry of curves in time series and shape analysis (Online Event), August 11 - 14, 2020, Max-Planck-Institut für Mathematik in den Naturwissenschaften (MiS), Leipzig, August 14, 2020.

  • A. Kroshnin, A. Suvorikova, V. Spokoiny, Statistical inference on Bures-Wasserstein space: From theory to practice, Math of Machine Learning 2020, Sochi, Russian Federation, February 19 - 22, 2020.

  • D. Dvinskikh, A. Gasnikov, SA vs SAA for population Wasserstein barycenter calculation, Math of Machine Learning 2020, Sochi, Russian Federation, February 19 - 22, 2020.

  • D. Dvinskikh, A. Gasnikov, Two approaches for population Wasserstein barycenter problem: Stochastic averaging versus sample average approximation (online talk), XII Summer School on Operations Research, Data and Decision Making, May 19 - 21, 2020, Faculty of Informatics, Mathematics and Computer Science, Nizhny Novgorod, Russian Federation, May 20, 2020.

  • A. Suvorikova, Change point detection in high-dimensional data (online talk), Joint Aramco-HSE Research Seminar, Higher School of Economics, Faculty of Computer Science, Moscow, Russian Federation, April 15, 2020.

  • A. Suvorikova, Optimal transport and its application in bioinformatics, Second HSE-Yandex autumn school on generative models, November 24 - 27, 2020, National Research University Higher School of Economics, Moscow, Russian Federation, November 27, 2020.

  • A. Suvorikova, Shape-based domain adaptation, Statistical Seminar of HDI Lab, Higher School of Economics, Faculty of Computer Science, Moscow, Russian Federation, March 2, 2020.

  • A. Suvorikova, Shape-based domain adaptation via optimal transportation (online talk), Machine Learning Online Seminar, Max-Planck-Institut für Mathematik in den Naturwissenschaften (MiS), Leipzig, April 1, 2020.

  • A. Suvorikova, User-friendly optimal transport (online talk), Chromatine Seminar (Online Event), Skolkovo Institute of Science and Technology (Skoltech), Moscow, Russian Federation, August 11, 2020.

  • CH. Bayer, Pricing American options by exercise rate optimization, Research Seminar on Insurance Mathematics and Stochastic Finance, Eidgenössische Technische Hochschule Zürich, Switzerland, January 9, 2020.

  • CH. Bayer, Pricing American options by exercise rate optimization, Mathrisk- INRIA/ LPSM Paris-Diderot Seminar, Inria Paris Research Centre, Mathrisk Research Team, France, February 6, 2020.

  • CH. Bayer, Pricing American options by exercise rate optimization, Lunch at the Lab, University of Calgary, Department of Mathematics and Statistics, Canada, March 3, 2020.

  • CH. Bayer, Rough volatility, Summer School 2020, September 14 - 17, 2020, International Research Training Group (IRTG) 2544, Döllnsee, September 15, 2020.

  • P. Dvurechensky, Distributed optimization for Wasserstein barycenters, Workshop on PDE Constrained Optimization under Uncertainty and Mean Field Games, January 28 - 29, 2020, WIAS Berlin, January 29, 2020.

  • P. Dvurechensky, Distributed optimization for Wasserstein barycenters (online talk), INFORMS Telecommunication and Network Analytics Conference 2020 (Online Event), October 20 - 21, 2020, INFORMS Telecommunications & Networks Analytics, USA, October 21, 2020.

  • P. Friz, Diamonds, Cumulants, Signature cumulants and Magnus expansion (online talk), Geometry of curves in time series and shape analysis (Online Event), August 11 - 14, 2020, Max-Planck-Institut für Mathematik in den Naturwissenschaften (MiS), Leipzig, August 14, 2020.

  • TH. Koprucki, K. Tabelow, T. Streckenbach, T. Niermann, A. Maltsi, Model-based geometry reconstruction of TEM images (online poster session), MATH+ Day 2020 (Online Event), November 6, 2020.

  • P. Mathé, Bayesian inverse problems with non-commuting operators, University of Edinburgh, School of Mathematics, UK, February 14, 2020.

  • V. Spokoiny, Advanced Statistical Methods, February 11 - March 3, 2020, Higher School of Economics, Faculty of Computer Science, Moscow, Russian Federation.

  • V. Spokoiny, Bayes inference for non-linear inverse problems, Statistics meets Machine Learning, January 26 - 31, 2020, Mathematisches Forschungsinstitut Oberwolfach (MFO), January 28, 2020.

  • K. Tabelow, MaRDI: Mathematical research data initiative, Leibniz NFDI-Konferenz (Online Event), August 19, 2020.

External Preprints

  • O. Butkovsky, K. Dareiotis, M. Gerencsér, Approximation of SDEs--a stochastic sewing approach, Preprint no. arXiv:1909.07961, Cornell University, 2020.

  • M. Danilova, P. Dvurechensky, A. Gasnikov, E. Gorbunov, S. Guminov, D. Kamzolov, I. Shibaev, Recent theoretical advances in non-convex optimization, Preprint no. arXiv:2012.06188, Cornell University, 2020.

  • S. Athreya, O. Butkovsky, K. Lê, L. Mytnik, Well-posedness of stochastic heat equation with distributional drift and skew stochastic heat equation, Preprint no. arXiv:2011.13498, Cornell University, 2020.

  • C. Bellingeri, P. Friz, M. Gerencsér, Singular paths spaces and applications, Preprint no. arXiv:2003.03352, Cornell University, 2020.

  • A. Beznosikov, V. Samokhin, A. Gasnikov, Local SGD for saddle-point problems, Preprint no. arXiv:2010.13112, Cornell University, 2020.

  • E. Gorbunov, A. Rogozin, A. Beznosikov, D. Dvinskikh, A. Gasnikov, Recent theoretical advances in decentralized distributed convex optimization, Preprint no. arXiv:2011.13259, Cornell University, 2020.
    Abstract
    In the last few years, the theory of decentralized distributed convex optimization has made significant progress. Lower bounds on communication rounds and oracle calls have appeared, as well as methods that reach both of these bounds. In this paper, we focus on how these results can be explained based on optimal algorithms for the non-distributed setup. In particular, we present our recent results that have not been published yet and that are available in detail only as arXiv preprints.

  • N. Puchkin, A. Timofeev, V. Spokoiny, Manifold-based time series forecasting, Preprint no. arXiv:2012.08244, Cornell University, 2020.
    Abstract
    Prediction for high-dimensional time series is a challenging task due to the curse of dimensionality. Classical parametric models like ARIMA or VAR require strong modeling assumptions and time stationarity and are often overparametrized. This paper offers a new flexible approach using recent ideas of manifold learning. The considered model includes linear models, such as the central subspace model and ARIMA, as particular cases. The proposed procedure combines manifold denoising techniques with simple nonparametric prediction by local averaging. The resulting procedure demonstrates very reasonable performance on real-life econometric time series. We also provide a theoretical justification of the manifold estimation procedure.

  • N. Puchkin, V. Spokoiny, E. Stepanov, D. Trevisan, Reconstruction of manifold embeddings into Euclidean spaces via intrinsic distances, Preprint no. arXiv:2012.13770, Cornell University, 2020.
    Abstract
    We consider one of the classical manifold learning problems: reconstructing, up to an almost isometry, an embedding of a compact connected Riemannian manifold in a Euclidean space, given the information on intrinsic distances between points from an almost dense subset. It will be shown that the most popular methods in data science for dealing with such a problem, classical multidimensional scaling (MDS) and maximum variance unfolding (MVU), actually miss the point and may provide results very far from an isometry (and may even give no bi-Lipschitz embedding). We then provide an easy variational formulation of the problem, which leads to an algorithm that always provides an almost isometric embedding with controlled small distortion of the original distances.
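    For context, the MDS baseline criticized in the abstract is a few lines of linear algebra; the sketch below is the textbook procedure applied to a matrix D of (here: intrinsic) pairwise distances, not the authors' variational method.

      import numpy as np

      def classical_mds(D, dim=2):
          # Double-center the squared distance matrix to estimate a Gram
          # matrix, then embed via its top eigenpairs.
          n = D.shape[0]
          J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
          B = -0.5 * J @ (D ** 2) @ J                  # Gram matrix estimate
          w, V = np.linalg.eigh(B)
          idx = np.argsort(w)[::-1][:dim]              # top eigenpairs
          return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))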

  • A. Rastogi, P. Mathé, Inverse learning in Hilbert scales, Preprint no. arXiv:2002.10208, Cornell University, 2020.
    Abstract
    We study the linear ill-posed inverse problem with noisy data in the statistical learning setting. Approximate reconstructions from random noisy data are sought with general regularization schemes in Hilbert scales. We discuss the rates of convergence for the regularized solution under the prior assumptions and a certain link condition. We express the error in terms of certain distance functions. For regression functions with smoothness given in terms of source conditions, the error bound can then be explicitly established.

  • A. Sadiev, A. Beznosikov, P. Dvurechensky, A. Gasnikov, Zeroth-order algorithms for smooth saddle-point problems, Preprint no. arXiv:2009.09908, Cornell University, 2020.
    Abstract
    In recent years, the importance of saddle-point problems in machine learning has increased due to the popularity of GANs. In this paper, we solve stochastic smooth (strongly) convex-concave saddle-point problems using zeroth-order oracles. Theoretical analysis shows that in the case when the optimization set is a simplex, we lose only a $\log n$ factor in the stochastic convergence term. The paper also provides an approach to solving saddle-point problems when the oracle for one of the variables is of zeroth order and for the second of first order. Subsequently, we implement zeroth-order and 1/2th-order methods to solve practical problems.
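    A generic building block for such methods is the two-point random-direction gradient estimator sketched below (a standard construction, not the paper's exact oracle); it estimates the gradient of a smoothed surrogate of f from two function values.

      import numpy as np

      def zo_gradient(f, x, tau=1e-4, rng=np.random.default_rng()):
          # Two-point estimator along a random unit direction e:
          # d * (f(x + tau*e) - f(x - tau*e)) / (2*tau) * e
          e = rng.standard_normal(x.shape)
          e /= np.linalg.norm(e)
          return x.size * (f(x + tau * e) - f(x - tau * e)) / (2 * tau) * e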

  • I. Shibaev, P. Dvurechensky, A. Gasnikov, Zeroth-order methods for noisy Hölder-gradient functions, Preprint no. arXiv:2006.11857, Cornell University, 2020.

  • D. Tiapkin, A. Gasnikov, P. Dvurechensky, Stochastic saddle-point optimization for Wasserstein Barycenters, Preprint no. arXiv:2006.06763, Cornell University, 2020.
    Abstract
    We study the computation of non-regularized Wasserstein barycenters of probability measures supported on a finite set. The first result gives a stochastic optimization algorithm for discrete distributions over the probability measures which is comparable with the current best algorithms. The second result extends the first to arbitrary distributions using kernel methods. Moreover, the new algorithm has a better total complexity than the Stochastic Averaging approach via the Sinkhorn algorithm in many cases.

  • N. Tupitsa, P. Dvurechensky, A. Gasnikov, C.A. Uribe, Multimarginal optimal transport by accelerated alternating minimization, Preprint no. arXiv:2004.02294, Cornell University, 2020.
    Abstract
    We consider the multimarginal optimal transport problem, which includes the Wasserstein barycenter problem as a particular case. In this problem one has to find an optimal coupling between $m$ probability measures, which amounts to finding a tensor of order $m$. We propose an accelerated method based on accelerated alternating minimization and estimate its complexity to find an approximate solution to the problem. We use entropic regularization with a sufficiently small regularization parameter and apply accelerated alternating minimization to the dual problem. A novel primal-dual analysis is used to reconstruct the approximately optimal coupling tensor. Our algorithm exhibits a better computational complexity than the state-of-the-art methods for some regimes of the problem parameters.

  • D. Dvinskikh, A. Gasnikov, Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems, Preprint no. arXiv:1904.09015, Cornell University, 2020.
    Abstract
    We introduce primal and dual stochastic gradient oracle methods for decentralized convex optimization problems. For both primal and dual oracles, the proposed methods are optimal in terms of the number of communication steps. However, for all classes of the objective, optimality in terms of the number of oracle calls per node holds only up to a logarithmic factor and up to the notion of smoothness. Using a mini-batching technique, we show that the proposed methods with a stochastic oracle can additionally be parallelized at each node. The considered algorithms can be applied to many data science and inverse problems.

  • D. Dvinskikh, A. Ogaltsov, A. Gasnikov, P. Dvurechensky, A. Tyurin, V. Spokoiny, Adaptive gradient descent for convex and non-convex stochastic optimization, Preprint no. arXiv:1911.08380, Cornell University, 2020.

  • D. Dvinskikh, D. Tiapkin, Improved complexity bounds in Wasserstein barycenter problem, Preprint no. arXiv:2010.04677, Cornell University, 2020.

  • D. Dvinskikh, Stochastic approximation versus sample average approximation for population Wasserstein barycenter calculation, Preprint no. arXiv:2001.07697, Cornell University, 2020.

  • P. Dvurechensky, K. Safin, S. Shtern, M. Staudigl, Generalized self-concordant analysis of Frank-Wolfe algorithms, Preprint no. arXiv:2010.01009, Cornell University, 2020.
    Abstract
    Projection-free optimization via different variants of the Frank-Wolfe (FW) method has become one of the cornerstones of large-scale optimization for machine learning and computational statistics. Numerous applications within these fields involve the minimization of functions with self-concordance-like properties. Such generalized self-concordant (GSC) functions do not necessarily feature a Lipschitz continuous gradient, nor are they strongly convex. Indeed, in a number of applications, e.g., inverse covariance estimation or distance-weighted discrimination problems in support vector machines, the loss is given by a GSC function having unbounded curvature, implying the absence of theoretical guarantees for the existing FW methods. This paper closes this apparent gap in the literature by developing provably convergent FW algorithms with standard O(1/k) convergence rate guarantees. If the problem formulation allows the efficient construction of a local linear minimization oracle, we develop a FW method with linear convergence rate.
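    The generic FW template these papers build on is short; the sketch below uses the classical 2/(k+2) step size, whereas the GSC analyses replace it with adaptive choices.

      import numpy as np

      def frank_wolfe(grad, lmo, x0, n_iter=100):
          # Projection-free method: each step calls a linear minimization
          # oracle (LMO) over the feasible set instead of a projection.
          x = x0.copy()
          for k in range(n_iter):
              s = lmo(grad(x))           # argmin of <grad f(x), s> over the set
              gamma = 2.0 / (k + 2)      # classical step size, gives O(1/k)
              x = (1 - gamma) * x + gamma * s
          return x

      # toy usage on the probability simplex: the LMO returns a vertex
      simplex_lmo = lambda g: np.eye(g.size)[np.argmin(g)]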

  • P. Dvurechensky, S. Shtern, M. Staudigl, P. Ostroukhov, K. Safin, Self-concordant analysis of Frank-Wolfe algorithms, Preprint no. arXiv:2002.04320, Cornell University, 2020.
    Abstract
    Projection-free optimization via different variants of the Frank-Wolfe (FW) method has become one of the cornerstones of optimization for machine learning, since in many cases the linear minimization oracle is much cheaper to implement than projections, and some sparsity needs to be preserved. In a number of applications, e.g., Poisson inverse problems or quantum state tomography, the loss is given by a self-concordant (SC) function having unbounded curvature, implying the absence of theoretical guarantees for the existing FW methods. We use the theory of SC functions to provide a new adaptive step size for FW methods and prove a global convergence rate O(1/k), k being the iteration counter. If the problem can be represented by a local linear minimization oracle, we are the first to propose a FW method with linear convergence rate without assuming either strong convexity or a Lipschitz continuous gradient.

  • P. Friz, J. Gatheral, R. Radoičić, Forests, cumulants, martingales, Preprint no. arXiv:2002.01448, Cornell University, 2020.
    Abstract
    This work is concerned with forest and cumulant type expansions of general random variables on a filtered probability space. We establish a "broken exponential martingale" expansion that generalizes and unifies the exponentiation result of Alòs, Gatheral, and Radoičić and the cumulant recursion formula of Lacoin, Rhodes, and Vargas. Specifically, we exhibit the two previous results as lower-dimensional projections of the same generalized forest expansion, subsequently related by forest reordering. Our approach also leads to sharp integrability conditions for the validity of the cumulant formula, as required by many of our examples, including iterated stochastic integrals, Lévy area, Bessel processes, KPZ with smooth noise, Wiener-Itô chaos and "rough" stochastic (forward) variance models.

  • P. Friz, P. Pigato, J. Seibel, The step stochastic volatility model (SSVM), Preprint, May 7, 2020, available at SSRN: https://ssrn.com/abstract=3595408, DOI 10.2139/ssrn.3595408.
    Abstract
    Stochastic Volatility Models (SVMs) are ubiquitous in quantitative finance. But is there a Markovian SVM capable of producing an extreme ($T^{-1/2}$) short-dated implied volatility skew? We here propose a modification of a given SVM "backbone", Heston for instance, to achieve just this, without adding jumps or non-Markovian "rough" fractional volatility dynamics. This is achieved via a non-smooth leverage function, such as a step function. The resulting Step Stochastic Volatility Model (SSVM) is thus a parametric example of a local stochastic volatility model (LSVM). From an IT perspective, SSVM amounts to trivial modifications in the code of existing SVM implementations. From a QF perspective, SSVM offers new flexibility in smile modelling and towards assessing model risk. For comparison, we also exhibit the market-induced leverage function for LSVM, calibrated with the particle method.
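    Schematically (our notation, not quoted from the preprint), an LSVM with a Heston backbone and a step leverage function reads
      $$dS_t = \ell(S_t)\,\sqrt{V_t}\,S_t\,dW_t, \qquad \ell(s) = \sigma_1\,\mathbf{1}_{\{s < S^*\}} + \sigma_2\,\mathbf{1}_{\{s \ge S^*\}},$$
    with $V_t$ the usual Heston variance process; the only change relative to a plain SVM implementation is the factor $\ell(S_t)$ in the diffusion coefficient.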