Theory of optimization algorithms for large-scale problems motivated by machine learning applications

Humboldt University, Berlin, winter semester 2020/2021, Online

The course will be concentrated mostly on theoretical aspects of first-order methods in optimization, e.g. gradient descent, accelerated gradient descent, mirror descent, AdaGrad. These methods are at their renaissance motivated by current big data world, which leads to large-scale problems with no need of very precise solution. One of the main assumptions in the course will be convexity of the objective function and some background from convex analysis will be given. Another cornerstone is randomness of different nature. One case is when the randomness is present in the problem itself, e.g. in stochastic optimization, for which stochastic gradient descent is usually applied. The second case is when the randomness is added to a deterministic problem by introducing randomization. In this case one can also apply stochastic gradient descent, as well as more sophisticated random coordinate descent or stochastic variance reduction methods. Non-convex optimization will also be discussed as many of the optimization problems in machine learning are non-convex.

Prerequisites: Multivariate calculus, Linear algebra, Basic probability

Preliminary program

  • Motivating examples from Machine Learning: regression, classification, neural networks.
  • Oracle model, complexity, complexity of global optimization. Classes of objectives.
  • Background from optimization theory: optimality conditions, Lagrange duality, Karush-Kuhn-Tucker conditions, Legendre-Fenchel conjugate duality.
  • Non-smooth convex optimization: lower complexity bound, (composite) mirror descent, switching mirror descent.
  • Smooth convex optimization: lower complexity bounds, mirror descent, accelerated gradient method, smoothing technique..
  • Projection-Free Convex Optimization: Frank-Wolfe method.
  • Mirror-prox method for variational inequalities.
  • Stochastic convex optimization: lower bounds, stochastic mirror descent, stochastic accelerated gradient method.
  • Adaptive methods in optimization: line search, AdaGrad, Adam.
  • Optimization in optimal transport: Optimal transport distances and barycenters, primal-dual accelerated gradient methods.


  1. A. Ben-Tal, A. Nemirovski
    Lectures on Modern Convex Optimization
  2. G. Lan
    Lectures on Optimization Methods for Machine Learning
  3. Yu. Nesterov
    Introductory Lectures on Convex Optimization. A Basic Course
    Springer-Verlag US, 2004
  4. Yu. Nesterov
    Lectures on convex optimization
    Springer, 2018
  5. John C. Duchi
    Introductory Lectures on Stochastic Optimization,