Scientific machine learning
Scientific machine learning is a research area that encompasses methods combining machine learning with physical laws, often formulated in terms of partial differential equations (PDEs) or variational inequalities (VIs). PDEs form the mathematical backbone of modeling physical phenomena. They describe how quantities such as temperature or pressure evolve in space and time. Variational inequalities generalize PDEs to model constrained and nonsmooth phenomena, such as contact, phase transitions, or obstacle problems.
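As a concrete illustration (standard textbook formulations, not specific to any method discussed here), the heat equation is a prototypical PDE and the obstacle problem a prototypical VI:

```latex
% Heat equation: evolution of a temperature field u(x,t)
\partial_t u - \Delta u = f \quad \text{in } \Omega \times (0,T).

% Obstacle problem: find u in the convex set
% K = \{ v \in H^1_0(\Omega) : v \ge \psi \} such that
\int_\Omega \nabla u \cdot \nabla (v - u)\, dx \;\ge\; \int_\Omega f\,(v - u)\, dx
\quad \text{for all } v \in K.
```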
In many applications, these models depend on parameters representing uncertainties, for example material properties. Accurately solving such parametric models is crucial for prediction and decision-making, but becomes computationally demanding in high-dimensional settings or for complex models.
Neural networks (NNs) are a central class of highly expressive function approximators in modern machine learning. They offer new ways to approximate solutions of PDEs and VIs. Research in this area ranges from using NNs to learn parameter-to-solution maps of parametric PDEs, potentially depending on high-dimensional stochastic parameters, to approaches related to physics-informed neural networks (PINNs), which solve models by incorporating the PDE or VI directly into the learning process. In both settings, multiscale modeling strategies can be employed to reduce computational effort across different resolution levels. These methods enable fast surrogate models that can replace expensive simulations.
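As an illustration of the PINN idea, the following minimal sketch (assuming PyTorch; the architecture, test problem, and hyperparameters are chosen purely for demonstration) trains a small NN by penalizing the residual of a one-dimensional Poisson problem:

```python
# Minimal PINN sketch: approximate the solution of -u''(x) = pi^2 sin(pi x)
# on (0,1) with u(0) = u(1) = 0 (exact solution: sin(pi x)) by penalizing
# the PDE residual at random collocation points plus the boundary conditions.
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def pde_residual(x):
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    return d2u + torch.pi**2 * torch.sin(torch.pi * x)  # residual of -u'' = f

for step in range(2000):
    x = torch.rand(128, 1)                    # interior collocation points
    xb = torch.tensor([[0.0], [1.0]])         # boundary points
    loss = pde_residual(x).pow(2).mean() + net(xb).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```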
Together, these approaches address PDE and VI constraints, high-dimensional parameter dependence, multilevel structures, and nonsmooth features, all settings in which traditional solvers struggle. Beyond practical implementations, the mathematical analysis of these methods is of central interest, including convergence with respect to the expressivity of NNs (approximation errors), the availability of data and generalization properties (statistical or estimation errors), and the optimization procedures used for training (training errors).
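Schematically, these three error sources enter an overall error estimate of the following form (notation chosen here for illustration, with u the exact solution, u_theta the NN with parameters theta, and the hat denoting the parameters returned by training on n samples):

```latex
\| u - u_{\hat\theta} \|
\;\lesssim\;
\underbrace{\inf_{\theta} \| u - u_{\theta} \|}_{\text{approximation error}}
\;+\; \underbrace{\varepsilon_{\mathrm{est}}(n)}_{\text{estimation error}}
\;+\; \underbrace{\varepsilon_{\mathrm{opt}}}_{\text{training error}}.
```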
Computational biomedical models
Real-world applications of scientific machine learning require strong robustness guarantees and high computational efficiency. In biomedical modeling in particular, high-performance numerical solvers are combined with state-of-the-art machine learning methods and architectures to address a broad range of interdisciplinary tasks. Some applications rely on NNs as function approximators in data assimilation, shape optimization, and reduced-order modeling, typically replacing high-dimensional finite-element spaces with nonlinear reduced representations. Other applications correspond to more standard machine-learning settings, such as data augmentation and generative modeling, regression for operator learning or surrogate modeling, and nonlinear dimensionality reduction or solution-manifold learning. A crucial challenge is the pronounced inter-patient geometric variability of anatomical organs, which must be handled by tailored numerical and machine learning techniques: from multigrid ResNet-based Large Deformation Diffeomorphic Metric Mapping (LDDMM) for efficient shape registration, to shape-informed graph NNs and conditioned LDDMM flow matching used to transport nested hexahedral meshes required by high-performance matrix-free solvers.
Data-driven methods for quantitative imaging
In quantitative imaging, the information associated with each pixel or voxel represents meaningful physical or biological properties of the imaged object. This is especially important in medical imaging applications such as quantitative magnetic resonance imaging (qMRI), where spatially resolved maps of biophysical parameters enable a more accurate characterization of tissue and can support improved diagnosis. The reconstruction of quantitative parameter maps typically requires solving coupled inverse problems. These involve an often ill-posed measurement process, described by a forward operator, and a physical model, often a differential equation, that links the desired quantitative parameters to the observed image data. Developing stable, accurate, and computationally efficient reconstruction methods for these problems is a central goal. Typical challenges include noise, undersampling, model mismatch, and the need for fast algorithms suitable for high-dimensional data. To mitigate model mismatch, parts of the underlying physical model can be learned from data, leading to learning-informed forward models that naturally give rise to large-scale and often nonsmooth optimization problems.
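In schematic form (with notation chosen here for illustration), such a coupled problem can be written as a regularized minimization with the physical model as a constraint:

```latex
% u: image, q: quantitative parameter map, y: measured data,
% A: forward operator (e.g., undersampled Fourier sampling in qMRI),
% e(u,q) = 0: physical model linking parameters and image,
% R: regularizer encoding prior knowledge
\min_{u,\,q}\; \tfrac{1}{2}\,\| A u - y \|^2 + \mathcal{R}(u, q)
\quad \text{subject to} \quad e(u, q) = 0.
```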
Another key aspect is the development and analysis of data-driven and learning-based reconstruction techniques, which often improve reconstruction quality and robustness compared to purely handcrafted classical methodologies. In medical imaging, unsupervised methods that do not rely on large amounts of training data, such as blind dictionary learning, are especially valuable when annotated or high-quality reference data are scarce. Complementary approaches aim to reduce the black-box nature of NNs, including plug-and-play methods and algorithm unrolling. These techniques embed learned components into provably convergent optimization algorithms, replacing only selected sub-steps with typically small NNs in order to maintain stability, interpretability, and theoretical guarantees.
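The following minimal plug-and-play sketch (assuming PyTorch; the forward operator, data, and denoiser are placeholders rather than trained components) illustrates how a learned denoiser can replace the proximal step of a proximal gradient iteration:

```python
# Plug-and-play proximal gradient sketch for 0.5 * ||A x - y||^2:
# the proximal/denoising step is replaced by a (here untrained) NN denoiser.
import torch

torch.manual_seed(0)
n, m = 64, 32
A = torch.randn(m, n) / m**0.5            # placeholder forward operator
y = torch.randn(m)                        # placeholder measurements

denoiser = torch.nn.Sequential(           # stand-in for a trained denoiser
    torch.nn.Linear(n, n), torch.nn.Tanh(), torch.nn.Linear(n, n),
)

tau = 1.0 / torch.linalg.svdvals(A)[0].item() ** 2  # step size <= 1/||A||_2^2
x = torch.zeros(n)
with torch.no_grad():
    for _ in range(50):
        grad = A.T @ (A @ x - y)          # gradient of the data-fidelity term
        x = denoiser(x - tau * grad)      # learned denoiser replaces the prox
```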
High-dimensional sampling
Sampling plays an important role in uncertainty quantification and inverse problems. High-dimensional probability distributions are often difficult to explore with classical sampling methods, in particular when only implicit or unnormalized density information is available, as is often the case in Bayesian inverse problems.
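For instance, in a Bayesian inverse problem the target is the posterior, which is typically available only up to a normalizing constant (notation illustrative):

```latex
% F: forward map, y: data, \sigma^2: noise level, \pi_0: prior density
\pi(x \mid y) \;\propto\; \exp\!\Big( -\tfrac{1}{2\sigma^2}\, \| F(x) - y \|^2 \Big)\, \pi_0(x).
```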
To address this, machine learning methods can be used to approximate transport maps that push forward samples from a reference distribution to a target distribution, for example along trajectories of ordinary or stochastic differential equations. Various strategies can be employed, such as learning low-rank formats in a physics-informed manner to approximate these transports, or incorporating NNs into stochastic differential equation models.
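As a minimal example of SDE-based transport (with illustrative choices of potential, step size, and iteration count), the Euler-Maruyama scheme below pushes Gaussian reference samples toward an unnormalized target proportional to exp(-V):

```python
# Euler--Maruyama discretization of overdamped Langevin dynamics
#   dX_t = -grad V(X_t) dt + sqrt(2) dW_t,
# whose invariant distribution is proportional to exp(-V).
import numpy as np

rng = np.random.default_rng(0)

def grad_V(x):
    # illustrative double-well potential V(x) = sum_i (x_i^2 - 1)^2 / 4
    return x * (x**2 - 1)

d, n_samples, dt, n_steps = 2, 1000, 1e-2, 5000
x = rng.standard_normal((n_samples, d))   # samples from the Gaussian reference
for _ in range(n_steps):
    x += -grad_V(x) * dt + np.sqrt(2 * dt) * rng.standard_normal(x.shape)
# x now approximates samples from the target, up to discretization bias
```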
Learning-informed optimization
Learning-informed optimization refers to optimization or control problems in which parts of the model, constraints, or operators are not known in closed form, or are only partially known, and are instead inferred from data using NNs. These learned components are then embedded in the optimization problem, for example as part of the constraints. In addition, the optimization objective itself may originate from a learning problem, as in PINNs, and can be considered within a broader optimization perspective, as is the case in the design of hybrid solvers incorporating PINNs. The inclusion of learning tools into optimization and control has significant implications for the functional-analytic and numerical treatment of the problem, which are crucial for the design of tailored optimization algorithms.
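A prototypical formulation (notation illustrative) is an optimal control problem whose state equation contains a learned component:

```latex
% J: objective, y: state, u: control,
% N_theta: NN approximating an unknown nonlinearity, inferred from data
\min_{y,\,u}\; J(y, u)
\quad \text{subject to} \quad
-\Delta y + \mathcal{N}_\theta(y) = u \ \text{in } \Omega,
\qquad y = 0 \ \text{on } \partial\Omega.
```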
Publications
Preprints, Reports, Technical Reports
- M. Eigel, Ch. Miranda, A. Nouy, D. Sommer, Approximation and learning with compositional tensor trains, Preprint no. 3253, WIAS, Berlin, 2025, DOI 10.20347/WIAS.PREPRINT.3253.
We introduce compositional tensor trains (CTTs) for the approximation of multivariate functions, a class of models obtained by composing low-rank functions in the tensor-train format. This format can encode standard approximation tools, such as (sparse) polynomials, deep neural networks (DNNs) with fixed width, or tensor networks with arbitrary permutations of the inputs or more general affine coordinate transformations, with similar complexities. It can be viewed as a DNN with width exponential in the input dimension and structured weight matrices. Compared to DNNs, the format enables controlled compression at the layer level using efficient tensor algebra.

On the optimization side, we derive a layerwise algorithm inspired by natural gradient descent, allowing us to exploit efficient low-rank tensor algebra. This relies on low-rank estimates of Gram matrices and tensor-structured random sketching. Viewing the format as a discrete dynamical system, we also derive an optimization algorithm inspired by numerical methods in optimal control. Numerical experiments on regression tasks demonstrate the expressivity of the new format and the relevance of the proposed optimization algorithms.

Overall, CTTs combine the expressivity of compositional models with the algorithmic efficiency of tensor algebra, offering a scalable alternative to standard deep neural networks.
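To illustrate the basic building block (a sketch with illustrative shapes and feature maps, not the paper's implementation), the following evaluates functions in tensor-train format and composes two such vector-valued layers:

```python
# Sketch of a compositional tensor-train evaluation: each layer maps
# R^d -> R^d, with every output coordinate given by a TT-format function.
import numpy as np

rng = np.random.default_rng(0)
d, p, r = 3, 4, 2                 # input dimension, feature dimension, TT rank

def features(t):
    return np.array([t**k for k in range(p)])    # monomial feature map

def random_tt():
    # cores of shape (r_{k-1}, p, r_k) with boundary ranks equal to 1
    return [rng.standard_normal((1 if k == 0 else r, p,
                                 1 if k == d - 1 else r)) for k in range(d)]

def tt_eval(cores, x):
    v = np.ones(1)
    for G, xk in zip(cores, x):   # contract core with feature vector, then v
        v = v @ np.tensordot(G, features(xk), axes=([1], [0]))
    return v.item()

layers = [[random_tt() for _ in range(d)] for _ in range(2)]  # two CTT layers
x = rng.standard_normal(d)
for layer in layers:              # composition: one layer's output feeds the next
    x = np.array([tt_eval(cores, x) for cores in layer])
```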

