
Nonlinear systems of equations


Automatic differentiation


Dual numbers


We all know the field of complex numbers $\mathbb{C}$: they extend the real numbers $\mathbb{R}$ by introducing the imaginary unit $i$ with $i^2 = -1$.


Dual numbers are defined by extending the real numbers by formally adding a number $\varepsilon$ with $\varepsilon^2 = 0$:

$$\mathbb{D} = \{ a + b\varepsilon \mid a, b \in \mathbb{R} \} = \left\{ \begin{pmatrix} a & b \\ 0 & a \end{pmatrix} \;\middle|\; a, b \in \mathbb{R} \right\} \subset \mathbb{R}^{2\times 2}$$

They form a ring, not a field.

  • Evaluating polynomials on dual numbers: let $p(x) = \sum_{i=0}^n p_i x^i$. Then

$$p(a + b\varepsilon) = \sum_{i=0}^n p_i a^i + \left( \sum_{i=1}^n i p_i a^{i-1} \right) b \varepsilon = p(a) + b p'(a) \varepsilon$$

  • This can be generalized to any analytic function: automatic evaluation of a function and its derivative at once

  • forward mode automatic differentiation

  • Multivariate dual numbers: generalization for partial derivatives
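The polynomial rule above can be made concrete with a minimal dual number type. This is an illustrative sketch (the struct `DualNumber` and its methods are hypothetical, not the notebook's original code):

```julia
# Minimal dual number type a + bε with ε² = 0 (illustrative sketch).
struct DualNumber
    value::Float64   # a: function value
    deriv::Float64   # b: derivative part
end

import Base: +, -, *
+(x::DualNumber, y::DualNumber) = DualNumber(x.value + y.value, x.deriv + y.deriv)
-(x::DualNumber, y::DualNumber) = DualNumber(x.value - y.value, x.deriv - y.deriv)
# ε² = 0 removes the b₁b₂ term, leaving exactly the product rule:
*(x::DualNumber, y::DualNumber) = DualNumber(x.value * y.value,
                                             x.value * y.deriv + x.deriv * y.value)

# Evaluate p(x) = x² + 2x at x = 3 using a unit dual part:
d   = DualNumber(3.0, 1.0)    # 3 + 1ε
two = DualNumber(2.0, 0.0)
p   = d * d + two * d         # p.value = p(3) = 15, p.deriv = p′(3) = 8
```

Evaluating at $3 + 1\varepsilon$ yields the function value and the derivative in one pass, without any symbolic or numerical differentiation.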


Dual numbers in Julia


Constructing a dual number:


Accessing its components:


Comparison with known derivative:


Polynomial expressions:


Standard functions:


Function composition:
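A sketch of how standard functions and their compositions work on dual numbers, using `ForwardDiff.Dual` as a stand-in for a hand-written dual type (this assumes the ForwardDiff package is installed):

```julia
using ForwardDiff

d = ForwardDiff.Dual(1.0, 1.0)   # the dual number 1 + 1ε

# Composition sin(exp(x)) evaluated at the dual number:
y = sin(exp(d))

ForwardDiff.value(y)        # sin(exp(1))
ForwardDiff.partials(y, 1)  # cos(exp(1))·exp(1), by the chain rule
```

Both components drop out of a single evaluation; the chain rule is applied automatically by the dual arithmetic.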


Conclusion: if we apply dual numbers in the right way, we can calculate with derivatives of complicated nonlinear expressions without ever writing code for the derivatives themselves.


The ForwardDiff.jl package provides these facilities.
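For scalar functions, `ForwardDiff.derivative` wraps the dual number machinery. A minimal usage sketch (the function `h` is just an example):

```julia
using ForwardDiff

h(x) = sin(exp(x)) + x^2
dh = ForwardDiff.derivative(h, 1.0)   # h′(1) = cos(exp(1))·exp(1) + 2
```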


Let us plot some complicated function:


Solving nonlinear systems of equations


Let $A_1 \dots A_n$ be functions depending on $n$ unknowns $u_1 \dots u_n$. Solve the system of nonlinear equations:

$$A(u) = \begin{pmatrix} A_1(u_1 \dots u_n) \\ A_2(u_1 \dots u_n) \\ \vdots \\ A_n(u_1 \dots u_n) \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_n \end{pmatrix} = f$$

$A(u)$ can be seen as a nonlinear operator $A: D \to \mathbb{R}^n$ where $D \subset \mathbb{R}^n$ is its domain of definition.

There is no analogue of Gaussian elimination, so we need to solve iteratively.


Fixpoint iteration scheme:

Assume $A(u) = M(u)u$ where, for each $u$, $M(u): \mathbb{R}^n \to \mathbb{R}^n$ is a linear operator.

Then we can define the iteration scheme: choose an initial value $u_0$, and at each iteration step, solve

$$M(u_i) u_{i+1} = f$$

Terminate if

$$||A(u_i) - f|| < \varepsilon \quad \text{(residual based)}$$

or

$$||u_{i+1} - u_i|| < \varepsilon \quad \text{(update based)}.$$

  • Large domain of convergence

  • Convergence may be slow

  • Smooth coefficients not necessary

Example problem
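A minimal sketch of the fixed point scheme on a small example (the names `fixpoint`, `M`, `f` and the example system are illustrative; the notebook's original example may differ):

```julia
using LinearAlgebra

# Fixed point iteration for M(u)u = f with update based termination.
function fixpoint(M, f, u0; tol = 1.0e-10, maxiter = 100)
    u = copy(u0)
    for i in 1:maxiter
        unew = M(u) \ f                           # solve M(uᵢ) uᵢ₊₁ = f
        norm(unew - u) < tol && return unew, i    # update based criterion
        u = unew
    end
    error("no convergence after $maxiter iterations")
end

# Example: A(u) = (1 + u₁²)·u, i.e. M(u) = (1 + u₁²)·I
M(u) = (1 + u[1]^2) * I(2)
f = [1.0, 2.0]
u, niter = fixpoint(M, f, [0.0, 0.0])
# u₁ solves u₁(1 + u₁²) = 1, i.e. u₁ ≈ 0.6823
```

The iteration converges linearly here; the number of steps depends on the contraction rate of the map $u \mapsto M(u)^{-1}f$.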

Newton iteration scheme

The fixed point iteration scheme assumes a particular structure of the nonlinear system. Can we do better ?

Let $A'(u)$ be the Jacobi matrix of first partial derivatives of $A$ at point $u$:

$$A'(u) = (a_{kl})$$

with

$$a_{kl} = \frac{\partial}{\partial u_l} A_k(u_1 \dots u_n)$$

Then one calculates in the $i$-th iteration step:

$$u_{i+1} = u_i - (A'(u_i))^{-1} (A(u_i) - f)$$

One can split this into the following steps:

  • Calculate residual: $r_i = A(u_i) - f$

  • Solve linear system for update: $A'(u_i) h_i = r_i$

  • Update solution: $u_{i+1} = u_i - h_i$

General properties are:

  • Potentially small domain of convergence - one needs a good initial value

  • Possibly slow initial convergence

  • Quadratic convergence close to the solution


Linear and quadratic convergence

Let $e_i = u_i - \hat{u}$.

  • Linear convergence: observed e.g. for linear systems: asymptotically constant error contraction rate

$$\frac{||e_{i+1}||}{||e_i||} \le \rho < 1$$

  • Quadratic convergence: $\exists\, i_0 > 0$ such that $\forall i > i_0$: $\dfrac{||e_{i+1}||}{||e_i||^2} \le M < 1$.

    • As $||e_i||$ decreases, the contraction rate decreases:

$$\frac{||e_{i+1}||\,/\,||e_i||}{||e_i||\,/\,||e_{i-1}||} = \frac{||e_{i+1}||}{||e_i||^2}\,||e_{i-1}|| \le M\,||e_{i-1}||$$

  • In practice, we can watch $||r_i||$ or $||h_i||$


Automatic differentiation for Newton's method


This is the situation where we could apply automatic differentiation for vector functions of vectors.
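`ForwardDiff.jacobian` computes the Jacobi matrix $A'(u)$ of a vector-valued function. A minimal sketch (the function `A` here is an example, not the notebook's original system):

```julia
using ForwardDiff

A(u) = [u[1]^2 + u[2], u[1] * u[2]]
J = ForwardDiff.jacobian(A, [1.0, 2.0])
# J = [2u₁ 1; u₂ u₁] evaluated at (1, 2), i.e. [2.0 1.0; 2.0 1.0]
```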


Create a result buffer for n=2


Calculate function and derivative at once:
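With the DiffResults package, function value and Jacobian can be written into a preallocated result buffer in one sweep. A sketch assuming DiffResults is installed (the function `A` is an illustrative example):

```julia
using ForwardDiff, DiffResults

A(u) = [u[1]^2 + u[2], u[1] * u[2]]   # example function

u = [1.0, 2.0]
dresult = DiffResults.JacobianResult(u)         # result buffer for n = 2
dresult = ForwardDiff.jacobian!(dresult, A, u)  # fills value and Jacobian at once

DiffResults.value(dresult)     # A(u)
DiffResults.jacobian(dresult)  # A'(u)
```

This avoids evaluating `A` twice when both the residual and the Jacobian are needed, as in Newton's method.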


A Newton solver with automatic differentiation
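A hedged sketch of such a solver (the `newton` function and the example system are illustrative, not the notebook's original code):

```julia
using ForwardDiff, LinearAlgebra

function newton(A, f, u0; tol = 1.0e-12, maxiter = 100)
    u = copy(u0)
    for i in 1:maxiter
        res = A(u) - f                    # residual rᵢ = A(uᵢ) - f
        jac = ForwardDiff.jacobian(A, u)  # A'(uᵢ) via automatic differentiation
        h = jac \ res                     # solve A'(uᵢ) hᵢ = rᵢ
        u = u - h                         # uᵢ₊₁ = uᵢ - hᵢ
        norm(h) < tol && return u
    end
    error("no convergence after $maxiter iterations")
end

# Example: solve u₁² + u₂² = 2, u₁ - u₂ = 0, with solution u = [1, 1]
A(u) = [u[1]^2 + u[2]^2, u[1] - u[2]]
u = newton(A, [2.0, 0.0], [1.0, 0.5])
```

Note that no derivative code had to be written: the Jacobian comes entirely from automatic differentiation.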


Let us take a more complicated example:


Here, we observe that we need many iteration steps and see rather erratic behaviour of the residual. After 55 steps we arrive in the quadratic convergence region, where convergence is fast.


Damped Newton iteration

There are many ways to improve the convergence behaviour and/or to increase the convergence radius in such a case. The simplest ones are:

  • find a good estimate of the initial value

  • damping: do not use the full update, but damp it by some factor which we increase during the iteration process

  • linesearch: automatic detection of a damping factor
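The damping strategy can be sketched as follows (the name `dnewton`, the growth strategy, and the example system are illustrative):

```julia
using ForwardDiff, LinearAlgebra

function dnewton(A, f, u0; tol = 1.0e-12, maxiter = 1000,
                 damp = 0.1, damp_growth = 1.2)
    u = copy(u0)
    d = damp
    for i in 1:maxiter
        h = ForwardDiff.jacobian(A, u) \ (A(u) - f)
        u = u - d * h                   # damped update: uᵢ₊₁ = uᵢ - d·hᵢ
        norm(h) < tol && return u
        d = min(1.0, d * damp_growth)   # grow the damping factor toward 1
    end
    error("no convergence after $maxiter iterations")
end

A(u) = [u[1]^2 + u[2]^2, u[1] - u[2]]
u = dnewton(A, [2.0, 0.0], [1.0, 0.5])
```

Once the damping factor reaches 1, the scheme coincides with the plain Newton iteration and recovers quadratic convergence near the solution.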


The example shows: damping indeed helps to improve the convergence behaviour. However, if we keep the damping parameter less than 1, we lose the quadratic convergence behaviour.


Parameter embedding


Another option is the use of parameter embedding for parameter dependent problems.

  • Problem: solve $A(u_\lambda, \lambda) = f$ for $\lambda = 1$.

  • Assume $A(u_0, 0) = f$ can be easily solved.

  • Choose a step size $\delta$

  1. Solve $A(u_0, 0) = f$

  2. Set $\lambda = 0$

  3. Solve $A(u_{\lambda+\delta}, \lambda+\delta) = f$ with initial value $u_\lambda$

  4. Set $\lambda = \lambda + \delta$

  5. If $\lambda < 1$, repeat with 3.

  • If $\delta$ is small enough, we can ensure that $u_\lambda$ is a good initial value for $u_{\lambda+\delta}$.

  • Possibility to adapt δ depending on Newton convergence

  • Parameter embedding + damping + update based convergence control go a long way to solve even strongly nonlinear problems!

  • As we will see later, a similar approach can be used for time dependent problems.
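The embedding loop above can be sketched as follows (the functions `embed` and `newton` and the example system are illustrative, not the notebook's original code):

```julia
using ForwardDiff, LinearAlgebra

# Plain Newton solver used for each embedding step.
function newton(A, f, u0; tol = 1.0e-12, maxiter = 100)
    u = copy(u0)
    for i in 1:maxiter
        h = ForwardDiff.jacobian(A, u) \ (A(u) - f)
        u = u - h
        norm(h) < tol && return u
    end
    error("no convergence after $maxiter iterations")
end

# Parameter embedding: walk λ from 0 to 1, reusing uλ as initial value.
function embed(A, f, u0; δ = 0.1)
    λ = 0.0
    u = newton(u -> A(u, λ), f, u0)     # solve A(u₀, 0) = f
    while λ < 1.0
        λ = min(λ + δ, 1.0)
        u = newton(u -> A(u, λ), f, u)  # uλ is the initial value for uλ₊δ
    end
    u
end

# Example: A(u, λ) = u + λu³ (componentwise); for λ = 1, u + u³ = 2 has solution u = 1.
A(u, λ) = u .+ λ .* u .^ 3
u = embed(A, [2.0], [2.0])
```

In practice one would additionally adapt $\delta$ based on how well the inner Newton iteration converges, as noted above.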
