```julia
begin
    ENV["LANG"] = "C"
    using Pkg
    Pkg.activate(mktempdir())
    Pkg.add(["PyPlot","PlutoUI","DualNumbers","ForwardDiff","DiffResults"])
    using PlutoUI
    using PyPlot
    using DualNumbers
    using LinearAlgebra
    using ForwardDiff
    using DiffResults
    PyPlot.svg(true)
end
```
Nonlinear systems of equations
Automatic differentiation
Dual numbers
We all know the field of complex numbers $\mathbb{C}$: numbers of the form $x+iy$ with $x,y\in\mathbb{R}$, where $i$ is defined by $i^2=-1$.

Dual numbers are defined by extending the real numbers by formally adding a number $\varepsilon$ with the property $\varepsilon^2=0$:

$$D = \{ a + b\varepsilon \;|\; a,b \in \mathbb{R} \}$$

They form a ring, not a field.

Evaluating polynomials on dual numbers: let $p(x)=\sum_{i=0}^n p_i x^i$. Then, since all higher powers of $\varepsilon$ vanish,

$$p(a+b\varepsilon) = \sum_{i=0}^n p_i a^i + \sum_{i=1}^n i\, p_i a^{i-1} b \varepsilon = p(a) + b\,p'(a)\varepsilon$$

This can be generalized to any analytic function: evaluating a function on a dual number with unit $\varepsilon$ component yields its value and its derivative at once.

- $\Rightarrow$ automatic evaluation of function and derivative at once $\Rightarrow$ forward mode automatic differentiation
- Multivariate dual numbers: generalization for partial derivatives
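Before turning to a package, it can help to see the multiplication rule in code. The following is a minimal sketch in plain Julia (the type name `MyDual` is our own illustration, not part of any package); the DualNumbers package used below provides a complete implementation.

```julia
# Minimal dual number sketch: ε² = 0 is encoded in the multiplication
# rule (a+bε)(c+dε) = ac + (ad+bc)ε.
struct MyDual
    value::Float64
    epsilon::Float64
end
Base.:+(x::MyDual, y::MyDual) = MyDual(x.value + y.value, x.epsilon + y.epsilon)
Base.:*(x::MyDual, y::MyDual) = MyDual(x.value*y.value,
                                       x.value*y.epsilon + x.epsilon*y.value)

x = MyDual(3.0, 1.0)    # 3 + 1ε
x*x + x                 # MyDual(12.0, 7.0): value of x²+x at 3, derivative 2·3+1
```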
Dual numbers in Julia
Constructing a dual number:
```julia
d = Dual(2,1)
# 2 + 1ɛ
```

Accessing its components:

```julia
d.value, d.epsilon
# (2, 1)
```

Comparison with a known derivative:

```julia
function testdual(x, f, df)
    xdual = Dual(x, 1)
    fdual = f(xdual)
    (f=f(x), f_dual=fdual.value), (df=df(x), df_dual=fdual.epsilon)
end
```

Polynomial expressions:
```julia
p(x) = x^3 + 2x + 1
dp(x) = 3x^2 + 2
```

```julia
testdual(3.0, p, dp)
# ((f = 34.0, f_dual = 34.0), (df = 29.0, df_dual = 29.0))
```

Standard functions:
```julia
testdual(10, sin, cos)
# ((f = -0.544021, f_dual = -0.544021), (df = -0.839072, df_dual = -0.839072))
```

```julia
testdual(13, log, x -> 1/x)
# ((f = 2.56495, f_dual = 2.56495), (df = 0.0769231, df_dual = 0.0769231))
```

Function composition:
```julia
testdual(10, x -> sin(x^2), x -> 2x*cos(x^2))
# ((f = -0.506366, f_dual = -0.506366), (df = 17.2464, df_dual = 17.2464))
```

Conclusion: if we apply dual numbers in the right way, we can do calculations with derivatives of complicated nonlinear expressions without the need to write code to calculate the derivatives by hand.

The ForwardDiff package provides these facilities.
```julia
function testdual1(x, f, df)
    (f=f(x), df=df(x), df_dual=ForwardDiff.derivative(f,x))
end
```

```julia
testdual1(13, sin, cos)
# (f = 0.420167, df = 0.907447, df_dual = 0.907447)
```

Let us plot some complicated function:

```julia
g(x) = sin(exp(0.2*x) + cos(3x))
```

```julia
X = -5:0.01:5
```

```julia
let
    clf()
    grid()
    plot(X, g.(X), label="g(x)")
    plot(X, ForwardDiff.derivative.(g, X), label="g'(x)")
    legend()
    gcf().set_size_inches(5,3)
    gcf()
end
```

Solving nonlinear systems of equations
Let $A: \mathbb{R}^n \to \mathbb{R}^n$ be a nonlinear operator and $f \in \mathbb{R}^n$. We want to solve the nonlinear system of equations

$$A(u) = f$$

There is no analogue of Gaussian elimination for general nonlinear systems, so we need to solve iteratively.

Fixpoint iteration scheme: assume $A(u) = M(u)u$ with a matrix-valued function $M(u) \in \mathbb{R}^{n\times n}$. Then we can define the iteration scheme: choose an initial value $u_0$ and, for $i = 0, 1, \dots$, solve the linear system

$$M(u_i)\, u_{i+1} = f$$

Terminate if the residual $\|M(u_i)u_i - f\| < \text{tol}$ or the update $\|u_{i+1} - u_i\| < \text{tol}$.

General properties:
- Large domain of convergence
- Convergence may be slow
- Smoothness of the coefficients is not necessary
```julia
function fixpoint!(u, M, f, imax, tol)
    history = Float64[]
    for i = 1:imax
        res = norm(M(u)*u - f)
        push!(history, res)
        if res < tol
            return u, history
        end
        u = M(u)\f
    end
    error("No convergence after $imax iterations")
end
```

Example problem

```julia
function M(u)
    [ 0.1+(u[1]^2+u[2]^2)   -(u[1]^2+u[2]^2);
     -(u[1]^2+u[2]^2)        0.1+(u[1]^2+u[2]^2)]
end
```

```julia
F = [1, 3]
```

```julia
fixpt_result, fixpt_history = fixpoint!([0,0], M, F, 100, 1.0e-12)
# result: [19.9994, 20.0006]
# residual history: 3.16228, 28284.3, 0.282829, 4.95196e-10, 1.81899e-12, 0.0
```

```julia
contraction(h) = h[2:end] ./ h[1:end-1]
```

```julia
function plothistory(history)
    clf()
    semilogy(history)
    grid()
    gcf()
end
```

```julia
contraction(fixpt_history)
# 8944.27, 9.9995e-6, 1.75087e-9, 0.00367327, 0.0
```

```julia
plothistory(fixpt_history)
```

```julia
M(fixpt_result)*fixpt_result - F
# [0.0, 0.0]
```

Newton iteration scheme
The fixed point iteration scheme assumes a particular structure of the nonlinear system. Can we do better?
Let $A(u) = f$ be the nonlinear system of equations, with the Jacobi matrix $A'(u) = \left(\frac{\partial A_i}{\partial u_j}\right)$.

In the $i$-th iteration step one calculates

$$u_{i+1} = u_i - (A'(u_i))^{-1} (A(u_i) - f)$$

One can split this into three substeps:
- Calculate the residual: $r_i = A(u_i) - f$
- Solve the linear system for the update: $A'(u_i)\, h_i = r_i$
- Update the solution: $u_{i+1} = u_i - h_i$

General properties are:
- Potentially small domain of convergence: one needs a good initial value
- Possibly slow initial convergence
- Quadratic convergence close to the solution
Linear and quadratic convergence
Let $e_i = \|u_i - \hat u\|$ be the error in the $i$-th iteration step, where $\hat u$ is the exact solution.

- Linear convergence: observed e.g. for iterative methods for linear systems: an asymptotically constant error contraction rate $\rho < 1$ such that $e_{i+1} \leq \rho\, e_i$.
- Quadratic convergence: there is a constant $M$ such that $e_{i+1} \leq M e_i^2$. As $e_i$ decreases, the contraction rate $\frac{e_{i+1}}{e_i} \leq M e_i$ decreases as well.

In practice, we cannot measure $e_i$ directly; instead we can watch the update norm $\|u_{i+1} - u_i\|$ or the residual norm $\|A(u_i) - f\|$.
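To make the two notions concrete, here is a small self-contained experiment (our own example, not one of the notebook cells): Newton's method for the scalar equation $u^2 = 2$, where the update norms play the role of $e_i$.

```julia
# Newton's method for f(u) = u^2 - 2: u_{i+1} = u_i - f(u_i)/f'(u_i).
# The update |h| roughly squares in each step -- the signature of
# quadratic convergence; a linearly convergent method would instead
# show a constant ratio between successive updates.
let u = 2.0
    for i = 1:5
        h = (u^2 - 2)/(2u)   # residual divided by the derivative
        u -= h
        println("update norm: ", abs(h))
    end
end
# prints approximately 0.5, 0.083, 0.0025, 2.1e-6, 1.6e-12
```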
Automatic differentiation for Newton's method
This is the situation where we could apply automatic differentiation for vector functions of vectors.
```julia
A(u) = M(u)*u
```

Create a result buffer for the value and the Jacobian of $A$:

```julia
dresult = DiffResults.JacobianResult(ones(2))
```

Calculate function and derivative at once:

```julia
ForwardDiff.jacobian!(dresult, A, [2.0, 2.0])
```

```julia
DiffResults.value(dresult)
# [0.2, 0.2]
```

```julia
DiffResults.jacobian(dresult)
# 2×2 Array{Float64,2}:
#   8.1  -8.0
#  -8.0   8.1
```

A Newton solver with automatic differentiation
```julia
function newton(A, b, u0; tol=1.0e-12, maxit=100)
    result = DiffResults.JacobianResult(u0)
    history = Float64[]
    u = copy(u0)
    it = 1
    while it < maxit
        ForwardDiff.jacobian!(result, (v) -> A(v) - b, u)
        res = DiffResults.value(result)
        jac = DiffResults.jacobian(result)
        h = jac \ res
        u -= h
        nm = norm(h)
        push!(history, nm)
        if nm < tol
            return u, history
        end
        it = it + 1
    end
    throw("convergence failed")
end
```

```julia
newton_result, newton_history = newton(A, F, [0, 0.1], tol=1.e-13)
# result: [19.9994, 20.0006]
# update norms: 28.8467, 5.58664, 0.493295, 0.000301159, 3.69765e-13,
#               1.28622e-11, 1.60767e-15
```

```julia
contraction(newton_history)
# 0.193667, 0.0882991, 0.000610505, 1.22781e-9, 34.7848, 0.000124992
```

```julia
plothistory(newton_history)
```

```julia
A(newton_result) - F
# [1.81899e-12, -1.81899e-12]
```

Let us take a more complicated example:
```julia
A2(x) = [ x[1] + x[1]^5 + 3*x[2]*x[3],
          0.1*x[2] + x[2]^5 - 3*x[1] - x[3],
          x[3]^5 + x[1]*x[2]*x[3] ]
```

```julia
F2 = [0.1, 0.1, 0.1]
```

```julia
U02 = [1, 1.0, 1.0]
```

```julia
res2, hist2 = newton(A2, F2, U02)
# result: [-0.248731, 0.175566, 0.663915]
# update norms: 0.796625, 4.90091, 27.5487, 5.62444, 4.49756, 3.59886, ...
# erratic values of order 1 for many steps ... then finally
# 0.136995, 0.0144148, 0.00042281, 4.77938e-7, 6.61053e-13
```

```julia
A2(res2) - F2
# [0.0, -1.38778e-16, -1.38778e-17]
```

```julia
length(hist2), contraction(hist2)
# 63 steps; contraction rates fluctuate around 1 for a long time
# (e.g. 0.80, 1.03, 1.32, 5.68, 0.25, ...) before finally dropping:
# 0.424865, 0.324844, 0.105221, 0.0293317, 0.00113039, 1.38313e-6
```

```julia
plothistory(hist2)
```

Here we observe that we have to use lots of iteration steps and see a rather erratic behaviour of the residual. After a long phase of slow, erratic convergence, the iteration finally reaches the region of quadratic convergence close to the solution.
Damped Newton iteration
There are many ways to improve the convergence behaviour and/or to increase the convergence radius in such a case. The simplest ones are:

- find a good estimate of the initial value
- damping: do not use the full update, but damp it by some factor which we increase during the iteration process
- linesearch: automatic detection of a damping factor (see the sketch below)
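Before the damped solver below, here is a sketch of the linesearch idea (the helper `linesearch_step` is hypothetical, not part of the notebook's code): starting from the full Newton update, the damping factor is halved until the residual norm actually decreases.

```julia
# One Newton step with backtracking linesearch: try the full update h,
# then halve the damping factor until the residual norm decreases.
function linesearch_step(A, b, u, h; maxhalvings=10)
    res0 = norm(A(u) - b)
    damp = 1.0
    for k = 1:maxhalvings
        unew = u - damp*h
        if norm(A(unew) - b) < res0
            return unew          # accept the damped update
        end
        damp /= 2                # otherwise halve the step
    end
    u - damp*h                   # fall back to the smallest step tried
end
```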
```julia
function dnewton(A, b, u0; tol=1.0e-12, maxit=100, damp=0.01, damp_growth=1)
    result = DiffResults.JacobianResult(u0)
    history = Float64[]
    u = copy(u0)
    it = 1
    while it < maxit
        ForwardDiff.jacobian!(result, (v) -> A(v) - b, u)
        res = DiffResults.value(result)
        jac = DiffResults.jacobian(result)
        h = jac \ res
        u -= damp*h
        nm = norm(h)
        push!(history, nm)
        if nm < tol
            return u, history
        end
        it = it + 1
        damp = min(damp*damp_growth, 1.0)
    end
    throw("convergence failed")
end
```

```julia
res3, hist3 = dnewton(A2, F2, U02, damp=0.5, damp_growth=1.1)
# result: [-0.248731, 0.175566, 0.663915]
# update norms: 0.796625, 1.62137, 0.572359, 0.326425, 0.178438, 0.0805738,
# 0.0268168, 0.00467128, 0.000174043, 8.16876e-8, 2.00594e-14
```

```julia
length(hist3), contraction(hist3)
# 11 steps; contraction rates: 2.0353, 0.35301, 0.570316, 0.546644, 0.45155,
# 0.332823, 0.174192, 0.037258, 0.000469354, 2.45563e-7
```

```julia
plothistory(hist3)
```

```julia
A2(res3) - F2
# [-2.77556e-17, -2.77556e-17, -1.38778e-17]
```

The example shows that damping indeed helps to improve the convergence behaviour. However, if we keep the damping parameter less than 1, we lose the quadratic convergence behaviour.
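We can check this claim directly (our own experiment, same solver): with `damp_growth=1` the damping factor stays fixed at 0.5, and near the solution each step then removes only about half of the error, so the contraction rate tends to $1 - \text{damp} = 0.5$ instead of 0.

```julia
# Fixed damping factor: convergence becomes linear with rate ≈ 1 - damp.
res4, hist4 = dnewton(A2, F2, U02, damp=0.5, damp_growth=1.0)
contraction(hist4)   # rates approach ≈ 0.5 instead of tending to 0
```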
Parameter embedding
Another option is the use of parameter embedding for parameter dependent problems.
Problem: solve $A(u_\lambda, \lambda) = f$ for $\lambda = 1$.

1. Solve $A(u_0, 0) = f$, assuming this problem can be easily solved.
2. Choose a step size $\delta$; set $\lambda = \delta$.
3. Solve $A(u_\lambda, \lambda) = f$ with initial value $u_{\lambda-\delta}$.
4. Set $\lambda := \lambda + \delta$.
5. If $\lambda \leq 1$, repeat with 3.

If $\delta$ is small enough, we can ensure that $u_{\lambda-\delta}$ is a good initial value for $u_\lambda$. There is the possibility to adapt $\delta$ depending on the Newton convergence.

Parameter embedding + damping + update based convergence control go a long way to solve even strongly nonlinear problems!
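As a minimal sketch of this strategy (the helper `embed` and the choice to embed via scaling the right hand side are ours, not from the notebook), one can reuse the `newton` solver from above; whether each step converges of course depends on the problem and on $\delta$:

```julia
# Parameter embedding sketch: solve A(u) = λ*b for increasing λ,
# using each solution as the initial value for the next step.
# For λ = 0 the zero vector solves A(u) = 0 in the examples above.
function embed(A, b, u0; δ=0.2)
    u = copy(u0)
    λ = 0.0
    while λ < 1.0
        λ = min(λ + δ, 1.0)
        u, _ = newton(A, λ*b, u)   # previous solution as initial value
    end
    u
end

# e.g. embed(A, F, [0.0, 0.1]) should recover the solution of A(u) = F
```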
As we will see later, a similar approach can be used for time dependent problems.