truexxxxxxxxxxbegin     ENV["LANG"]="C"    using Pkg    Pkg.activate(mktempdir())    Pkg.add(["PyPlot","PlutoUI","DualNumbers","ForwardDiff","DiffResults"])    using PlutoUI    using PyPlot    using DualNumbers    using LinearAlgebra    using ForwardDiff    using DiffResults    PyPlot.svg(true)endContents
Nonlinear systems of equations
Automatic differentiation
Dual numbers
We all know the field of  complex numbers 
Dual numbers are defined by extending the real numbers by formally adding an number  
They form a ring, not a field.
- Evaluating polynomials on dual numbers: Let 
- This can be generalized to any analytical function. - Multivariate dual numbers: generalization for partial derivatives 
Dual numbers in Julia
Constructing a dual number:
2 + 1ɛxxxxxxxxxxd=Dual(2,1)Accessing its components:
2
1
xxxxxxxxxxd.value,d.epsilonComparison with known derivative:
testdual (generic function with 1 method)xxxxxxxxxxfunction testdual(x,f,df)    xdual=Dual(x,1)    fdual=f(xdual)'    (f=f(x),f_dual=fdual.value),(df=df(x),df_dual=fdual.epsilon)endPolynomial expressions:
p (generic function with 1 method)xxxxxxxxxxp(x)=x^3+2x+1dp (generic function with 1 method)xxxxxxxxxxdp(x)=3x^2+234.0
34.0
29.0
29.0
xxxxxxxxxxtestdual(3.0,p,dp)Standard functions:
-0.544021
-0.544021
-0.839072
-0.839072
xxxxxxxxxxtestdual(10,sin,cos)2.56495
2.56495
0.0769231
0.0769231
xxxxxxxxxxtestdual(13,log, x->1/x)Function composition:
-0.506366
-0.506366
17.2464
17.2464
xxxxxxxxxxtestdual(10,x->sin(x^2),x->2x*cos(x^2))Conclusion: if we apply dual numbers in the right way, we can do calculations with derivatives of complicated nonlinear expressions without the need to write code to calculate derivatives.
The forwardiff package provides these facilities.
testdual1 (generic function with 1 method)xxxxxxxxxxfunction testdual1(x,f,df)    (f=f(x),df=df(x),df_dual=ForwardDiff.derivative(f,x))end0.420167
0.907447
0.907447
xxxxxxxxxxtestdual1(13,sin,cos)Let us plot some complicated function:
g (generic function with 1 method)xxxxxxxxxxg(x)=sin(exp(0.2*x)+cos(3x))-5.0:0.01:5.0xxxxxxxxxxX=(-5:0.01:5)xxxxxxxxxxlet     clf()    grid()    plot(X,g.(X),label="g(x)")    plot(X,ForwardDiff.derivative.(g,X), label="g'(x)")    legend()    gcf().set_size_inches(5,3)    gcf()endSolving nonlinear systems of equations
Let 
There is no analogon to Gaussian elimination, so we need to solve iteratively.
Fixpoint iteration scheme:
Assume 
Then we can define the iteration scheme: choose an initial value 
Terminate if
or
- Large domain of convergence 
- Convergence may be slow 
- Smooth coefficients not necessary 
fixpoint! (generic function with 1 method)xxxxxxxxxxfunction fixpoint!(u,M,f, imax, tol)    history=Float64[]    for i=1:imax        res=norm(M(u)*u-f)        push!(history,res)        if res<tol             return u,history        end        u=M(u)\f    end    error("No convergence after $imax iterations")endExample problem
M (generic function with 1 method)xxxxxxxxxxfunction M(u)    [ 0.1+(u[1]^2+u[2]^2)  -(u[1]^2+u[2]^2);        -(u[1]^2+u[2]^2)  0.1+(u[1]^2+u[2]^2)]end1
3
xxxxxxxxxxF=[1,3]19.9994
20.0006
3.16228
28284.3
0.282829
4.95196e-10
1.81899e-12
0.0
xxxxxxxxxxfixpt_result,fixpt_history=fixpoint!([0,0],M,F,100,1.0e-12)contraction (generic function with 1 method)xxxxxxxxxxcontraction(h)=h[2:end]./h[1:end-1]plothistory (generic function with 1 method)xxxxxxxxxxfunction plothistory(history)    clf()    semilogy(history)    grid()    gcf()end8944.27
9.9995e-6
1.75087e-9
0.00367327
0.0
xxxxxxxxxxcontraction(fixpt_history)xxxxxxxxxxplothistory(fixpt_history)0.0
0.0
xxxxxxxxxxM(fixpt_result)*fixpt_result-FNewton iteration scheme
The fixed point iteration scheme assumes a particular structure of the nonlinear system. Can we do better ?
Let 
'with
The one calculates in the 
One can split this a folows:
- Calculate residual: 
- Solve linear system for update: 
- Update solution: 
General properties are:
- Potenially small domain of convergence - one needs a good initial value 
- Possibly slow initial convergence 
- Quadratic convergence close to the solution 
Linear and quadratic convergence
Let 
- Linear convergence: observed for e.g. linear systems: Asymptically constant error contraction rate 
- Quadratic convergence: - As 
 
- In practice, we can watch 
Automatic differentiation for Newton's method
This is the situation where we could apply automatic differentiation for vector functions of vectors.
A (generic function with 1 method)xxxxxxxxxxA(u)=M(u)*uCreate a result buffer for 
MutableDiffResult([5.0e-324, 5.0e-324], ([6.91366206214077e-310 6.91366204885713e-310; 6.91366206214235e-310 6.91366042732735e-310],))xxxxxxxxxxdresult=DiffResults.JacobianResult(ones(2))Calculate function and derivative at once:
MutableDiffResult([0.1999999999999993, 0.1999999999999993], ([8.100000000000001 -8.0; -8.0 8.100000000000001],))xxxxxxxxxxForwardDiff.jacobian!(dresult,A,[2.0, 2.0])0.2
0.2
xxxxxxxxxxDiffResults.value(dresult)2×2 Array{Float64,2}:
  8.1  -8.0
 -8.0   8.1xxxxxxxxxx DiffResults.jacobian(dresult)A Newton solver with automatic differentiation
newton (generic function with 1 method)xxxxxxxxxxfunction newton(A,b,u0; tol=1.0e-12, maxit=100)    result=DiffResults.JacobianResult(u0)    history=Float64[]    u=copy(u0)    it=1    while it<maxit        ForwardDiff.jacobian!(result,(v)->A(v)-b ,u)        res=DiffResults.value(result)        jac=DiffResults.jacobian(result)        h=jac\res        u-=h        nm=norm(h)        push!(history,nm)        if nm<tol            return u,history        end        it=it+1    end    throw("convergence failed")end19.9994
20.0006
28.8467
5.58664
0.493295
0.000301159
3.69765e-13
1.28622e-11
1.60767e-15
xxxxxxxxxxnewton_result,newton_history=newton(A,F,[0,0.1],tol=1.e-13)0.193667
0.0882991
0.000610505
1.22781e-9
34.7848
0.000124992
xxxxxxxxxxcontraction(newton_history)xxxxxxxxxxplothistory(newton_history)1.81899e-12
-1.81899e-12
xxxxxxxxxxA(newton_result)-FLet us take a more complicated example:
A2 (generic function with 1 method)xxxxxxxxxxA2(x)= [x[1]+x[1]^5+3*x[2]*x[3],         0.1*x[2]+x[2]^5-3*x[1]-x[3],         x[3]^5+x[1]*x[2]*x[3]]0.1
0.1
0.1
xxxxxxxxxxF2=[0.1,0.1,0.1]1.0
1.0
1.0
xxxxxxxxxxU02=[1,1.0,1.0]-0.248731
0.175566
0.663915
0.796625
4.90091
27.5487
5.62444
4.49756
3.59886
2.88249
2.31732
1.93716
1.62794
1.23389
1.26897
0.912283
1.2057
0.608024
0.819861
0.686672
0.763308
0.669681
3.8059
0.579875
0.503141
0.538473
0.992612
0.421726
0.136995
0.0144148
0.00042281
4.77938e-7
6.61053e-13
xxxxxxxxxxres2,hist2=newton(A2,F2,U02)0.0
-1.38778e-16
-1.38778e-17
xxxxxxxxxxA2(res2)-F263
6.15208
5.62115
0.204163
0.799647
0.80018
0.800945
0.803928
0.83595
0.840376
0.757943
1.02843
0.718918
1.32163
0.504291
1.3484
0.837547
1.1116
0.87734
5.68315
0.246733
0.817058
0.86767
1.07022
1.84338
0.424865
0.324844
0.105221
0.0293317
0.00113039
1.38313e-6
xxxxxxxxxxlength(hist2),contraction(hist2)xxxxxxxxxxplothistory(hist2)Here, we observe that we have to use lots of iteration steps and see a rather erratic behaviour of the residual. After 
Damped Newton iteration
There are may ways to improve the convergence behaviour and/or to increase the convergence radius in such a case. The simplest ones are:
- find a good estimate of the initial value 
- damping: do not use the full update, but damp it by some factor which we increase during the iteration process 
- linesearch: automatic detection of a damping factor 
dnewton (generic function with 1 method)xxxxxxxxxxfunction dnewton(A,b,u0; tol=1.0e-12,maxit=100,damp=0.01,damp_growth=1)    result=DiffResults.JacobianResult(u0)    history=Float64[]    u=copy(u0)    it=1    while it<maxit        ForwardDiff.jacobian!(result,(v)->A(v)-b ,u)        res=DiffResults.value(result)        jac=DiffResults.jacobian(result)        h=jac\res        u-=damp*h        nm=norm(h)        push!(history,nm)        if nm<tol            return u,history        end          it=it+1        damp=min(damp*damp_growth,1.0)    end    throw("convergence failed")end-0.248731
0.175566
0.663915
0.796625
1.62137
0.572359
0.326425
0.178438
0.0805738
0.0268168
0.00467128
0.000174043
8.16876e-8
2.00594e-14
xxxxxxxxxxres3,hist3=dnewton(A2,F2,U02,damp=0.5,damp_growth=1.1)11
2.0353
0.35301
0.570316
0.546644
0.45155
0.332823
0.174192
0.037258
0.000469354
2.45563e-7
xxxxxxxxxxlength(hist3),contraction(hist3)xxxxxxxxxxplothistory(hist3)-2.77556e-17
-2.77556e-17
-1.38778e-17
xxxxxxxxxxA2(res3)-F2The example shows: damping indeed helps to improve the convergece behaviour. However, if we keep the damping parameter less than 1, we loose the quadratic convergence behavior.
Parameter embedding
Another option is the use of parameter embedding for parameter dependent problems.
- Problem: solve - Assume - Choose step size 
- Solve 
- Set 
- Solve - Set 
- If 
- If - Possibility to adapt - Parameter embedding + damping + update based convergence control go a long way to solve even strongly nonlinear problems! 
- As we will see later, a similar apporach can be used for time dependent problems.