```julia
begin
    ENV["LANG"] = "C"
    using Pkg
    Pkg.activate(mktempdir())
    Pkg.add(["PyPlot","PlutoUI","DualNumbers","ForwardDiff","DiffResults"])
    using PlutoUI
    using PyPlot
    using DualNumbers
    using LinearAlgebra
    using ForwardDiff
    using DiffResults
    PyPlot.svg(true)
end
```
Nonlinear systems of equations
Automatic differentiation
Dual numbers
We all know the field of complex numbers $\mathbb{C}$: numbers of the form $x+iy$ with $x,y\in\mathbb{R}$, where $i$ is defined by $i^2=-1$.

Dual numbers are defined by extending the real numbers by formally adding a number $\varepsilon$ with the property $\varepsilon^2=0$:

$$D = \{ a + b\varepsilon \;|\; a,b \in \mathbb{R} \}$$

They form a ring, not a field.

Evaluating polynomials on dual numbers: let $p(x)=\sum_{i=0}^n p_i x^i$. Then, since all higher powers of $\varepsilon$ vanish,

$$p(a+b\varepsilon) = \sum_{i=0}^n p_i a^i + \sum_{i=1}^n i\, p_i a^{i-1} b \varepsilon = p(a) + b\,p'(a)\varepsilon$$

This can be generalized to any analytic function: evaluating a function on a dual number with unit $\varepsilon$ component yields its value and its derivative at once.

- $\Rightarrow$ automatic evaluation of function and derivative at once $\Rightarrow$ forward mode automatic differentiation
- Multivariate dual numbers: generalization for partial derivatives
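Before turning to a package, it can help to see the multiplication rule in code. The following is a minimal sketch in plain Julia (the type name `MyDual` is our own illustration, not part of any package); the DualNumbers package used below provides a complete implementation.

```julia
# Minimal dual number sketch: ε² = 0 is encoded in the multiplication
# rule (a+bε)(c+dε) = ac + (ad+bc)ε.
struct MyDual
    value::Float64
    epsilon::Float64
end
Base.:+(x::MyDual, y::MyDual) = MyDual(x.value + y.value, x.epsilon + y.epsilon)
Base.:*(x::MyDual, y::MyDual) = MyDual(x.value*y.value,
                                       x.value*y.epsilon + x.epsilon*y.value)

x = MyDual(3.0, 1.0)    # 3 + 1ε
x*x + x                 # MyDual(12.0, 7.0): value of x²+x at 3, derivative 2·3+1
```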
Dual numbers in Julia
Constructing a dual number:
```julia
d = Dual(2,1)
# 2 + 1ɛ
```

Accessing its components:

```julia
d.value, d.epsilon
# (2, 1)
```

Comparison with a known derivative:

```julia
function testdual(x, f, df)
    xdual = Dual(x, 1)
    fdual = f(xdual)
    (f=f(x), f_dual=fdual.value), (df=df(x), df_dual=fdual.epsilon)
end
```

Polynomial expressions:
```julia
p(x) = x^3 + 2x + 1
dp(x) = 3x^2 + 2
```

```julia
testdual(3.0, p, dp)
# ((f = 34.0, f_dual = 34.0), (df = 29.0, df_dual = 29.0))
```

Standard functions:
```julia
testdual(10, sin, cos)
# ((f = -0.544021, f_dual = -0.544021), (df = -0.839072, df_dual = -0.839072))
```

```julia
testdual(13, log, x -> 1/x)
# ((f = 2.56495, f_dual = 2.56495), (df = 0.0769231, df_dual = 0.0769231))
```

Function composition:
```julia
testdual(10, x -> sin(x^2), x -> 2x*cos(x^2))
# ((f = -0.506366, f_dual = -0.506366), (df = 17.2464, df_dual = 17.2464))
```

Conclusion: if we apply dual numbers in the right way, we can do calculations with derivatives of complicated nonlinear expressions without the need to write code to calculate the derivatives by hand.

The ForwardDiff package provides these facilities.
```julia
function testdual1(x, f, df)
    (f=f(x), df=df(x), df_dual=ForwardDiff.derivative(f,x))
end
```

```julia
testdual1(13, sin, cos)
# (f = 0.420167, df = 0.907447, df_dual = 0.907447)
```

Let us plot some complicated function:

```julia
g(x) = sin(exp(0.2*x) + cos(3x))
```

```julia
X = -5:0.01:5
```

```julia
let
    clf()
    grid()
    plot(X, g.(X), label="g(x)")
    plot(X, ForwardDiff.derivative.(g, X), label="g'(x)")
    legend()
    gcf().set_size_inches(5,3)
    gcf()
end
```

Solving nonlinear systems of equations
Let $A: \mathbb{R}^n \to \mathbb{R}^n$ be a nonlinear operator and $f \in \mathbb{R}^n$. We want to solve the nonlinear system of equations

$$A(u) = f$$

There is no analogue of Gaussian elimination for general nonlinear systems, so we need to solve iteratively.

Fixpoint iteration scheme: assume $A(u) = M(u)u$ with a matrix-valued function $M(u) \in \mathbb{R}^{n\times n}$. Then we can define the iteration scheme: choose an initial value $u_0$ and, for $i = 0, 1, \dots$, solve the linear system

$$M(u_i)\, u_{i+1} = f$$

Terminate if the residual $\|M(u_i)u_i - f\| < \text{tol}$ or the update $\|u_{i+1} - u_i\| < \text{tol}$.

General properties:
- Large domain of convergence
- Convergence may be slow
- Smoothness of the coefficients is not necessary
```julia
function fixpoint!(u, M, f, imax, tol)
    history = Float64[]
    for i = 1:imax
        res = norm(M(u)*u - f)
        push!(history, res)
        if res < tol
            return u, history
        end
        u = M(u)\f
    end
    error("No convergence after $imax iterations")
end
```

Example problem

```julia
function M(u)
    [ 0.1+(u[1]^2+u[2]^2)   -(u[1]^2+u[2]^2);
     -(u[1]^2+u[2]^2)        0.1+(u[1]^2+u[2]^2)]
end
```

```julia
F = [1, 3]
```

```julia
fixpt_result, fixpt_history = fixpoint!([0,0], M, F, 100, 1.0e-12)
# result: [19.9994, 20.0006]
# residual history: 3.16228, 28284.3, 0.282829, 4.95196e-10, 1.81899e-12, 0.0
```

```julia
contraction(h) = h[2:end] ./ h[1:end-1]
```

```julia
function plothistory(history)
    clf()
    semilogy(history)
    grid()
    gcf()
end
```

```julia
contraction(fixpt_history)
# 8944.27, 9.9995e-6, 1.75087e-9, 0.00367327, 0.0
```

```julia
plothistory(fixpt_history)
```

```julia
M(fixpt_result)*fixpt_result - F
# [0.0, 0.0]
```

Newton iteration scheme
The fixed point iteration scheme assumes a particular structure of the nonlinear system. Can we do better?
Let $A(u) = f$ be the nonlinear system of equations, with the Jacobi matrix $A'(u) = \left(\frac{\partial A_i}{\partial u_j}\right)$.

In the $i$-th iteration step one calculates

$$u_{i+1} = u_i - (A'(u_i))^{-1} (A(u_i) - f)$$

One can split this into three substeps:
- Calculate the residual: $r_i = A(u_i) - f$
- Solve the linear system for the update: $A'(u_i)\, h_i = r_i$
- Update the solution: $u_{i+1} = u_i - h_i$

General properties are:
- Potentially small domain of convergence: one needs a good initial value
- Possibly slow initial convergence
- Quadratic convergence close to the solution
Linear and quadratic convergence
Let $e_i = \|u_i - \hat u\|$ be the error in the $i$-th iteration step, where $\hat u$ is the exact solution.

- Linear convergence: observed e.g. for iterative methods for linear systems: an asymptotically constant error contraction rate $\rho < 1$ such that $e_{i+1} \leq \rho\, e_i$.
- Quadratic convergence: there is a constant $M$ such that $e_{i+1} \leq M e_i^2$. As $e_i$ decreases, the contraction rate $\frac{e_{i+1}}{e_i} \leq M e_i$ decreases as well.

In practice, we cannot measure $e_i$ directly; instead we can watch the update norm $\|u_{i+1} - u_i\|$ or the residual norm $\|A(u_i) - f\|$.
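To make the two notions concrete, here is a small self-contained experiment (our own example, not one of the notebook cells): Newton's method for the scalar equation $u^2 = 2$, where the update norms play the role of $e_i$.

```julia
# Newton's method for f(u) = u^2 - 2: u_{i+1} = u_i - f(u_i)/f'(u_i).
# The update |h| roughly squares in each step -- the signature of
# quadratic convergence; a linearly convergent method would instead
# show a constant ratio between successive updates.
let u = 2.0
    for i = 1:5
        h = (u^2 - 2)/(2u)   # residual divided by the derivative
        u -= h
        println("update norm: ", abs(h))
    end
end
# prints approximately 0.5, 0.083, 0.0025, 2.1e-6, 1.6e-12
```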
Automatic differentiation for Newton's method
This is the situation where we could apply automatic differentiation for vector functions of vectors.
```julia
A(u) = M(u)*u
```

Create a result buffer for the value and the Jacobian of $A$:

```julia
dresult = DiffResults.JacobianResult(ones(2))
```

Calculate function and derivative at once:

```julia
ForwardDiff.jacobian!(dresult, A, [2.0, 2.0])
```

```julia
DiffResults.value(dresult)
# [0.2, 0.2]
```

```julia
DiffResults.jacobian(dresult)
# 2×2 Array{Float64,2}:
#   8.1  -8.0
#  -8.0   8.1
```

A Newton solver with automatic differentiation
```julia
function newton(A, b, u0; tol=1.0e-12, maxit=100)
    result = DiffResults.JacobianResult(u0)
    history = Float64[]
    u = copy(u0)
    it = 1
    while it < maxit
        ForwardDiff.jacobian!(result, (v) -> A(v) - b, u)
        res = DiffResults.value(result)
        jac = DiffResults.jacobian(result)
        h = jac \ res
        u -= h
        nm = norm(h)
        push!(history, nm)
        if nm < tol
            return u, history
        end
        it = it + 1
    end
    throw("convergence failed")
end
```

```julia
newton_result, newton_history = newton(A, F, [0, 0.1], tol=1.e-13)
# result: [19.9994, 20.0006]
# update norms: 28.8467, 5.58664, 0.493295, 0.000301159, 3.69765e-13,
#               1.28622e-11, 1.60767e-15
```

```julia
contraction(newton_history)
# 0.193667, 0.0882991, 0.000610505, 1.22781e-9, 34.7848, 0.000124992
```

```julia
plothistory(newton_history)
```

```julia
A(newton_result) - F
# [1.81899e-12, -1.81899e-12]
```

Let us take a more complicated example:
```julia
A2(x) = [ x[1] + x[1]^5 + 3*x[2]*x[3],
          0.1*x[2] + x[2]^5 - 3*x[1] - x[3],
          x[3]^5 + x[1]*x[2]*x[3] ]
```

```julia
F2 = [0.1, 0.1, 0.1]
```

```julia
U02 = [1, 1.0, 1.0]
```

```julia
res2, hist2 = newton(A2, F2, U02)
# result: [-0.248731, 0.175566, 0.663915]
# update norms: 0.796625, 4.90091, 27.5487, 5.62444, 4.49756, 3.59886, ...
# erratic values of order 1 for many steps ... then finally
# 0.136995, 0.0144148, 0.00042281, 4.77938e-7, 6.61053e-13
```

```julia
A2(res2) - F2
# [0.0, -1.38778e-16, -1.38778e-17]
```

```julia
length(hist2), contraction(hist2)
# 63 steps; contraction rates fluctuate around 1 for a long time
# (e.g. 0.80, 1.03, 1.32, 5.68, 0.25, ...) before finally dropping:
# 0.424865, 0.324844, 0.105221, 0.0293317, 0.00113039, 1.38313e-6
```

```julia
plothistory(hist2)
```

Here we observe that we have to use lots of iteration steps and see a rather erratic behaviour of the residual. After a long phase of slow, erratic convergence, the iteration finally reaches the region of quadratic convergence close to the solution.
Damped Newton iteration
There are many ways to improve the convergence behaviour and/or to increase the convergence radius in such a case. The simplest ones are:

- find a good estimate of the initial value
- damping: do not use the full update, but damp it by some factor which we increase during the iteration process
- linesearch: automatic detection of a damping factor (see the sketch below)
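Before the damped solver below, here is a sketch of the linesearch idea (the helper `linesearch_step` is hypothetical, not part of the notebook's code): starting from the full Newton update, the damping factor is halved until the residual norm actually decreases.

```julia
# One Newton step with backtracking linesearch: try the full update h,
# then halve the damping factor until the residual norm decreases.
function linesearch_step(A, b, u, h; maxhalvings=10)
    res0 = norm(A(u) - b)
    damp = 1.0
    for k = 1:maxhalvings
        unew = u - damp*h
        if norm(A(unew) - b) < res0
            return unew          # accept the damped update
        end
        damp /= 2                # otherwise halve the step
    end
    u - damp*h                   # fall back to the smallest step tried
end
```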
```julia
function dnewton(A, b, u0; tol=1.0e-12, maxit=100, damp=0.01, damp_growth=1)
    result = DiffResults.JacobianResult(u0)
    history = Float64[]
    u = copy(u0)
    it = 1
    while it < maxit
        ForwardDiff.jacobian!(result, (v) -> A(v) - b, u)
        res = DiffResults.value(result)
        jac = DiffResults.jacobian(result)
        h = jac \ res
        u -= damp*h
        nm = norm(h)
        push!(history, nm)
        if nm < tol
            return u, history
        end
        it = it + 1
        damp = min(damp*damp_growth, 1.0)
    end
    throw("convergence failed")
end
```

```julia
res3, hist3 = dnewton(A2, F2, U02, damp=0.5, damp_growth=1.1)
# result: [-0.248731, 0.175566, 0.663915]
# update norms: 0.796625, 1.62137, 0.572359, 0.326425, 0.178438, 0.0805738,
# 0.0268168, 0.00467128, 0.000174043, 8.16876e-8, 2.00594e-14
```

```julia
length(hist3), contraction(hist3)
# 11 steps; contraction rates: 2.0353, 0.35301, 0.570316, 0.546644, 0.45155,
# 0.332823, 0.174192, 0.037258, 0.000469354, 2.45563e-7
```

```julia
plothistory(hist3)
```

```julia
A2(res3) - F2
# [-2.77556e-17, -2.77556e-17, -1.38778e-17]
```

The example shows that damping indeed helps to improve the convergence behaviour. However, if we keep the damping parameter less than 1, we lose the quadratic convergence behaviour.
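We can check this claim directly (our own experiment, same solver): with `damp_growth=1` the damping factor stays fixed at 0.5, and near the solution each step then removes only about half of the error, so the contraction rate tends to $1 - \text{damp} = 0.5$ instead of 0.

```julia
# Fixed damping factor: convergence becomes linear with rate ≈ 1 - damp.
res4, hist4 = dnewton(A2, F2, U02, damp=0.5, damp_growth=1.0)
contraction(hist4)   # rates approach ≈ 0.5 instead of tending to 0
```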
Parameter embedding
Another option is the use of parameter embedding for parameter dependent problems.
Problem: solve $A(u_\lambda, \lambda) = f$ for $\lambda = 1$.

1. Solve $A(u_0, 0) = f$, assuming this problem can be easily solved.
2. Choose a step size $\delta$; set $\lambda = \delta$.
3. Solve $A(u_\lambda, \lambda) = f$ with initial value $u_{\lambda-\delta}$.
4. Set $\lambda := \lambda + \delta$.
5. If $\lambda \leq 1$, repeat with 3.

If $\delta$ is small enough, we can ensure that $u_{\lambda-\delta}$ is a good initial value for $u_\lambda$. There is the possibility to adapt $\delta$ depending on the Newton convergence.

Parameter embedding + damping + update based convergence control go a long way to solve even strongly nonlinear problems!
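As a minimal sketch of this strategy (the helper `embed` and the choice to embed via scaling the right hand side are ours, not from the notebook), one can reuse the `newton` solver from above; whether each step converges of course depends on the problem and on $\delta$:

```julia
# Parameter embedding sketch: solve A(u) = λ*b for increasing λ,
# using each solution as the initial value for the next step.
# For λ = 0 the zero vector solves A(u) = 0 in the examples above.
function embed(A, b, u0; δ=0.2)
    u = copy(u0)
    λ = 0.0
    while λ < 1.0
        λ = min(λ + δ, 1.0)
        u, _ = newton(A, λ*b, u)   # previous solution as initial value
    end
    u
end

# e.g. embed(A, F, [0.0, 0.1]) should recover the solution of A(u) = F
```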
As we will see later, a similar approach can be used for time dependent problems.