
Linear System Solution

  • Let $A$ be an $N\times N$ matrix, $b\in\mathbb{R}^N$.

  • Solve $Ax=b$

Direct methods:

  • Exact up to machine precision

  • Sometimes expensive, sometimes not

Iterative methods:

  • "Only" approximate

  • With good convergence and proper accuracy control, results may be no worse than those of direct methods

  • Sometimes expensive, sometimes not

  • Convergence guarantee is problem dependent and can be tricky


Matrix & vector norms

Vector norms

Let $x=(x_i)\in\mathbb{R}^n$.

  • $||x||_1=\sum_{i=1}^n |x_i|$: sum norm, $l^1$-norm

  • $||x||_2=\sqrt{\sum_{i=1}^n x_i^2}$: Euclidean norm, $l^2$-norm

  • $||x||_\infty=\max_{i=1,\dots,n} |x_i|$: maximum norm, $l^\infty$-norm
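In Julia, these norms are provided by norm from the LinearAlgebra standard library. A minimal sketch, with a small test vector chosen here just for illustration:

using LinearAlgebra
x = [0.5, -1.0]   # hypothetical test vector
norm(x, 1)        # sum norm: 1.5
norm(x, 2)        # Euclidean norm: ≈ 1.118
norm(x, Inf)      # maximum norm: 1.0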

Matrix norms

Matrix $A=(a_{ij})\in\mathbb{R}^{n\times n}$

  • Representation of a linear operator $\mathcal{A}: \mathbb{R}^n\to\mathbb{R}^n$ defined by $\mathcal{A}: x\mapsto y=Ax$ with $y_i=\sum_{j=1}^n a_{ij}x_j$

  • A vector norm $||\cdot||_p$ induces a corresponding matrix norm:

    $||A||_p=\max_{x\in\mathbb{R}^n,\,x\neq 0}\frac{||Ax||_p}{||x||_p}=\max_{x\in\mathbb{R}^n,\,||x||_p=1}||Ax||_p$

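For example, we can define a small test matrix M; a sketch of a definition consistent with the output below:

M = [ 3.0  2.0  3.0
      0.1  0.3  0.5
      0.6  2.0  3.0 ]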
M
3×3 Array{Float64,2}:
 3.0  2.0  3.0
 0.1  0.3  0.5
 0.6  2.0  3.0
  • $||A||_1=\max_{j=1,\dots,n} \sum_{i=1}^n |a_{ij}|$: maximum of column sums of absolute values of entries

  • $||A||_2=\sqrt{\lambda_{max}}$ with $\lambda_{max}$: largest eigenvalue of $A^TA$

  • $||A||_\infty=\max_{i=1,\dots,n} \sum_{j=1}^n |a_{ij}| = ||A^T||_1$: maximum of row sums of absolute values of entries

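Induced matrix norms are computed with opnorm from LinearAlgebra (norm applied to a matrix would give the Frobenius norm instead). A sketch using the matrix M from above:

using LinearAlgebra
opnorm(M, 1)    # maximum column sum: 6.5
opnorm(M, 2)    # spectral norm: largest singular value of M
opnorm(M, Inf)  # maximum row sum: 8.0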

Condition number and error propagation

  • Solve $Ax=b$, where $b$ is inexact

  • Let $\Delta b$ be the error in $b$ and $\Delta x$ be the resulting error in $x$ such that

    $A(x+\Delta x)=b+\Delta b.$

  • Since $Ax=b$, we get $A\Delta x=\Delta b$

  • Therefore $\Delta x=A^{-1}\Delta b$, and from

    $||A||\,||x||\geq ||b||$

    $||\Delta x||\leq ||A^{-1}||\,||\Delta b||$

    we obtain

    $\frac{||\Delta x||}{||x||}\leq \kappa(A)\frac{||\Delta b||}{||b||}$

where $\kappa(A)=||A||\,||A^{-1}||$ is the condition number of $A$.

This means that the relative error in the solution is proportional to the relative error of the right-hand side. The proportionality factor $\kappa(A)$ is usually larger than one (and in most relevant cases significantly larger). Note that this estimate does not assume inexact arithmetic.


Let us look at an example. Rational arithmetic could be used in order to perform exact calculations; here, we work with Float64.
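A sketch of the setup, consistent with the outputs shown below (variable names as in the outputs; the use of cond is an assumption):

using LinearAlgebra
T_test = Float64          # number type for this run
a = T_test(1.0e6)
pert_b = T_test(1.0e-9)   # magnitude of the perturbation
A = [ 1  -1
      a   a ]
κ = cond(A, 2)            # condition number ≈ 1.0e6
inv(A)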

T_test
Float64

a
1.0e6

pert_b
1.0e-9

A
2×2 Array{Float64,2}:
 1.0    -1.0
 1.0e6   1.0e6

κ
1.0e6

2×2 Array{Float64,2}:
  0.5  5.0e-7
 -0.5  5.0e-7

Assume a solution vector:

x

Create the corresponding right-hand side:

b

Define a perturbation of the right-hand side:

Δb

Calculate the error with respect to the unperturbed solution:

Δx

Relative error of the right-hand side:

δb
7.071067811865475e-16

Relative error of the solution:

δx
5.000000413703827e-10

Comparison with the condition number based estimate:
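A sketch of these computations (the concrete vectors are assumptions, chosen to be consistent with the relative errors shown above):

x = T_test[1, 1]          # assumed solution vector
b = A * x                 # corresponding right-hand side: [0.0, 2.0e6]
Δb = [pert_b, pert_b]     # perturbation of the right-hand side
Δx = A \ (b + Δb) - x     # error with respect to the unperturbed solution
δb = norm(Δb) / norm(b)   # relative error of b: ≈ 7.07e-16
δx = norm(Δx) / norm(x)   # relative error of x: ≈ 5.0e-10
δx ≤ κ * δb               # estimate holds: 5.0e-10 ≤ 1.0e6 · 7.07e-16 ≈ 7.07e-10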

Complexity: "big O notation"

Let $f,g: V\to\mathbb{R}^+$ be some functions, where $V=\mathbb{N}$ or $V=\mathbb{R}$.

Write

$f(x)=O(g(x)) \quad (x\to\infty)$

if there exist a constant $C>0$ and an $x_0\in V$ such that $\forall x>x_0: |f(x)|\leq C|g(x)|$.

Often, one skips the part "$(x\to\infty)$".

Examples:

  • Addition of two vectors: $O(N)$

  • Matrix-vector multiplication (for a matrix where all entries are assumed to be nonzero): $O(N^2)$

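For instance, the $O(N^2)$ bound for dense matrix-vector multiplication corresponds directly to a doubly nested loop (an illustrative sketch, assuming dense storage):

function matvec(A, x)
    n = size(A, 1)
    y = zeros(eltype(A), n)
    for i in 1:n          # N outer iterations ...
        for j in 1:n      # ... of N multiply-adds each: O(N²) in total
            y[i] += A[i, j] * x[j]
        end
    end
    y
end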

A direct method: Cramer's rule

Solve $Ax=b$ by Cramer's rule:

$x_i=\left|\begin{matrix} a_{11}&a_{12}&\cdots&a_{1,i-1}&b_1&a_{1,i+1}&\cdots&a_{1N}\\ a_{21}&&&&b_2&&&a_{2N}\\ \vdots&&&&\vdots&&&\vdots\\ a_{N1}&&&&b_N&&&a_{NN} \end{matrix}\right| \Big/\; |A| \qquad (i=1\dots N)$

This takes $O(N!)$ operations...

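A sketch of Cramer's rule for Float64 data (cramer_solve is a hypothetical helper; note that Julia's det internally uses LU factorization, while the $O(N!)$ count refers to naive determinant expansion):

using LinearAlgebra
function cramer_solve(A, b)
    n = length(b)
    x = zeros(n)
    detA = det(A)
    for i in 1:n
        Ai = copy(A)
        Ai[:, i] = b             # replace column i by the right-hand side
        x[i] = det(Ai) / detA
    end
    x
end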

LU decomposition

Gaussian elimination

So let us have a look at Gaussian elimination to solve $Ax=b$. The elementary matrix manipulation step in Gaussian elimination is the multiplication of row $k$ by $-a_{jk}/a_{kk}$ and its addition to row $j$ such that element $jk$ in the resulting matrix becomes zero. If this is done at once for all $j>k$, we can express this operation as the left multiplication of $A$ by a lower triangular Gauss transformation matrix $M(A,k)$.

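A sketch of such a Gauss transformation (gausstransform is a hypothetical name; it builds an identity matrix carrying the elimination multipliers in column k):

using LinearAlgebra
function gausstransform(A, k)
    n = size(A, 1)
    M = Matrix{eltype(A)}(I, n, n)
    for j in k+1:n
        M[j, k] = -A[j, k] / A[k, k]   # chosen so that (M*A)[j,k] == 0
    end
    M
end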

Define the number type for the following examples:

T_lu
Rational

Define a test matrix:

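A definition consistent with the output below:

A1 = T_lu[ 2  1  3  4
           5  6  7  8
           7  6  8  5
           3  4  2  2 ]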
A1
4×4 Array{Rational,2}:
 2//1  1//1  3//1  4//1
 5//1  6//1  7//1  8//1
 7//1  6//1  8//1  5//1
 3//1  4//1  2//1  2//1

This is the Gauss transform for the first column:

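Using the gausstransform sketch from above:

gausstransform(A1, 1)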
4×4 Array{Rational{Int64},2}:
  1//1  0//1  0//1  0//1
 -5//2  1//1  0//1  0//1
 -7//2  0//1  1//1  0//1
 -3//2  0//1  0//1  1//1

Applying it then sets all elements of the first column below the main diagonal to zero:

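A corresponding sketch:

U1 = gausstransform(A1, 1) * A1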
U1
4×4 Array{Rational,2}:
 2//1  1//1   3//1   4//1
 0//1  7//2  -1//2  -2//1
 0//1  5//2  -5//2  -9//1
 0//1  5//2  -5//2  -4//1

We can repeat this with the second column:

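Analogously:

U2 = gausstransform(U1, 2) * U1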
U2
4×4 Array{Rational,2}:
 2//1  1//1    3//1    4//1
 0//1  7//2   -1//2   -2//1
 0//1  0//1  -15//7  -53//7
 0//1  0//1  -15//7  -18//7

And the third column:

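And:

U3 = gausstransform(U2, 3) * U2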
U3
4×4 Array{Rational,2}:
 2//1  1//1    3//1    4//1
 0//1  7//2   -1//2   -2//1
 0//1  0//1  -15//7  -53//7
 0//1  0//1    0//1    5//1

And here, we have arrived at a triangular matrix. In standard Gaussian elimination, we would have applied the same operations to the right-hand side.

From here on, we would start the backsubstitution, which in fact is the solution of a triangular system of equations.

However, let's have a look at what we have done here: we arrived at

$U_1=M(A,1)A$

$U_2=M(U_1,2)U_1=M(U_1,2)M(A,1)A$

$U_3=M(U_2,3)U_2=M(U_2,3)M(U_1,2)M(A,1)A$

Thus, $A=LU$ with $L=M(A,1)^{-1}M(U_1,2)^{-1}M(U_2,3)^{-1}$ and $U=U_3$. $L$ is a lower triangular matrix and $U$ is an upper triangular matrix.


A first LU decomposition

We can put this together into a function:

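A minimal sketch of such a function, reusing the hypothetical gausstransform from above (deliberately naive, as discussed at the end of this subsection):

using LinearAlgebra
function my_first_lu_decomposition(A)
    n = size(A, 1)
    L = Matrix{eltype(A)}(I, n, n)
    U = copy(A)
    for k in 1:n-1
        Mk = gausstransform(U, k)
        L = L * inv(Mk)   # accumulate L = M(A,1)⁻¹ M(U1,2)⁻¹ ⋯
        U = Mk * U        # eliminate below the diagonal in column k
    end
    L, U
end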

Check for correctness:

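For example (Lf, Uf are hypothetical names):

Lf, Uf = my_first_lu_decomposition(A1)
Lf * Uf - A1   # should be the zero matrix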
4×4 Array{Rational{Int64},2}:
 0//1  0//1  0//1  0//1
 0//1  0//1  0//1  0//1
 0//1  0//1  0//1  0//1
 0//1  0//1  0//1  0//1

So now we can write $A=LU$. Solving $LUx=b$ then amounts to solving two triangular systems:

$Ly=b$

$Ux=y$

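A sketch of this solver; as admitted below, it cheats by using inv instead of genuine forward and backward substitution:

function my_first_lu_solve(L, U, b)
    y = inv(L) * b   # "solve" Ly = b
    x = inv(U) * y   # "solve" Ux = y
    x
end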
my_first_lu_solve (generic function with 1 method)
b1

x1
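A sketch of these cells; the concrete right-hand side is a hypothetical choice:

b1 = T_lu[1, 2, 3, 4]   # hypothetical right-hand side
x1 = my_first_lu_solve(Lf, Uf, b1)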

Check...

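For instance:

A1 * x1 - b1   # should be the zero vector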

... in order to be didactic, in this example we made a very inefficient implementation by creating new matrices in each step. We even cheated by using inv in order to solve the triangular systems.


A reasonable implementation

better_lu_decomposition! (generic function with 1 method)

better_lu_solve (generic function with 1 method)
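Sketches of these two functions, consistent with the names above (the concrete code is an assumption): an LU factorization overwriting its argument, storing the multipliers of L below the diagonal of U (L has an implicit unit diagonal), and the corresponding forward/backward substitution:

function better_lu_decomposition!(LU)
    n = size(LU, 1)
    # eliminate column by column, storing the multipliers (the subdiagonal
    # entries of L) in the positions just zeroed out
    for k in 1:n-1
        for j in k+1:n
            LU[j, k] /= LU[k, k]
            for i in k+1:n
                LU[j, i] -= LU[j, k] * LU[k, i]
            end
        end
    end
    LU
end

function better_lu_solve(LU, b)
    n = length(b)
    x = copy(b)
    # forward substitution with the implicit unit diagonal of L
    for i in 2:n
        for j in 1:i-1
            x[i] -= LU[i, j] * x[j]
        end
    end
    # backward substitution with U
    for i in n:-1:1
        for j in i+1:n
            x[i] -= LU[i, j] * x[j]
        end
        x[i] /= LU[i, i]
    end
    x
end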

We can then implement a method for linear system solution:

better_solve (generic function with 1 method)
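A sketch combining the two steps, copying A so that the input stays intact:

function better_solve(A, b)
    LU = better_lu_decomposition!(copy(A))
    better_lu_solve(LU, b)
end

# usage, e.g.: x2 = better_solve(A1, b1)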
x2

Pivoting

So far, we ignored the possibility that a diagonal element becomes zero during the LU factorization procedure.

Pivoting tries to remedy this problem: before undertaking the next Gauss transformation step, we can exchange rows such that we always divide by the remaining candidate element of largest absolute value. This then in fact results in a decomposition

$PA=LU$

where $P$ is a permutation matrix (which can be stored in an integer vector). This approach is called "partial pivoting". Full pivoting would in addition perform column permutations, resulting in another permutation matrix $Q$ and the decomposition

$PAQ=LU$

Almost all practically used LU decomposition implementations use partial pivoting.


LU factorization from the Julia library

Julia implements a pivoting LU factorization in the LinearAlgebra standard library:

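The call producing the output below:

using LinearAlgebra
lu1 = lu(A1)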
lu1
LU{Rational,Array{Rational,2}}
L factor:
4×4 Array{Rational,2}:
 1//1   0//1    0//1  0//1
 5//7   1//1    0//1  0//1
 3//7   5//6    1//1  0//1
 2//7  -5//12  -1//2  1//1
U factor:
4×4 Array{Rational,2}:
 7//1   6//1   8//1    5//1
 0//1  12//7   9//7   31//7
 0//1   0//1  -5//2  -23//6
 0//1   0//1   0//1    5//2

Like in MATLAB, the backslash operator \ "solves"; applied to a factorization object, it solves using the precomputed LU factors:

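For example:

lu1 \ b1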

Of course, we can apply \ directly to a matrix. Behind the scenes, an LU decomposition and an LU solve are then performed:

x3
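A sketch of the corresponding cell:

x3 = A1 \ b1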

LU vs. inv


In principle we could work with the inverse matrix as well:

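A sketch of the corresponding cell; the solution would then be obtained as A1inv * b1:

A1inv = inv(A1)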
A1inv
4×4 Array{Rational{Int64},2}:
  68//75  -2//3   2//25   49//75
 -43//75   1//3  -2//25    1//75
 -46//75   1//3   6//25  -53//75
   2//5    0//1  -1//5     1//5

However, inversion is more expensive than LU factorization.


Some performance tests

We generate random test matrices and corresponding right-hand sides:

rand_Ab (generic function with 1 method)

perftest_inv (generic function with 1 method)

perftest_better (generic function with 1 method)

test_and_plot (generic function with 1 method)
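Sketches of these cells (the concrete matrix generation, timing, and plotting details are assumptions):

using LinearAlgebra

# random test matrix, shifted to be diagonally dominant (hence invertible),
# together with a matching right-hand side
rand_Ab(n) = (rand(n, n) + n * I, rand(n))

# time the solution via explicit inversion
function perftest_inv(n)
    A, b = rand_Ab(n)
    @elapsed inv(A) * b
end

# time the solution via our LU-based solver
function perftest_better(n)
    A, b = rand_Ab(n)
    @elapsed better_solve(A, b)
end

# test_and_plot would run such benchmarks (including the built-in \)
# for a sequence of matrix sizes N and show the timings in a log-log plot.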
  • The overall complexity in this experiment is around $O(N^{2.75})$, which is in the region of some theoretical estimates.

  • A good implementation is hard to get right: straightforward code performs worse than the system implementation.

  • Using inversion instead of \ is significantly slower (note the log scale in the plot!).

  • For standard floating point types, Julia uses highly optimized versions of LAPACK and BLAS.

    • The same is true for Python/NumPy and many other coding environments.
