
Sparse matrices


In the previous lectures we found examples of matrices from partial differential equations which have only 3 or 5 nonzero diagonals. For 3D computations this would be 7 diagonals. One can make use of this diagonal structure, e.g. when coding the progonka method (the Thomas algorithm for tridiagonal systems).

Matrices from finite element or finite volume methods on unstructured meshes have a more irregular pattern, but as a rule they have only a few nonzero entries per row compared to the number of unknowns. In this case, storing the diagonals becomes infeasible.


Definition: We call a matrix sparse if, regardless of the number of unknowns N, the number of non-zero entries per row and per column remains bounded by a constant n_s.

  • If we find a scheme which allows us to store only the non-zero matrix entries, we would need no more than N·n_s = O(N) storage locations instead of N²

  • The same is true for the matrix-vector multiplication, if we program it in such a way that every nonzero element is used exactly once: it then needs O(N) instead of O(N²) operations

  • What is a good storage format for sparse matrices?

  • Is there a way to implement Gaussian elimination for general sparse matrices which allows for linear system solution with O(N) operations?

  • Is there a way to implement Gaussian elimination with pivoting for general sparse matrices which allows for linear system solution with O(N) operations?

  • Is there any algorithm for sparse linear system solution with O(N) operations?


Triplet storage format

  • Store all nonzero elements along with their row and column indices

  • One real array and two integer arrays, each of length nnz = number of nonzero elements

(Y. Saad, Iterative Methods for Sparse Linear Systems, p. 92)

  • Also known as Coordinate (COO) format

  • This format is often used as an intermediate format for matrix construction, as in the sketch below
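As a small illustration (hypothetical values, not from the original notes): a 3×3 matrix with three nonzero entries is described by three arrays of length nnz = 3, from which Julia can build a sparse matrix.

```julia
using SparseArrays

# Hypothetical 3×3 example with nonzeros at (1,1), (3,2) and (2,3)
I = [1, 3, 2]          # row indices
J = [1, 2, 3]          # column indices
V = [1.0, 2.0, 3.0]    # values
A = sparse(I, J, V)    # construct a sparse matrix from the triplet (COO) data
```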


Compressed Sparse Row (CSR) format

(also known as Compressed Row Storage (CRS) or IA-JA format)

  • float array AA, length nnz, containing all nonzero elements row by row

  • integer array JA, length nnz, containing the column indices of the elements of AA

  • integer array IA, length N+1, containing the start indices of each row in the arrays AA and JA, with IA[N+1] = nnz+1

A = \begin{pmatrix}
 1 & 0 & 0  & 2  & 0 \\
 3 & 4 & 0  & 5  & 0 \\
 6 & 0 & 7  & 8  & 9 \\
 0 & 0 & 10 & 11 & 0 \\
 0 & 0 & 0  & 0  & 12
\end{pmatrix}
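For this matrix, the CSR arrays (with 1-based indexing) follow directly from the definition above:

  AA = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
  JA = [1, 4, 1, 2, 4, 1, 3, 4, 5, 3, 4, 5]
  IA = [1, 3, 6, 10, 12, 13]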

  • Used in many sparse matrix solver packages
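A minimal sketch (not from the original notes) of the matrix-vector product for the CSR arrays defined above; it touches every nonzero exactly once, hence O(nnz) = O(N·n_s) operations:

```julia
# y = A*x for a matrix stored in CSR arrays AA, JA, IA (1-based indexing)
function csr_matvec(AA, JA, IA, x)
    N = length(IA) - 1
    y = zeros(eltype(AA), N)
    for i = 1:N
        for k = IA[i]:IA[i+1]-1   # nonzero entries of row i
            y[i] += AA[k] * x[JA[k]]
        end
    end
    return y
end
```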


Compressed Sparse Column (CSC) format

  • Uses the same principle, but stores the matrix column-wise.

  • Used in Julia (SparseMatrixCSC)


Sparse matrices in Julia

Create sparse matrix from a full matrix
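A minimal sketch of how this can be done with SparseArrays, using the example matrix from above:

```julia
using SparseArrays

A = [1.0  0.0   0.0   2.0   0.0;
     3.0  4.0   0.0   5.0   0.0;
     6.0  0.0   7.0   8.0   9.0;
     0.0  0.0  10.0  11.0   0.0;
     0.0  0.0   0.0   0.0  12.0]

As = sparse(A)   # convert the dense matrix to SparseMatrixCSC (only nonzeros are stored)
```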
A
5×5 Array{Float64,2}:
 1.0  0.0   0.0   2.0   0.0
 3.0  4.0   0.0   5.0   0.0
 6.0  0.0   7.0   8.0   9.0
 0.0  0.0  10.0  11.0   0.0
 0.0  0.0   0.0   0.0  12.0
As
5×5 SparseMatrixCSC{Float64,Int64} with 12 stored entries:
  [1, 1]  =  1.0
  [2, 1]  =  3.0
  [3, 1]  =  6.0
  [2, 2]  =  4.0
  [3, 3]  =  7.0
  [4, 3]  =  10.0
  [1, 4]  =  2.0
  [2, 4]  =  5.0
  [3, 4]  =  8.0
  [4, 4]  =  11.0
  [3, 5]  =  9.0
  [5, 5]  =  12.0
Create a random sparse matrix
N = 100

p = 0.1

Random sparse matrix with probability p = 0.1 that A_ij is nonzero:

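A sketch of how such a matrix can be created, using SparseArrays.sprand:

```julia
using SparseArrays

# N×N matrix; each entry is nonzero with probability p, values uniform in [0,1)
A2 = sprand(N, N, p)
```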
A2
100×100 SparseMatrixCSC{Float64,Int64} with 1032 stored entries:
  [7 ,   1]  =  0.94222
  [11,   1]  =  0.0328404
  [14,   1]  =  0.288891
  [27,   1]  =  0.777506
  [29,   1]  =  0.499039
  [38,   1]  =  0.493717
  ⋮
  [3 , 100]  =  0.209442
  [7 , 100]  =  0.52545
  [25, 100]  =  0.570221
  [38, 100]  =  0.360959
  [61, 100]  =  0.174752
  [77, 100]  =  0.862321
  [86, 100]  =  0.562663
Create a sparse matrix from given data
  • There are several possibilities to create a sparse matrix from given data

  • As an example, we create a tridiagonal matrix.

N1 = 10000

a, b, c: vectors holding the three diagonals (sub-, main- and superdiagonal) of the test matrix
  • Special case: use the Julia tridiagonal matrix constructor

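A sketch of how such a constructor can look, assuming a, b, c are the sub-, main- and superdiagonal vectors:

```julia
using SparseArrays, LinearAlgebra

# Build a Tridiagonal matrix from the three diagonals and convert it to CSC
function sptri_special(a, b, c)
    sparse(Tridiagonal(a, b, c))
end
```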
sptri_special (generic function with 1 method)
  • Create an empty Julia sparse matrix and fill it incrementally

B (created as an empty 10×10 sparse matrix):
10×10 SparseMatrixCSC{Float64,Int64} with 0 stored entries

After the assignment B[1,2] = 3:
10×10 SparseMatrixCSC{Float64,Int64} with 1 stored entry:
  [1, 2]  =  3.0
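A sketch of the incremental variant under the same assumptions on a, b, c: start from an empty sparse matrix (as with B above) and insert the entries one by one.

```julia
using SparseArrays

function sptri_incremental(a, b, c)
    N = length(b)
    A = spzeros(N, N)          # empty N×N SparseMatrixCSC
    for i = 1:N
        A[i, i] = b[i]         # main diagonal
    end
    for i = 1:N-1
        A[i+1, i] = a[i]       # subdiagonal
        A[i, i+1] = c[i]       # superdiagonal
    end
    return A
end
```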
sptri_incremental (generic function with 1 method)
  • Use the coordinate format as intermediate storage, and construct sparse matrix from there. This is the recommended way.

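A sketch of the COO based variant, again assuming a, b, c hold the three diagonals: collect row indices, column indices and values in three arrays and hand them to sparse().

```julia
using SparseArrays

function sptri_coo(a, b, c)
    N = length(b)
    I = vcat(1:N, 2:N, 1:N-1)   # row indices: diagonal, subdiagonal, superdiagonal
    J = vcat(1:N, 1:N-1, 2:N)   # column indices
    V = vcat(b, a, c)           # corresponding values
    sparse(I, J, V)
end
```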
sptri_coo (generic function with 1 method)
  • Use the ExtendableSparse.jl package, which implicitly uses the so-called linked list format for intermediate storage of new entries. Note the flush!() method, which needs to be called in order to transfer them to the Julia sparse matrix structure.

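A sketch of the ExtendableSparse based variant, under the same assumptions on a, b, c:

```julia
using ExtendableSparse, SparseArrays

function sptri_ext(a, b, c)
    N = length(b)
    A = ExtendableSparseMatrix(N, N)
    for i = 1:N
        A[i, i] = b[i]
    end
    for i = 1:N-1
        A[i+1, i] = a[i]
        A[i, i+1] = c[i]
    end
    flush!(A)   # transfer the intermediate (linked list) entries into the CSC structure
    return A
end
```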
sptri_ext (generic function with 1 method)
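The four BenchmarkTools.Trial outputs below presumably correspond to calls like the following, in the order the constructors were defined above:

```julia
using BenchmarkTools

@benchmark sptri_special($a, $b, $c)
@benchmark sptri_incremental($a, $b, $c)
@benchmark sptri_coo($a, $b, $c)
@benchmark sptri_ext($a, $b, $c)
```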
BenchmarkTools.Trial: 
  memory estimate:  547.27 KiB
  allocs estimate:  8
  --------------
  minimum time:     38.350 μs (0.00% GC)
  median time:      42.076 μs (0.00% GC)
  mean time:        58.949 μs (19.93% GC)
  maximum time:     1.513 ms (94.22% GC)
  --------------
  samples:          10000
  evals/sample:     1
BenchmarkTools.Trial: 
  memory estimate:  1.08 MiB
  allocs estimate:  33
  --------------
  minimum time:     18.266 ms (0.00% GC)
  median time:      18.774 ms (0.00% GC)
  mean time:        18.830 ms (0.11% GC)
  maximum time:     20.520 ms (0.00% GC)
  --------------
  samples:          266
  evals/sample:     1
BenchmarkTools.Trial: 
  memory estimate:  2.65 MiB
  allocs estimate:  66
  --------------
  minimum time:     621.986 μs (0.00% GC)
  median time:      647.085 μs (0.00% GC)
  mean time:        727.777 μs (7.74% GC)
  maximum time:     2.324 ms (60.01% GC)
  --------------
  samples:          6861
  evals/sample:     1
BenchmarkTools.Trial: 
  memory estimate:  1.53 MiB
  allocs estimate:  25
  --------------
  minimum time:     681.731 μs (0.00% GC)
  median time:      740.557 μs (0.00% GC)
  mean time:        784.821 μs (4.09% GC)
  maximum time:     2.394 ms (63.66% GC)
  --------------
  samples:          6368
  evals/sample:     1

Benchmark summary:

  • The incremental creation of a SparseMatrixCSC from an initial state with no nonzero entries is slow because of the data shifts and reallocations necessary during the construction

  • The COO intermediate format is sufficiently fast, but inconvenient to handle

  • The ExtendableSparse package has similar performance and is easier to use.


Sparse direct solvers

  • Sparse direct solvers implement LU factorization with different pivoting strategies. Some examples:

    • UMFPACK: e.g. used in Julia

    • Pardiso (OpenMP + MPI parallel)

    • SuperLU (OpenMP parallel)

    • MUMPS (MPI parallel)

    • Pastix

  • Quite efficient for 1D/2D problems - we will discuss this more deeply

  • Essentially they implement the LU factorization algorithm

  • They suffer from fill-in, especially for 3D problems:

Let A = LU be an LU factorization. Then, as a rule, nnz(L+U) ≫ nnz(A). This leads to:

  • increased memory usage to store L,U

  • high operation count


Solution steps with sparse direct solvers

  1. Pre-ordering

  • Decrease amount of non-zero elements generated by fill-in by re-ordering of the matrix

  • Several, graph theory based heuristic algorithms exist

  2. Symbolic factorization

  • If pivoting is ignored, the indices of the non-zero elements are calculated and stored

  • Most expensive step with respect to computation time

  3. Numerical factorization

  • Calculation of the numerical values of the nonzero entries

  • Moderately expensive, once the symbolic factors are available

  4. Upper/lower triangular system solution

  • Fairly quick in comparison to the other steps

  • Separation of steps 2 and 3 allows one to save computational costs for problems where the sparsity structure remains unchanged, e.g. time-dependent problems on fixed computational grids; see the sketch after this list

  • With pivoting, steps 2 and 3 have to be performed together, and pivoting can increase fill-in

  • Instead of pivoting, iterative refinement may be used in order to maintain accuracy of the solution
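A minimal sketch (not from the original notes) of the factorize-once, solve-many pattern in Julia, where lu on a SparseMatrixCSC calls UMFPACK; the test matrix here is purely hypothetical:

```julia
using SparseArrays, LinearAlgebra

A = sprand(1000, 1000, 0.005) + 10I   # hypothetical test matrix, shifted to make it nonsingular
F = lu(A)                             # pre-ordering, symbolic and numerical factorization

b1 = rand(1000)
b2 = rand(1000)
x1 = F \ b1                           # only the triangular solves are performed here
x2 = F \ b2
```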


Influence of reordering

  • Sparsity patterns of the original matrix for three different orderings of the unknowns

    • the number of nonzero elements is (of course) independent of the ordering:

    (Figure: mathworks.com)

  • Sparsity patterns of the corresponding LU factorizations

    • the number of nonzero elements depends on the original ordering!

    (Figure: mathworks.com)


Sparse direct solvers: Complexity estimate

  • Complexity estimates depend on storage scheme, reordering etc.

  • Sparse matrix - vector multiplication has complexity O(N)

  • Some estimates can be given from graph theory for discretizations of the heat equation with N = n^d unknowns on nearly cubic grids in space dimension d

  • sparse LU factorization:

  d   work                        storage
  1   O(N) = O(n)                 O(N) = O(n)
  2   O(N^(3/2)) = O(n^3)         O(N log N) = O(n^2 log n)
  3   O(N^2) = O(n^6)             O(N^(4/3)) = O(n^4)

  • triangular solve: work dominated by storage complexity

  d   work
  1   O(N) = O(n)
  2   O(N log N) = O(n^2 log n)
  3   O(N^(4/3)) = O(n^4)

(Source: J. Poulson, PhD thesis)


Practical use

  • The \ operator also works for sparse matrices and then dispatches to a sparse direct solver

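A sketch of how the backslash operator is typically used with the matrices constructed above; sptri_coo and sptri_ext refer to the constructors sketched earlier, and the backslash call on the ExtendableSparseMatrix assumes the package's own \ method, which flushes and forwards to the underlying CSC solver:

```julia
using SparseArrays, ExtendableSparse

Asparse     = sptri_coo(a, b, c)   # SparseMatrixCSC
Asparse_ext = sptri_ext(a, b, c)   # ExtendableSparseMatrix

f = ones(N1)
u     = Asparse \ f                # sparse LU (UMFPACK) behind the backslash
u_ext = Asparse_ext \ f            # assumed to flush and solve via the CSC part
```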
Asparse_ext
10000×10000 ExtendableSparseMatrix{Float64,Int64}:
 0.333554  0.604986  0.0       0.0        …  0.0       0.0       0.0       0.0
 0.178295  0.378649  0.210584  0.0           0.0       0.0       0.0       0.0
 0.0       0.737103  0.622097  0.889549      0.0       0.0       0.0       0.0
 0.0       0.0       0.370098  0.0677654     0.0       0.0       0.0       0.0
 0.0       0.0       0.0       0.837115      0.0       0.0       0.0       0.0
 0.0       0.0       0.0       0.0        …  0.0       0.0       0.0       0.0
 0.0       0.0       0.0       0.0           0.0       0.0       0.0       0.0
 ⋮                                        ⋱                                
 0.0       0.0       0.0       0.0           0.0       0.0       0.0       0.0
 0.0       0.0       0.0       0.0        …  0.450361  0.0       0.0       0.0
 0.0       0.0       0.0       0.0           0.815526  0.663642  0.0       0.0
 0.0       0.0       0.0       0.0           0.909567  0.65137   0.944804  0.0
 0.0       0.0       0.0       0.0           0.0       0.901804  0.894436  0.0753511
 0.0       0.0       0.0       0.0           0.0       0.0       0.267438  0.647475