How to Speed Up HGS without Parallel Computing /
A Parallel Computational Framework to Simulate Flow and Transport in Integrated Hydrologic Systems

HydroGeoSphere User Conference 2011
Leibniz Universität Hannover, April 11-13, 2011

Outline
- How to Speed Up HGS without Parallel Computing
- A Parallel Computational Framework to Simulate Flow and Transport in Integrated Hydrologic Systems
- GUI: Visual HydroGeoSphere

How to Speed Up HGS without Parallel Computing
Control Volume Finite Element Method
For Solving Groundwater and Surface Water Flow

Discrete mass balance at node i (neighbours j1 ... j4):

$$\sum_{j=j_1}^{j_4} Q_{ji} = \frac{\Delta V_w}{\Delta t} \cong S_s\,\frac{\Delta h_i}{\Delta t} + n\,\frac{\Delta S_i}{\Delta t}$$

$$Q_{ji} = C_{ji}\,(h_j - h_i)$$

[Figure: control volume around node i with inter-nodal fluxes $Q_{j_1 i}$, $Q_{j_2 i}$, $Q_{j_3 i}$, $Q_{j_4 i}$]

Matrix entries:

$$A_{ij} = C_{ij}(h_i, h_j), \qquad A_{ii} = -\sum_{j=j_1}^{j_4} C_{ij} - \frac{S_s}{\Delta t}$$

Nonlinear system: $A(h)\cdot h = b(h)$
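The matrix entries above can be sketched as a small assembly routine. The node network, conductance values and storage term below are invented for illustration, and the conductances C are held fixed (one linearized step) rather than re-evaluated from h as the nonlinear system requires.

```python
# Sketch of the control-volume assembly: off-diagonals take A_ij = C_ij and
# the diagonal collects A_ii = -sum_j C_ij - Ss/dt, as on the slide.

def assemble(n_nodes, edges, C, Ss, dt):
    """edges: list of (i, j) node pairs; C[(i, j)]: conductance between them."""
    A = [[0.0] * n_nodes for _ in range(n_nodes)]
    for (i, j) in edges:
        c = C[(i, j)]
        A[i][j] += c          # A_ij = C_ij
        A[j][i] += c          # symmetric contribution
        A[i][i] -= c          # part of -sum_j C_ij
        A[j][j] -= c
    for i in range(n_nodes):
        A[i][i] -= Ss / dt    # storage term on the diagonal
    return A

# Tiny 3-node chain 0 - 1 - 2 with made-up conductances
A = assemble(3, [(0, 1), (1, 2)], {(0, 1): 2.0, (1, 2): 3.0}, Ss=0.1, dt=1.0)
```

In the full scheme this assembly sits inside the Newton loop, since each C_ij depends on the current heads.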
Newton-Raphson Method
For Solving Nonlinear Equations

Root finding: find $h_s$ that satisfies

$$A(h_s)\cdot h_s = b(h_s) \quad\text{or}\quad R(h_s) = b(h_s) - A(h_s)\cdot h_s = 0$$

First-order expansion:

$$R(h + \Delta h) = R(h) + \frac{\partial R}{\partial h}\,\Delta h = 0 \;\Rightarrow\; \Delta h = -\left[\frac{\partial R}{\partial h}\right]^{-1} R(h)$$

Iteration and convergence tests:

$$h_{k+1} = h_k + \Delta h_{k+1}$$

if $|\Delta h_{k+1}| < \varepsilon_{abs}$, then $h_{k+1} \to h_s$
or
if $|R(h_{k+1})| < \varepsilon_{rel}$, then $h_{k+1} \to h_s$

[Figure: residual curve R(h) with successive iterates $h_0, h_1, h_2, h_3$ and steps $\Delta h_1, \Delta h_2, \Delta h_3$ converging to $h_s$]
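The iteration above, reduced to a scalar sketch with both stopping tests; the toy residual and its derivative are invented, not the HGS flow residual.

```python
# Minimal scalar Newton-Raphson with the two convergence tests from the slide:
# step size |dh| < eps_abs, or residual |R(h)| < eps_rel.

def newton(R, dRdh, h0, eps_abs=1e-10, eps_rel=1e-10, max_it=50):
    h = h0
    for _ in range(max_it):
        dh = -R(h) / dRdh(h)      # dh = -[dR/dh]^(-1) R(h)
        h = h + dh                # h_{k+1} = h_k + dh_{k+1}
        if abs(dh) < eps_abs or abs(R(h)) < eps_rel:
            break
    return h

# Toy problem: R(h) = h^3 - 8, root h_s = 2
root = newton(lambda h: h**3 - 8.0, lambda h: 3 * h**2, h0=3.0)
```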
Newton-Raphson Method
For Solving Nonlinear Equations

Finite-difference (numerical) Jacobian:

$$J = \left[\frac{\partial R}{\partial h}\right] \approx \frac{R(h + \varepsilon_J) - R(h)}{\varepsilon_J}$$

if $|\Delta h_{k+1}| < \varepsilon_{abs}$, then $h_{k+1} \to h_s$
or
if $|R(h_{k+1})| < \varepsilon_{rel}$, then $h_{k+1} \to h_s$

Adaptive time stepping:

$$\Delta t^{new} = \Delta t^{old}\,\frac{\Delta h_{allowed}}{\max(\Delta h)}, \qquad \alpha \le \frac{\Delta t^{new}}{\Delta t^{old}} \le \beta$$
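The two mechanisms on this slide are easy to sketch in isolation; the perturbation size, the clamp bounds alpha and beta, and the test values below are invented for illustration.

```python
# Finite-difference Jacobian J ~ (R(h + eps_J) - R(h)) / eps_J, and the
# adaptive step dt_new = dt_old * dh_allowed / max(dh), with the growth /
# shrink ratio clamped to [alpha, beta] as on the slide.

def fd_jacobian(R, h, eps_J=1e-7):
    return (R(h + eps_J) - R(h)) / eps_J

def next_dt(dt_old, dh_max, dh_allowed, alpha=0.5, beta=2.0):
    ratio = dh_allowed / dh_max
    ratio = min(max(ratio, alpha), beta)   # enforce alpha <= ratio <= beta
    return dt_old * ratio
```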
Newton-Raphson Method
For Solving Nonlinear Equations

$$\Delta h = -\left[\frac{\partial R}{\partial h}\right]^{-1} R(h) = -J^{-1}\,R(h)$$

Damped (under-relaxed) update:

$$h_{k+1} = h_k + \Delta h_{k+1} \;\rightarrow\; h_{k+1} = h_k + \kappa\,\Delta h_{k+1}, \quad \kappa \le 1$$

$$\kappa = 1 \ \text{ if } (\Delta h_k\,\Delta h_{k+1}) > 0$$

[Figure: iterates $h_0 \dots h_3$ oscillating around $h_s$, with alternating update signs $\Delta h_1 < 0$, $\Delta h_2 > 0$, $\Delta h_3 < 0$]
Newton-Raphson Method
For Solving Nonlinear Equations

$$\Delta h = -\left[\frac{\partial R}{\partial h}\right]^{-1} R(h) = -J^{-1}\,R(h)$$

$$h_{k+1} = h_k + \Delta h_{k+1} \;\rightarrow\; h_{k+1} = h_k + \kappa\,\Delta h_{k+1}, \quad \kappa \le 1$$

$$\kappa = 1 \ \text{ if } (\Delta h_k\,\Delta h_{k+1}) > 0, \qquad \text{otherwise } \kappa = \frac{(\Delta h_{k+1})_{allowed}}{|\Delta h_{k+1}|} \ \text{ or } 1, \ \text{whichever is smaller}$$
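One reading of the damping rule above, as a sketch: full steps while successive updates keep the same sign, otherwise the step is limited to the allowed update. The argument names and test values are invented.

```python
# Under-relaxation factor kappa from the slide: kappa = 1 when the product
# of successive updates is positive (no oscillation), otherwise the smaller
# of 1 and dh_allowed / |dh_new|.

def kappa(dh_prev, dh_new, dh_allowed):
    if dh_prev * dh_new > 0:
        return 1.0                             # iterates still move one way
    return min(1.0, dh_allowed / abs(dh_new))  # oscillating: limit the step
```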
Control Volume Finite Element Method
For Solving Groundwater and Surface Water Flow (recap)

$$Q_{ji} = C_{ji}\,(h_j - h_i), \qquad A_{ij} = C_{ij}(h_i, h_j), \qquad A_{ii} = -\sum_{j=j_1}^{j_4} C_{ij} - \frac{S_s}{\Delta t}, \qquad A(h)\cdot h = b(h)$$
Control Volume Finite Element Method
For Solving Groundwater and Surface Water Flow

$$Q_{ji} = C_{ji}\,(h_j - h_i)$$

if $C_{ji} \gg 1$, then $h_i \approx h_j$

$C_{ji}$ is derived from the Manning equation:

$$Q_o = -\frac{d_o^{5/3}}{n\,\Phi^{1/2}}\,\nabla(d_o + z_o)$$

Limiting the depth and slope factors keeps the conductance bounded:

$$\frac{d_o^{5/3}}{n\,\Phi^{1/2}} \approx \frac{d_o\,\min\!\left[d_o^{2/3},\, d_{o,allowed}^{2/3}\right]}{n\,\max\!\left[\Phi^{1/2},\, \Phi_{allowed}^{1/2}\right]}$$
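The limiting can be sketched directly from the expression above; the function name and the test values are invented for illustration.

```python
# Limited surface conductance factor: clip the depth term at d_allowed^(2/3)
# and floor the friction-slope term at Phi_allowed^(1/2), so C_ji cannot blow
# up for large depths or vanishing slopes.

def limited_conductance(d_o, phi, n, d_allowed, phi_allowed):
    num = d_o * min(d_o ** (2.0 / 3.0), d_allowed ** (2.0 / 3.0))
    den = n * max(phi ** 0.5, phi_allowed ** 0.5)
    return num / den
```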
Root finding: find $h_s$ that satisfies

$$R(h_s) = b(h_s) - A(h_s)\cdot h_s = 0$$

if $|\Delta h_{k+1}| < \varepsilon_{abs}$, then $h_{k+1} \to h_s$
- Guarantees solutions to be close enough
- Does not guarantee achievement of the objective
- Can be problematic for highly sensitive cases

if $|R(h_{k+1})| < \varepsilon_{rel}$, then $h_{k+1} \to h_s$
- Does not guarantee close-enough solutions
- Guarantees achievement of the objective
- Can be problematic for insensitive cases
Other solver options: no nodal flow check, remove negative coefficient, finite difference mode, etc.

$$A_{ij} = C_{ij}(h_i, h_j), \qquad A_{ii} = -\sum_{j=j_1}^{j_4} C_{ij} - \frac{S_s}{\Delta t}, \qquad A(h)\cdot h = b(h)$$
Interactions at the Interface
Between Surface and Subsurface Flow Regimes
Rill Storage and Storage Exclusion

Manning's Equation for Surface Flow

$$v_{ox} = -K_{ox}\,\frac{\partial h_o}{\partial x}, \qquad v_{oy} = -K_{oy}\,\frac{\partial h_o}{\partial y}$$

$$K_{ox} = \frac{1}{n_x}\,\frac{d_o^{2/3}}{\left[\partial h_o/\partial s\right]^{1/2}}, \qquad K_{oy} = \frac{1}{n_y}\,\frac{d_o^{2/3}}{\left[\partial h_o/\partial s\right]^{1/2}}$$

[Figure: $K_{ox}\,n_x\,[\partial h_o/\partial s]^{1/2} = d_o^{2/3}$ plotted against the water depth $d_o$]
Modified Manning's Equation:
Rill Storage and Storage Exclusion

$$v_{ox} = -k_H\,K_{ox}\,\frac{\partial h_o}{\partial x}, \qquad v_{oy} = -k_H\,K_{oy}\,\frac{\partial h_o}{\partial y}$$

$$k_H = \begin{cases} 0 & d_o < H_d \\ S^{\,2(1-S)} & H_d < d_o < H_s \\ 1 & d_o > H_s \end{cases}, \qquad S = \frac{d_o - H_d}{H_s - H_d}$$

[Figure: land-surface profile with the heights $H_d$, $H_o$, $H_s$ and the water depth $d_o$]
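The smoothing factor above can be sketched as a direct transcription of the piecewise definition; the test heights are invented.

```python
# Depth-smoothing factor k_H from the modified Manning equation: 0 below the
# rill-storage height H_d, 1 above the storage-exclusion height H_s, and
# S^(2(1-S)) in between, with S = (d_o - H_d) / (H_s - H_d).

def k_H(d_o, H_d, H_s):
    if d_o <= H_d:
        return 0.0
    if d_o >= H_s:
        return 1.0
    S = (d_o - H_d) / (H_s - H_d)
    return S ** (2.0 * (1.0 - S))
```

At S = 0.5 the exponent is 1, so k_H passes through 0.5 midway between H_d and H_s.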
Relative Permeability for Fluid Exchange

$$q_{exch} = -k_{V,up}\,K_{exch}\,(h_o - h_{pm})\,/\,l_{exch}, \qquad K_{exch} = K_{zz,pm}$$

Upstream weighting of the relative permeability:

$$k_{V,up} = \begin{cases} k_{r,pm} & \text{if } h_{pm} > h_o \\ k_{r,o} & \text{if } h_o > h_{pm} \end{cases}$$

$$k_{r,o} = \begin{cases} S_{exch}^{\,2(1-S_{exch})} & d_o < H_s \\ 1 & d_o > H_s \end{cases}, \qquad S_{exch} = d_o / H_s$$

[Figure: $k_{r,o}$ versus the water depth $d_o$ relative to $H_d$, $H_o$, $H_s$]
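The exchange formula above in sketch form; the function name, argument order and test values are invented, and k_r,pm is passed in rather than computed from a retention curve.

```python
# Upstream-weighted exchange flux q_exch = -k_Vup * K_exch * (h_o - h_pm) / l_exch:
# the relative permeability is taken from whichever side the water comes from.

def exchange_flux(h_o, h_pm, d_o, H_s, K_zz, l_exch, k_r_pm):
    if h_pm > h_o:                  # water moves from the porous medium upward
        k_up = k_r_pm
    else:                           # surface water infiltrates
        if d_o >= H_s:
            k_up = 1.0
        else:
            S = d_o / H_s           # S_exch = d_o / H_s
            k_up = S ** (2.0 * (1.0 - S))
    return -k_up * K_zz * (h_o - h_pm) / l_exch
```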
Volume Depth and Surface Porosity

$$dV(d_o) = \begin{cases} \dfrac{H_s}{2}\left[\dfrac{\pi}{2} - \dfrac{H_s - d_o}{H_s}\sqrt{1 - \left(\dfrac{H_s - d_o}{H_s}\right)^2} - \operatorname{asin}\dfrac{H_s - d_o}{H_s}\right] & d_o \le H_s \\[2mm] (d_o - H_s) + \dfrac{\pi}{4}H_s & d_o > H_s \end{cases}$$

$$dV(d_o) \in C^1, \qquad \phi_o = dV / d_o$$

[Figure: left, $dV/H_s$ versus $d_o/H_s$, reaching $\pi/4$ at $d_o = H_s$ and slope 1 beyond; right, surface porosity $\phi_o = dV/d_o$ versus $d_o/H_s$, tending to 1 as $d_o/H_s$ grows]
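A sketch of the volume-depth function, assuming the circular-arc reading of the slide reconstructed above. The checks below confirm the properties the slide asserts: dV(H_s) = πH_s/4, the two branches match there, and φ_o tends to 1 with depth.

```python
import math

# C^1 volume-depth function: circular-arc branch up to H_s, then linear with
# unit slope.  phi_o = dV / d_o is the resulting surface porosity.

def dV(d_o, H_s):
    if d_o <= H_s:
        u = (H_s - d_o) / H_s
        return 0.5 * H_s * (math.pi / 2 - u * math.sqrt(1 - u * u) - math.asin(u))
    return (d_o - H_s) + math.pi * H_s / 4.0

def phi_o(d_o, H_s):
    return dV(d_o, H_s) / d_o
```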
HydroGeoSphere: a three-dimensional numerical model describing fully-integrated subsurface and surface flow and transport
- Attempts to account for all interactions between surface and subsurface flow regimes
- Conceptually superior to linked simulators or iteratively coupled simulators
- Complex (more processes, highly nonlinear)
Overview of HydroGeoSphere Features
- 2D overland/stream flow (diffusion-wave equation), including stream/surface drainage network genesis.
- 3D variably-saturated flow (Richards' equation + ET) in porous media.
- 3D variably-saturated flow in macropores, fractures and karst conduits (dual-porosity, dual-permeability).
- Advective-dispersive, reactive solute/thermal transport in all continua.
- Allows for complex topography, irregular surface & subsurface properties, density-dependent flow, pumping wells, tiles, etc.
- Fully-coupled, simultaneous solution of surface/subsurface flow and transport via the Control-Volume Finite Element Method.
Overview of HydroGeoSphere
Computing flow for serial HGS

[Figure: domain partitioning]
Computing time for serial HGS

Distribution of computing time:
- Matrix assembly: 20.7 %
- ILUC: 1.5 %
- Matrix solver: 74.5 %

• The targets of parallelization are the matrix assembly and the iterative matrix solver.
Parallel schemes applied to HGS:
Global matrix assembly

Coarse-grain parallelism: assembling the matrix is highly independent, so coarse-grain parallelism is applied, leading to high parallel efficiency.

[Figure: matrix rows assembled by Threads 1-4]
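The coarse-grain scheme, plus the privatization idea used later for the solver, can be sketched as follows. Python threads do not actually run the arithmetic in parallel (the GIL serializes it), so this only illustrates the data layout: each thread takes a block of contributions, accumulates into its own private array, and the private arrays are reduced at the end, so no locks on shared entries are needed. All names and values are invented.

```python
import threading

# Coarse-grain parallel accumulation with privatization: per-thread private
# arrays, then a reduction, instead of synchronized updates to shared storage.

def parallel_assemble(n_nodes, contributions, n_threads=4):
    """contributions: list of (node, value) pairs to accumulate per node."""
    private = [[0.0] * n_nodes for _ in range(n_threads)]

    def work(tid):
        for k, (node, value) in enumerate(contributions):
            if k % n_threads == tid:          # coarse-grain work split
                private[tid][node] += value   # private array: no lock needed

    threads = [threading.Thread(target=work, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # reduction of the private copies into the global result
    return [sum(p[i] for p in private) for i in range(n_nodes)]
```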
Parallel schemes applied to HGS:
Matrix solver

[Figure: algorithm of BiCGSTAB]
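To make the solver stage concrete, here is an unpreconditioned BiCGSTAB sketch in pure Python (the HGS solver additionally applies an ILU preconditioner, whose LU triangular solves are discussed on the following slides). The MV/DP comments mark the matrix-vector and dot-product operation classes tallied on the next slide.

```python
# Plain BiCGSTAB for a small dense system A x = b, no preconditioner.

def bicgstab(A, b, tol=1e-10, max_it=200):
    n = len(b)
    matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = [0.0] * n
    r = [bi - axi for bi, axi in zip(b, matvec(x))]
    r_hat = r[:]                                      # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for _ in range(max_it):
        rho_new = dot(r_hat, r)                       # DP
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = matvec(p)                                 # MV
        alpha = rho_new / dot(r_hat, v)               # DP
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if dot(s, s) ** 0.5 < tol:                    # converged at half step
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            break
        t = matvec(s)                                 # MV
        omega = dot(t, s) / dot(t, t)                 # DP
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        rho = rho_new
        if dot(r, r) ** 0.5 < tol:
            break
    return x

x = bicgstab([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```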
Parallel schemes applied to HGS:
Matrix solver

Floating-point operations per BiCGSTAB iteration (Chin and Forsyth, 1993, 1998):

                             LU      MV      DP     SV
  Floating-point operation   2nzp    2nz     4n     6n
  3D, level-0 ILU            12n     12n     4n     6n

LU and MV account for > 75 % of the floating-point operations.

Computing time per iteration, measured with Scalasca (%):

  LU      MV      DP     SV
  55.8    37.4    2.4    0.02

LU and MV account for > 93 % of the computing time.

- LU and MV are the hot spots for BiCGSTAB
  - MV is an independent computation, so it is easy to parallelize
  - LU needs information on neighboring nodes, so a new parallel method is necessary
Parallelization of LU solve:
Domain partitioning and node reordering
- A domain partitioning scheme is applied to make independent sub-matrices
- Node reordering is performed to narrow the matrix bandwidth

Schematic of a) domain partitioning method and b) numbering scheme
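The slide does not name the reordering algorithm, so as an illustration here is a plain Cuthill-McKee breadth-first ordering, a standard bandwidth-reducing scheme; the 5-node example graph is invented.

```python
from collections import deque

# Cuthill-McKee ordering sketch: breadth-first traversal, visiting neighbours
# by increasing degree, so that connected nodes get nearby numbers and the
# matrix bandwidth shrinks.

def cuthill_mckee(adj, start=0):
    """adj: dict node -> list of neighbour nodes (one connected component)."""
    order, seen = [], {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in sorted(adj[u], key=lambda w: len(adj[w])):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order

# 5-node chain numbered out of order: 3 - 0 - 4 - 1 - 2
adj = {3: [0], 0: [3, 4], 4: [0, 1], 1: [4, 2], 2: [1]}
order = cuthill_mckee(adj, start=3)
```

Relabelling the chain in this order gives every edge adjacent labels, i.e. bandwidth 1.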
Parallelization of LU solve:
Data structure for domain decomposition

The sparse matrix storage scheme (Yale format):
- ia: the array of row pointers into ja
- ja: the array of column indices
- a: the array of all the nonzero matrix elements

Example code of matrix-vector multiplication, with the original arrays ia/ja/a and the corresponding arrays iaf/jaf/af:

      DO 20 jind = ia(ibf) + 1, ia(ibf+1) - 1
         x(ibf) = x(ibf) - a(jind)*x(ja(jind))
   20 CONTINUE

      DO 20 jind = iaf(ibf) + 1, iaf(ibf+1) - 1
         x(ibf) = x(ibf) - af(jind)*x(jaf(jind))
   20 CONTINUE

(After Clift et al., 1996)
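A 0-based Python analogue of the ia/ja/a storage, shown here as a full CSR matrix-vector product (the Fortran loop above is one row of a substitution step and apparently skips the first stored entry of each row; the full row is used here for clarity). The 2x2 example matrix is invented.

```python
# CSR (ia/ja/a) matrix-vector product y = A x: row i owns the entries
# a[ia[i] : ia[i+1]], whose column indices are ja[ia[i] : ia[i+1]].

def csr_matvec(ia, ja, a, x):
    n = len(ia) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(ia[i], ia[i + 1]):
            y[i] += a[k] * x[ja[k]]
    return y

# 2x2 example: A = [[2, 1], [0, 3]]
ia = [0, 2, 3]
ja = [0, 1, 1]
a = [2.0, 1.0, 3.0]
y = csr_matvec(ia, ja, a, [1.0, 1.0])
```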
Parallelization of LU solve
Results of domain partitioning and node reordering

[Figure: original matrix vs. reordered matrix (4 threads)]
Parallelization of LU solve:
Matrix and array chopping for privatization

• Original parallel scheme for LU solve

Matrix chopping
(After Ruud van der Pas, 2009)

Array chopping
Scalability tests of Parallel HydroGeoSphere

  no   nx    ny    nz   Simulation type   Heterogeneity (m/day)        Test machine
                                          Mean ln(K)    Var ln(K)
  1    100   33    10   S1), VS2)         -4.6          10.6           GPC3)
  2    330   330   10   S, VS             -4.6          10.6           GPC
  3    330   330   10   VS                -4.6          10.6           TCS4)

1) S: saturated flow; 2) VS: variably-saturated flow
3) GPC: General Purpose Cluster at SciNet/University of Toronto
4) TCS: Tightly Coupled System at SciNet/University of Toronto

Parallel HydroGeoSphere (PHGS) is compiled with the Intel® Fortran compiler 10.1 for GPC (-O0 and -O3) and the IBM xlf compiler for TCS (-O5).
Scalability tests of Parallel HydroGeoSphere
Scalability of conceptual models (simulation #1, GPC, 10^5 nodes)

[Figure: speedup curves with and without privatization; maximum speedup 6.3 for saturated flow (no opt.) and 4.5 for saturated flow (opt.)]
Scalability tests of Parallel HydroGeoSphere
Scalability of conceptual models (simulation #1, GPC, 10^5 nodes)

[Figure: speedup curves with and without privatization; maximum speedup 6.9 for variably-saturated flow (no opt.) and 5.1 for variably-saturated flow (opt.)]
36
Scalability tests of Parallel HydroGeoSphere
Scalability of conceptual models (simulation #2, GPC, 106 nodes)
Privatization
No Privatization
6.8
Saturated
flow
(no opt.)
5.1
Saturated
flow
(opt.)
37
Scalability tests of Parallel HydroGeoSphere
Scalability of conceptual models (simulation #2, GPC, 106 node
s)
Privatization
No Privatization
6.8
Variablysaturated
flow
(no opt.)
5.9
Variablysaturated
flow
(opt.)
38
Scalability tests of Parallel HydroGeoSphere (optimization)
Scalability of conceptual models (simulation #3, TCS, 106 nodes)
Privatization
No Privatization
12.2
Variablysaturate
d flow
(opt.)
8.8
39
Conclusions
- For surface and subsurface flow simulations, the computational hot spots are the matrix assembly and the iterative solver.
- To obtain high parallel efficiency, coarse-grain parallelism and a privatization scheme are applied to the matrix assembly and the iterative matrix solver (BiCGSTAB), respectively.
- The privatization scheme increases parallel efficiency.
GUI: Visual HydroGeoSphere
Flow Chart for Visual HGS
2D Grid Generation
Grid generation in 2D H-plane
Grid generation in 2D V-plane
Assigning Material Properties
Simulation Parameters

                        Subsurface flow                     Surface flow
  Medium type           porous medium                       surface runoff
  Material properties   hydraulic conductivity, specific    friction factor, rill storage height,
                        storage, porosity, etc.             bottom leakance, etc.
  Initial condition     initial head                        initial water depth
  Boundary condition    specified heads, specified flux     critical depth boundary
Post-Processing
Triangulation

CGAL (Computational Geometry Algorithms Library) Delaunay refinement package
- PSLG (planar straight line graph): segments, isolated points
- seed points
- triangle constraints: size and angle for triangles

[Figure: angular constraints, size constraints, internal boundaries]
Geo-Modelling
Geo-Modelling using Borehole Information
Main Interface
- menu
- toolbar
- object browser
- 3D view
- property view
Geometry Selection
- by layer
- node, segment, face, element
- highlighting
Simulation parameters
Material properties
Examples