Parallel 3D ADI Scheme for Partially Dimension Reduced Heat Conduction Problem

Čiegis, Raimondas; Čiegis, Remigijus; Suboč, Olga

doi:10.15388/22-INFOR481

Parallel 3D ADI Scheme for Partially Dimension Reduced Heat Conduction Problem

Article type: Research Article

Authors: Čiegis, Raimondas¹ | Čiegis, Remigijus² | Suboč, Olga^{1; ∗}

Affiliations: [1] Vilnius Gediminas Technical University, Saulėtekio av. 11, LT-10223 Vilnius, Lithuania | [2] Vilnius University Kaunas Faculty, Muitinės st. 8, LT-44280 Kaunas, Lithuania

Correspondence: [∗] Corresponding author.

Keywords: hybrid dimension model, ADI scheme, parallel factorization algorithm, heat conduction, piplines

DOI: 10.15388/22-INFOR481

Journal: Informatica, vol. 33, no. 3, pp. 477-497, 2022

Received November 2021

Accepted April 2022

Published: 2022

Get PDF

Abstract

This model describes the heat equation in 3D domains, and this problem is reduced to a hybrid dimension problem, keeping the initial dimension only in some parts and reducing it to one-dimensional equation within the domains in some distance from the base regions. Such mathematical models are typical in industrial installations such as pipelines. Our aim is to add two additional improvements into this methodology. First, the economical ADI type finite volume scheme is constructed to solve the non-classical heat conduction problem. Special interface conditions are defined between 3D and 1D parts. It is proved that the ADI scheme is unconditionally stable. Second, the parallel factorization algorithm is proposed to solve the obtained systems of discrete equations. Due to both modifications the run-time of computations is reduced essentially. Results of computational experiments confirm the theoretical error analysis and scalability estimates of the parallel algorithm.

1Introduction

An important trend in development of mathematical models, simulating various real world applications, deals with new type nonlocal boundary and conjugation conditions. The non-locality mechanism helps to simulate important physical processes more accurately. Still the existence and uniqueness of the solution, its stability is very sensitive to approximation of such nonlocal boundary conditions and require development and analysis of special discrete approximation techniques and modifications of implementation algorithms. As one important example we mention the simulation of enzyme-catalysed glucose oxidation and redox reactions over inhomogeneous surfaces (Skakauskas et al., 2018; Čiegis et al., 2018).

It is a standard situation in applications of the mathematical modelling technique that systems of nonstationary 3D differential equations should be solved. Even if state of the art high-order discrete approximations are constructed on adaptive meshes, the total complexity of the obtained computational algorithms is very large. It restricts our possibility to include such models into general optimization routines or to use them in real time decision making applications.

Recently, a new important technique has been proposed to solve such applied problems more efficiently. A model reduction technique is applied to avoid the effects of curse of dimensionality. The given 3D heat conduction problem is reduced to a hybrid dimension problem, keeping the initial dimension only in some parts of the domain and reducing it to one-dimensional equation within the domains in some distance from the base regions. Such mathematical models are typical in industrial installations such as pipelines.

Our aim is to add two additional improvements into this methodology. First, the economical ADI type finite volume scheme is constructed to solve the non-classical heat conduction problem. It is proved that the ADI scheme is unconditionally stable. Second, the parallel factorization algorithm is proposed to solve the obtained systems of discrete equations. Due to both modifications the run-time of computations is reduced essentially.

The method of partial dimension reduction was first introduced in Panasenko (1998) and then developed in Panasenko (2005). It is based on the asymptotic analysis of a solution of a partial differential equation set in a thin domain and on a projection of the variational model formulation on a subspace of functions having a form of the asymptotic expansion in all zones of regular behaviour of the solution. Then the dimension of the problem is reduced within the big part of the domain keeping the full dimension description only in small zones of a singular behaviour of the solution.

Applying this approach a model with nonlocal junctions of one dimensional and full dimensional equations is obtained. Note that nonlocal conjugation conditions make the main challenges in development of effective parallel numerical methods for this class of problems of hybrid dimension. Let us mention some papers where different techniques were used to discretize the obtained hybrid mathematical model. A finite volume setting was studied in Panasenko and Viallon (2015). The method of asymptotic partial decomposition of the domain was compared with several existing methods of dimension reduction in Amar and Givoli (2018) and its effectiveness with respect to the precision and time of the execution is shown. In a recent paper (Čiegis et al., 2020) ADI type finite volume scheme was considered for a 2D heat conduction problem.

However, the question about the most effective parallel numerical solvers for such problems of hybrid dimension is still an important topic. Non-classical conjugation conditions require to develop special solvers to get the discrete solutions. A straightforward way is to solve the obtained systems of linear equations by using general iterative solvers, e.g. well-known multigrid solvers. For multidimensional parabolic problems an alternative and most popular way is to use splitting methods (Hundsdorfer and Verwer, 2003; Samarskii, 2001). It is well-known that in many cases ADI type methods are the best selection.

Thus in this paper we have developed and analysed special efficient ADI schemes to solve the hybrid 3D partial dimension reduction models for a nonstationary heat conduction problem. Our second goal is to propose efficient parallel implementations of the constructed ADI schemes. All main topics required in the analysis of discrete schemes, i.e. the approximation, positivity and stability of the solution are investigated. The accuracy of the reduced dimension model and the scalability of the parallel algorithms are tested in a series of computational experiments.

The approximation error is investigated quite straightforwardly, since the regularity of the solution and dependence of it on the reduction parameter δ is known from results given in Amosov and Panasenko (2018), Panasenko (2005). A more complicated step is to investigate the stability of proposed parallel ADI schemes in the case of new non-local conjugation conditions.

The rest of the paper is organized in the following way. In Section 2 the problem is formulated. The nonstationary heat conduction equation is formulated in the 3D parallelepiped and the initial and boundary conditions are specified. In this section also, the partially reduced dimension model is constructed and an approximate solution is defined by considering a simplified domain when 3D small size subregions are coupled with a 1D line. The non-local conjugation conditions are defined at the truncations of the tube. In Section 3 the full problem in the 3D domain is solved by using the ADI stability correction scheme. It is proved that this scheme is unconditionally stable. Its implementation requires solving many linear systems with tridiagonal matrix and the classical factorization algorithm is used. The Wang parallel factorization algorithm also can be used for parallel computers. In Section 4 the modified ADI stability correction scheme is constructed for the reduced dimension heat conduction problem. It is proved that the unique solution of the obtained system of linear equations exists for each time level. The efficient factorization algorithm is defined, it is based on algorithms presented in Čiegis et al. (2020). A parallel version of the modified factorization algorithm for implementation of the ADI scheme approximating the reduced dimension heat conduction problem is constructed in Section 5. A theoretical scalability analysis of this parallel algorithm is done. In Section 6 results of computational experiments are presented and analysed. Some final conclusions are done in Section 7.

2Problem Formulation

Let us consider a parallelepiped Ω⊂R3, which has the following form Ω=(0,X1)×D, where D is a rectangle D={(x2,x3):0<xj<Xj,j=2,3}.

We are interested to solve a linear heat equation in Ω×(0,T]:

(1)

∂u∂t=∑j=13∂2u∂xj2+f(x1,x2,x3,t),(x1,x2,x3,t)∈QT=Ω×(0,T],

(2)

u(0,x2,x3,t)=g1(x2,x3,t),u(X1,x2,x3,t)=g2(x2,x3,t),(x2,x3,t)∈D¯×(0,T],

(3)

∂u∂η=0,(x1,x2,x3,t)∈∂Ω1×(0,T],

(4)

u(x1,x2,x3,0)=u0(x1,x2,x3),(x1,x2,x3)∈Ω,

where ∂Ω1=∂Ω∖(D×{x1=0}∪{x1=X1}) denotes a part of the boundary, where the Neumann boundary condition is specified.

Next, following Amosov and Panasenko (2018), we consider a modified approximate problem. First, let S(u)

S(u)=1|D|∫0X2∫0X3u(x1,x2,x3,t)dx2dx3

denote the averaging operator. We assume that the initial condition u0 and source function f satisfy the relations

u0(x1,x2,x3)=S(u0),f(x1,x2,x3,t)=S(f),(x1,t)∈(0,X1)×(0,T].

It means that u0 and f don’t depend on x2, x3 within Ω.

Denote a reduced dimension domain Ωδ={(x1,x2,x3)∈(δ,X1−δ)×D}. Function U is called an approximate solution to problem (1)–(4) if it satisfies the following problem (Amosov and Panasenko, 2018)

(5)

∂U∂t=∑j=13∂2U∂xj2+f(x1,t),(x1,x2,x3,t)∈(Ω∖Ωδ)×(0,T],

(6)

∂U∂t=∂2U∂x12+f(x1,t),(x1,x2,x3,t)∈Ωδ×(0,T],

(7)

U(0,x2,x3,t)=g1(x2,x3,t),U(X1,x2,x3,t)=g2(x2,x3,t),(x2,x3,t)∈D¯×(0,T],

(8)

∂U∂η=0,(x1,x2,x3,t)∈∂Ωδ,1×(0,T],

(9)

U(x1,x2,x3,0)=u0(x1,x2,x3),(x1,x2,x3)∈Ω,

where ∂Ωδ,1=∂Ωδ∖(D×{x1=δ}∪{x1=X1−δ}).

In Ωδ×(0,T] the solution U doesn’t depend on x2, x3, i.e.

U(x1,x2,x3,t)=S(U),(x1,x2,x3,t)∈Ωδ×(0,T].

Thus in the reduced domain it is sufficient to find 1D space function U(x1,t).

By analysing the weak form of the heat equation, it is shown in Amosov and Panasenko (2018) that the following conjugation conditions are valid at the truncation points of the space domain

(10)

U|x1=δ−0=U|x1=δ+0,U|x1=X1−δ−0=U|x1=X1−δ+0,

(11)

∂S(U)∂x1|x1=δ−0=∂U∂x1|x1=δ+0,∂U∂x1|x1=X1−δ−0=∂S(U)∂x1|x1=X1−δ+0.

The conditions (10) are classical and mean that U is continuous at the truncation points. While the remaining two conditions (11) are nonlocal and they define the conservation of full fluxes along the separation planes.

3ADI Scheme for the 3D Differential Problem

The uniform spatial mesh Ω¯h=ω¯1×ω¯2×ω¯3 is defined as

ω¯k={xkj:xkj=jhk,j=0,…,Jk},xkJk=Xk,k=1,2,3.

For simplicity of notations we also consider a uniform time mesh:

ω¯t={tn:tn=nτ,n=0,…,N},tN=T.

Let Uijkn be a numerical approximation to the exact solution u(x1i,x2j,x3k,tn) of problem (1)–(4) at the grid point (x1i,x2j,x3k,tn).

The following standard operators are defined for discrete functions:

∂x1Uijkn:=Uijkn−Ui−1,j,knh1,∂x2Uijkn:=Uijkn−Ui,j−1,knh2,∂x3Uijkn:=Uijkn−Ui,j,k−1nh3,A1hUijkn:=−1h1(∂x1Ui+1,j,kn−∂x1Uijkn),0<i<J1A2hUijkn:=−2h2∂x2Ui1kn,j=0,−1h2(∂x2Ui,j+1,kn−∂x2Uijkn),0<j<J2,2h2∂x2UiJ2kn,j=J2,A3hUijkn:=−2h3∂x3Uij1n,k=0,−1h3(∂x3Ui,j,k+1n−∂x3Uijkn),0<k<J3,2h3∂x3UijJ3n,k=J3.

The discrete approximation of diffusion operators is obtained by using the finite volume method.

Then the heat conduction problem (1)–(4) is approximated by the following alternating direction implicit (ADI) stability correction scheme (Hundsdorfer and Verwer, 2003)

(12)

Uijkn+14−Uijknτ+∑l=13AlhUijkn=fijkn+12,(x1i,x2j,x3k)∈ω1×ω¯2×ω¯3,Uijkn+l4−Uijkn+l−14τ+12A5−lh(Uijkn+l4−Uijkn)=0,l=2,3,4.

At the first step the heat conduction problem (1)–(4) is approximated by the explicit Euler scheme. It is well known that this scheme is only conditionally stable. The next three steps of scheme (12) guarantee the unconditional stability of the discrete scheme and they increase the accuracy of approximation in time until the second order.

At all three steps 3D systems are split into a set of 1D problems along x3, x2 and x1 coordinates respectively, and each subproblem is solved by using the classical fast factorization algorithm.

Lemma 1.

If a solution of the problem (1)–(4) is sufficiently smooth, then the approximation error of ADI scheme (12) is O(τ2+h12+h22+h32).

Proof.

For a self-completeness of the text and in order to explain the approximation property of the ADI scheme (12) we present a short proof of this result (for a detailed analysis see Hundsdorfer and Verwer, 2003). By adding all equations (12) the following discrete equation is obtained

(13)

Uijkn+1−Uijknτ+∑l=2412A5−lh(Uijkn+l4+Uijkn)=fijkn+12.

In order to compute the approximation error of (12), we use the relations

Uijkn∼u(x1i,x2j,x3k,tn),Uijkn+l4∼u(x1i,x2j,x3k,tn+1),l=2,3,4.

Thus the approximation error of ADI scheme (12) with respect to time coordinate is the same as of the classical Crank-Nicolson method. The second order accuracy of the constructed approximations of space derivatives is shown, e.g. in Hundsdorfer and Verwer (2003), Samarskii (2001). □

It is a straightforward task to show the result presented in the following lemma.

Lemma 2.

The discrete operators A1h, A2h and A3h are symmetric, A1h is positive definite and A2h, A3h are non-negative operators.

All operators A1h, A2h and A3h commute, thus there exists a complete system of functions

{ϕ1i(x1)ϕ2j(x2)ϕ3k(x3),1⩽i⩽J1−1,0⩽j⩽J2,0⩽k⩽J3},

where ϕlm(xl) are eigenvectors of the operator Alh, l=1,2,3:

Alhϕlm=λlmϕlm,l=1,2,3

and λlm are eigenvalues

(14)

λ1i⩾λ11>0,λlj⩾0,l=2,3.

Lemma 3.

ADI scheme (12) is unconditionally stable.

Proof.

For the given normal operators Alh, l=1,2,3 the Fourier stability analysis can be used. Let us consider the solution of ADI scheme (12) in the case when boundary conditions gj=0, j=1,2 and f(x,t)=0. The solution of (12) can be written as

Uijkn=∑l=1J1−1∑m=0J2∑r=0J3clmrnϕ1l(x1i)ϕ2m(x2j)ϕ3r(x3k).

Substituting this formula into equations (12), we obtain for each mode the stability equations

clmrn+1=qlmrclmrn,

where the discrete stability function qlmr is defined as

qlmr=(1−τ2(λ1l+λ2m+λ3r)+τ24(λ1lλ2m+λ1lλ3r+λ2mλ3r)+τ38λ1lλ2mλ3r)(1+0.5τλ1l)(1+0.5τλ2m)(1+0.5τλ3r).

Since eigenvalues satisfy estimates (14), then |qlmr|<1 and the ADI scheme (12) is unconditionally stable in the L2 norm. □

It is easy to see that the discrete stability function approximates the continuous stability function of the differential equation (1) with the second order accuracy.

4The ADI Scheme for the Partially Dimension Reduced Problem

Let K1 and K2 define the truncation points of the domain x1K1=δ, x1K2=X1−δ. Then the spatial mesh ω1 is splitted into three parts:

ω11={x1i:x1i=ih1,i=1,…,K1−1},ω¯12={x1i:x1i=ih1,i=K1,…,K2},ω13={x1i:x1i=ih1,i=K2+1,…,J1−1}.

We note that more general meshes are used in Viallon (2013); Panasenko and Viallon (2015), where some atypical cells along the interface are constructed, and that lead to non-admissible meshes. In this paper we use the finite volume approach. For more general non-uniform meshes it can be recommended to use a general framework of finite element and finite difference schemes for construction of discrete approximations on non-matching grids (Eymard et al., 2000; Faille, 1992). A numerical comparison of these methods for 1D-2D discrete schemes is done in Viallon (2013).

The heat conduction problem (5)–(11) is approximated by the modified version of the ADI scheme

(15)

Uijkn+14−Uijknτ+∑l=13AlhUijkn=fijkn+12,(x1i,x2j,x3k)∈(ω11∪ω13)×ω¯2×ω¯3,Ui00n+14−Ui00nτ+A1hUi00n=fi00n+12,x1i∈ω¯12∖{x1K1,x1K2},

(16)

UK100n+14−UK100nτ+1h12(−Sh(UK1−1n)+2UK100n−UK1+1,0,0n)=fK100n+12,UK200n+14−UK200nτ+1h12(−Sh(UK2+1n)+2UK200n−UK2−1,0,0n)=fK200n+12,Uijkn+l4−Uijkn+l−14τ+12A5−lh(Uijkn+l4−Uijkn)=0,l=2,3,

(17)

(x1i,x2j,x3k)∈(ω11∪ω13)×ω¯2×ω¯3,Uijkn+1−Uijkn+34τ+12A1h(Uijkn+1−Uijkn)=0,

(18)

(x1i,x2j,x3k)∈(ω11∪ω13)×ω¯2×ω¯3,Ui00n+1−Ui00n+34τ+12A1h(Ui00n+1−Ui00n)=0,x1i∈ω¯12∖{x1K1,x1K2},

(19)

UK100n+1−UK100n+34τ+12h12(−Sh(UK1−1n+1)+2UK100n+1−UK1+1,0,0n+1+Sh(UK1−1n)−2UK100n+UK1+1,0,0n)=0,

(20)

UK200n+1−UK200n+34τ+12h12(−Sh(UK2+1n+1)+2UK200n+1−UK2−1,0,0n+1+Sh(UK2+1n)−2UK200n+UK2−1,0,0n)=0.

Here Sh denotes the discrete averaging operator:

Sh(Uin)=1|D|∑j=0J2∑k=0J3d2jd3kUijknh2h3,

where dlm=1, if 0<m<Jl and dlm=0.5, if m=0,Jl, l=2,3. Notation of the argument Uin indicates that the two dimensional average of discrete function Un is computed at the grid point x1i.

Note that equations (19), (20) approximate the nonlocal flux conjugation conditions (11).

We define two discrete meshes Ωh,RD=(ω¯2×ω¯3×(ω11∪ω13))∪ω¯12 and Ω¯h,RD=Ωh,RD∪(ω¯2×ω¯3×(x10∪x1J1)). Let us consider discrete functions Uijk=U(x1i,x2j,x3k) defined on the spatial mesh Ω¯h,RD. We denote the set of such vectors by Dh.

Let us define three operators for U∈Dh:

A2hU=A2hU,(x1i,x2j,x3i)∈(ω11∪ω13)×ω¯2×ω¯3,0,x1i∈ω¯12,A3hU=A3hU,(x1i,x2j,x3i)∈(ω11∪ω13)×ω¯2×ω¯3,0,x1i∈ω¯12,A1hU=A1hU,(x1i,x2j,x3i)∈(ω11∪ω13)×ω¯2×ω¯3,A1hUi00,x1i∈ω¯12∖{x1K1,x1K2},1h12(−Sh(UK1−1)+2UK100−UK1+1,0,0),x1=K1,1h12(−Sh(UK2+1)+2UK200−UK2−1,0,0),x1=K2.

Then we can write the ADI scheme as

(21)

Un+14−Unτ+(A1h+A2h+A3h)Un=fn+12,

(22)

Un+l4−Un+l−14τ+12A5−lh(Un+l4−Un)=0,l=2,3,4.

Next, our goal is to prove that the discrete operators A1h, A2h and A3h are symmetric.

For U,V∈Dh, such that U0jk=UJ1jk=0, V0jk=VJ1jk=0, (x2,x3)∈ω¯2×ω¯3 the formulas

(U,V)=∑j=0J2∑k=0J3d2jd3k(∑i=1K1−1UijkVjk+∑i=K2+1K−1UijkVijk)h1h2h3+|D|∑i=K1K2Ui00Vi00h1,‖U‖=(U,U)1/2

define a scalar product and a norm in this vector space. We remind that dlm=1, if 0<m<Jl and dlm=0.5, if m=0,Jl, l=2,3.

Lemma 4.

The discrete operators A1h, A2h and A3h are symmetric, A1h is positive definite and A2h, A3h are non-negative operators.

Proof.

First, we investigate the operator A2h. Applying the summation by part formula, we get

(A2hU,V)=∑j=0J2∑k=0J3d2jd3k(∑i=1K1−1A2hUijkVijk+∑i=K2+1K−1A2hUijkVijk)h1h2h3+∑j=1J2∑k=0J3d3k(∑k=1K1−1∂x2Uijk∂x2Vijk+∑k=K2+1K−1∂x2Uijk∂x2Vijk)h1h2h3=(U,A2hV),(A2hU,U)⩾0.

In a similar way we get the estimates

(A3hU,V)=∑j=0J2∑k=0J3d2jd3k(∑i=1K1−1A3hUijkVijk+∑i=K2+1K−1A3hUijkVijk)h1h2h3+∑j=0J2∑k=1J3d2k(∑k=1K1−1∂x3Uijk∂x3Vijk+∑k=K2+1K−1∂x3Uijk∂x3Vijk)h1h2h3=(U,A3hV),(A3hU,U)⩾0.

It follows from the obtained estimates that A2h and A3h are symmetric and non-negative definite operators.

Next, we investigate the operator A1h. Applying the summation by parts formula and taking into account the boundary conditions for vectors U, V, and the nonlocal conjugation conditions (19), (20) we get

(A1hU,V)=∑j=0J2∑k=0J3d2jd3k(∑i=1K1−1A1hUijkVijk+∑i=K2+1K−1A1hUijkVijk)h1h2h3+|D|(∑i=K1+1K2−1A1hUi00Vi00h1+1h1(−Sh(UK1−1)+2UK100−UK1+1,0,0)VK100+1h1(−Sh(UK2+1)+2UK200−UK2−1,0,0)VK200)=∑j=0J2∑k=0J3d2jd3k(∑i=1K1∂x1Uijk∂x1Vijk+∑i=K2+1K∂x1Uijk∂x1Vijk)h1h2h3+|D|∑i=K1+1K2∂x1Ui00∂x1Vi00h1=(U,A1hV),(U,A1hU)>0.

Here we use the notation UK1jk=UK100, UK2jk=UK200 for (x2j,x3k)∈ω¯2×ω¯3.

It follows from the obtained estimates that A1h is a symmetric and positive definite operator. □

Due to nonlocal conjugation conditions (19), (20) the classical factorization algorithm should be modified in order to solve 1D subproblems (17)–(20).

Lemma 5.

The unique solution of the linear system of equations (15)–(20) exists and it can be computed by using the efficient factorization algorithm.

Proof.

The given proof also defines the constructive algorithm to solve (17)–(20). Since the first part (15) of the discrete ADI scheme defines an explicit algorithm and 1D subproblems (16) in the second part of the scheme are efficiently solved by using the classical factorization algorithm, then it is sufficient to consider in detail the subproblem (17)–(20). For each (x2j,x3k)∈ω¯2×ω¯3 the solution is factorized separately in subdomains ω11, ω12 and ω13. Let us write equations of the system (17)–(20) as

−aijkUi−1,j,kn+1+cijkUijkn+1−bijkUi+1,j,kn+1=dijk,aijk,bijk,cijk⩾0,cijk⩾aijk+bijk.

1. Domain ω11. The solution is presented in the following form:

(23)

Uijkn+1=αijkUi+1,j,kn+1+γijk,0⩽i<K1,

(24)

α0jk=0,γ0jk=g1(x2j,x3k,tn+1),αijk=bijkcijk−aijkαi−1,j,k,γijk=dijk+aijkγi−1,j,kcijk−aijkαi−1,j,k.

By induction it can be proved that the estimates 0⩽αijk⩽1 are valid.

2. Domain ω12. The solution is presented in the following form:

(25)

Ui00n+1=αi00UK100n+1+βi00UK200n+1+γi00,K1<i<K2.

This factorization is done in two steps. First, the solution is written in the form

(26)

Ui00n+1=α˜i00UK100n+1+β˜i00Ui+1,0,0n+1+γ˜i00,K1<i<K2,

(27)

α˜K1+1,0,0=aK1+1,0,0cK1+1,0,0,β˜K1+1,0,0=bK1+1,0,0cK1+1,0,0,γ˜K1+1,0,0=dK1+1,0,0cK1+1,0,0,α˜i00=ai00ci00−ai00β˜i−1,0,0α˜i−1,0,0,β˜i00=bi00ci00−ai00β˜i−1,0,0,γ˜0k=ai00γ˜i−1,0,0+di00ci00−ai00β˜i−1,0,0.

Let us assume that 0⩽α˜i−1,0,0, β˜i−1,0,0⩽1 and α˜k−1,0,0+β˜k−1,0,0⩽1. Then it follows from the given formulas that 0⩽β˜i00⩽1 and α˜i00⩾0. It remains to prove that α˜i00⩽1 and α˜i00+β˜i00⩽1. It follows from simple inequalities

ai00α˜i−1,0,0+bi00+ai00β˜i−1,0,0=ai00(α˜i−1,0,0+β˜i−1,0,0)+bi00⩽ai00+bi00⩽ci00,

that ai00α˜i−1,00+bi00⩽ci00−ai00β˜i−1,0,0. Thus we get the estimate

(28)

0⩽α˜i00+β˜i00⩽1.

The proof by induction is completed.

Next, we compute coefficients αi00, βi00 and γi00 in formula (25)

(29)

αK2−1,0,0=α˜K2−1,0,0,βK2−1,0,0=β˜K2−1,0,0,αi00=α˜i00+β˜i00αi+1,0,0,βi00=β˜i00βi+1,0,0,γi00=γ˜i00+β˜i00γi+1,0,0,i=K2−2,…,K1+1.

From estimates for α˜0k, β˜0k and (29) it follows that

0⩽αi00,βi00⩽1,0⩽αi00+βi00⩽1.

3. Domain ω13. For each (x2j,x3k)∈ω2×ω3 the solution is presented in the following form:

(30)

Uijkn+1=βijkUi−1,j,kn+1+γijk,K2<i⩽J1,βJ1jk=0,γJ1jk=g2(x2j,x3k,tn+1),βijk=aijkcijk−bijkβi+1,j,k,γijk=dijk+bijkγi+1,j,kcijk−bijkβi+1,j,k.

By induction it can be proved that the estimates 0⩽βijk⩽1 are valid.

Substituting (23)–(30) into equations (19) and (20) we get a linear system of two equations to find UK100n+1, UK200n+1:

(31)

A11UK100n+1+A12UK200n+1=B1,A21UK100n+1+A22UK200n+1=B2,

where coefficients are defined by

A11=1τ+12h12(2−αK1+1,0,0−Sh(αK1−1)),A12=−12h12βK1+1,0,0,A21=−12h12αK2−1,0,0,A22=1τ+12h12(2−βK2−1,0,0−Sh(βK2+1)),B1=UK100n+34τ−12h12(Sh(UK1−1n)−2UK100n+UK1+1,0,0n+γK1+1,0,0−Sh(γK1−1)),B2=UK200n+34τ−12h12(Sh(UK2+1n)−2UK200n+UK2−1,0,0n+γK2−1,0,0−Sh(γK2+1)).

From the estimates given above we have that

2−αK1+1,0,0−Sh(αK1−1)⩾βK1+1,0,0,2−βK2−1,0,0−Sh(βK2+1)⩾αK2−1,0,0,

thus the determinant of the matrix of system (31) is positive and the unique solution UK100n+1, UK200n+1 exists. Then the backward factorization step is applied and Uijkn+1 are computed. □

It is noted in Hundsdorfer and Verwer (2003) that the stability analysis of such ADI schemes for three dimensional cases is far from easy and even the linear commuting case needs a closer attention. Since in our case operators A1h, A2h, A3h don’t commute, we can’t apply the spectral stability analysis.

Here we restrict to writing the stability matrix

R=(I+τ2A1h)−1(I+τ2A2h)−1(I+τ2A3h)−1(I−τ2(A1h+A2h+A3h)).

It follows from Lemma 4, that

‖I+τ2Ajh‖⩽1,j=1,2,3.

Then it is sufficient to assume that the discrete problem is such that the stability estimate

(32)

‖Rn‖⩽C,n⩾1

is satisfied. The validity of this assumption should be tested for each case of the discrete problem experimentally.

5Parallel Algorithm

In this section a parallel version of the ADI algorithm (15)–(20) is presented. We note that development of efficient parallel algorithms for implementation of ADI type solvers is quite a non-trivial task. First the factorization algorithm for 3D problems severely depends on data movement in the memory of computers (from and to in different levels of the memory). Only few arithmetical operations are done with each small size portion of data, therefore cost of data movement is very important issue (see, Guo and Lu, 2016; Otero et al., 2020). Second, nonlocality of some parts of the discrete scheme lead to modifications of the standard factorization algorithm and these changes should be resolved efficiently by the parallel algorithm (see Čiegis et al., 2014, where a parallel ADI algorithm is constructed to solve multidimensional parabolic problems with nonlocal boundary conditions). Third, different approaches of parallelization can be used for different architectures of parallel computers. At least three main classes can be considered, including GPU processors (Imankulov et al., 2021; Xue and Feng, 2018; Otero et al., 2020), shared memory processors (multicore processors) and general global memory parallel machines, when implementation of parallel algorithms can be done, e.g. by using MPI library (Čiegis et al., 2014). In this paper we restrict to the construction of parallel algorithms for general distributed memory computers. It is obvious that these algorithms can be efficiently used also for multicore computers and distributed clusters of multicore nodes.

First, in defining a parallel version of the ADI algorithm for the reduced dimension model we restrict to a a simple modification of the standard sequential factorization algorithm. Only two parts are used to decompose the domain Ωh in each dimension. Therefore the maximum number of parallel processes is restricted to eight processes.

We start by defining a parallel algorithm for the solution of a tridiagonal system of linear equations:

(33)

c0U0−b0U1=F0,−ajUj−1+cjUj−bjUj+1=Fj,j=1,…,J−1,−aJUJ−1+cJUJ=FJ.

Let us assume that two processes are used to solve it in parallel. System (33) is partitioned among both processes. The first process is responsible for the first (J+1)/2 equations 0⩽j<(J+1)/2 and the second process solves the remaining equations. Both processes implement the classical sequential factorization algorithm in parallel but in different directions. The first process applies this algorithm from left-to-right:

Uj=αjUj+1+βj,j=0,…,(J+1)/2−1

while the second process computes in the opposite direction:

Uj=αjUj−1+βj,j=J,…,(J+1)/2.

Then both processes exchange coefficients (αJ+12−1,βJ+12−1), (αJ+12,βJ+12), solve 2×2 dimension systems and find solutions UJ+12−1, UJ+12. The second backward part of the factorization algorithm is again implemented in parallel.

Now we are ready to describe the main decomposition details of the parallel version of ADI algorithm (15)–(20).

1. The case of p=2 processes. The discrete grid ω¯1 is splitted into two parts ω¯1=ω1,p0∪ω1,p1, where

ω1,p0=ω11∪ω¯12,ω1,p1=ω13.

The first process computes the solution on the discrete grid (ω11×ω¯2×ω¯3)∪ω¯12, and the second process computes the solution on the grid ω13×ω¯2×ω¯3.

Before computations of the explicit step (21), the first process sends the value UK200n to the second process, and the second process computes the value of S(UK2+1n) and sends it to the first process. Then both processes compute their parts of Un+14 in parallel.

The l=2,3 steps of the ADI scheme (22) don’t require any communication between processes and can be done in parallel.

The last step l=4 of the ADI scheme (22) is implemented in the following way. The first process computes the factorization coefficients of (23) and (25), and the second process computes coefficients of (30). These computations can be done in parallel. Then the second process computes the values of S(βK2+1n), S(γK2+1n) and sends them to the first process. The latter solves system (31) and sends the value UK200n+1 to the second process. Then both process compute their parts of Un+1 in parallel.

A simple but still accurate complexity model of the parallel ADI algorithm can be constructed

(34)

T2=cJ1J2J32+2(α+2β),

where T2 is computation time of the one time step of the parallel algorithm, α is data communication start-up time, β defines time required to send one element.

We note that this model takes into account only main computational and data communication costs. In real computations the values of these parameters can vary. One more important detail, i.e. automatic memory caching is also not estimated in this formula. Thus efficiency estimates of such models are only asymptotically valuable, but exactly such information is sufficient in most cases.

Let us define the speed-up Sp and efficiency Ep of the parallel algorithm

Sp=T0Tp,Ep=Spp,

where p is the number of processes, T0 is the computation time of the sequential algorithm.

Since T0=cJ1J2J3 and additional data communication costs are constant, then even for moderate size grids ω1×ω2×ω3 it follows from (34) that

S2≈2,E2≈1,

i.e. the parallel algorithm is scalable.

2. The case of p=4 processes. In addition to splitting the discrete grid ω¯1, the grid ω¯2 is also divided into two parts ω¯2=ω2,p0∪ω2,p1, where

ω2,p0={x2j,0⩽j⩽J22},ω2,p1={x2j,J22+1⩽j⩽J2}.

The first process computes the solution on the grid (ω11×ω¯2,p0×ω¯3)∪ω¯12, the second process computes the solution on the grid ω13×ω¯2,p0×ω¯3, the third process computes the solution on the grid (ω11×ω¯2,p1×ω¯3)∪ω¯12, and the fourth process computes the solution on the grid ω13×ω¯2,p1×ω¯3.

Before parallel computations of the explicit step (21), the first process sends the value UK200n to the second and fourth process, these two processes compute their part of S(UK2+1n) and send the obtained values to the first process. In a similar way the first process sends the value UK100n to the third process, which computes its part of S(UK1+1n) and sends the obtained value to the first process.

The second part of data communication is done among two pairs of processes: the first and third processes exchange the values of Ui,J2/2,kn and Ui,J2/2+1,kn, 0<i<K1, 0⩽k⩽J3, and the second and fourth processes exchange the values of Ui,J2/2,kn and Ui,J2/2+1,kn, K2<i<J1, 0⩽k⩽J3. These communications are done in parallel. Then all four processes compute their parts of Un+14 in parallel.

The l=2 step of the ADI scheme (22) don’t require any communication between processes and can be done in parallel.

The l=3 step of the ADI scheme (22) is implemented by using the parallel factorization algorithm, which is described above for the system of linear equations (33). The data communication part between the given pairs of the processes is equivalent to the one defined for the explicit part of the ADI algorithm (21). This step again can be done in parallel.

The last step l=4 of the ADI scheme (22) is implemented as for p=2 processes with one small modification. The second and fourth process compute the values of S(βK2+1n), S(γK2+1n) and the third process computes the values of S(αK1−1n), S(γK1−1n). Then they send these values to the first process. It solves system (31) and sends the values UK100n+1, UK200n+1 to the remaining three process. Then all four process compute their parts of Un+1 in parallel.

The complexity model of the parallel ADI algorithm for p=4 processes is given by

(35)

T4=cJ1J2J34+4α+J1J32β.

It follows from this theoretical model, that now the data communication costs depend linearly on the size of J1J3. Thus only for sufficiently large J2 these costs can be neglected and estimates

S4≈4,E4≈1

are valid. We get that the parallel algorithm is scalable for sufficiently large J2 and this specific size depends on both parameters – computational complexity c and data communication costs β.

3. The case of p=8 processes. In addition to splitting discrete grids ω¯1, ω¯2 the discrete grid ω¯3 is also divided into two parts ω¯3=ω3,p0∪ω3,p1, where

ω3,p0={x3k,0⩽k⩽J32},ω3,p1={x3k,J32+1⩽k⩽J3}.

The parallel algorithm is modified to include data distribution steps in the x3 dimension, all communications are defined by using analogous formulas given for p=4 case in the x2 dimension.

The complexity model of the parallel ADI algorithm for p=8 processes is given by

(36)

T8=cJ1J2J38+4α+(J1J34+J1J24)β.

It follows from this model that for sufficiently large grid sizes, when J1J2J3≫J1(J2+J3), the estimates

S8≈8,E8≈1

are valid and the parallel algorithm is scalable.

Remark 1.

Let us consider the case when the number of processors/cores is larger than 8. In order to use all p processors, we modify the constructed parallel algorithm. We assume that p is an even number and factorize it as p=2×p2×p3. The decomposition of the discrete grid Ωh is done in two steps. First, the ω¯1 grid is divided into two parts following the algorithm defined for p⩽8 processes. Thus the implementation of the parallel algorithm in x1 dimension remains unchanged. Next, we divide the grids ω¯j into pj, j=2,3 parts. If pj>2, then in this dimension the proposed parallel factorization algorithm is changed to a more expensive but general algorithm. We recommend to use Wang’s or Cyclic reduction parallel factorization algorithms (Kumar et al., 1994; Imankulov et al., 2021).

6Computational Experiments

Consider the problem (1)–(4) in the domain Ω with geometry parameters X1=6, X2=1, X3=1. We solve this problem till T=1 with given initial, boundary conditions and the zero source function

u0(x1,x2,x3)=0,g1(x2,x3,t)=5te−10((x2−0.5)2+(x3−0.5)2),g2(x2,x3,t)=2te−6((x2−0.5)2+(x3−0.5)2),f(x1,x2,x3,t)=0.

The accuracy of time integration of ADI schemes. First, we have tested the time integration accuracy of the ADI solver for the full dimension model. A uniform space grid Ωh with J1=120, J2=20, J3=20 is used. Table 1 gives for a sequence of decreasing time step widths τ the errors e(τ) and the experimental convergence rates ρ(τ) of the discrete solution for ADI scheme (12) in the maximum norm:

e(τ)=max(x1i,x2j,x3k)∈Ωh|UijkN−U(x1i,x2j,x3k,T)|,ρ(τ)=log2(e(2τ)/e(τ)),

where the reference solution U(x1i,x2j,x3k,T) is computed by using the very small time step τ=6.25×10−5. Then the integration error introduced by the ADI scheme can be measured accurately.

Table 1

Errors e(τ) and experimental convergence rates ρ(τ) for the discrete solution of ADI scheme (12) for a sequence of time steps τ.

τ	e(τ)	ρ(τ)
0.002	3.0352×10−5	2.016
0.001	7.5272×10−6	2.012
0.0005	1.8549×10−6	2.021
0.00025	4.411×10−7	2.072

It follows from the presented results, that the accuracy of ADI scheme agrees well with the theoretical prediction.

The accuracy of the reduced dimension model. Next we investigate the accuracy of the reduced dimension model (5)–(11). For the space discretization a uniform grid Ωh with J1=300, J2=50, J3=50 is used, and integration in time is done with τ=0.001.

The numerical tests have been performed on the computer with Intel^® Xeon^® processor E5-2670, 8GB RAM, and the algorithms have been implemented using C++ language.

We note that CPU time for computing the full model solution by using the ADI scheme (12) is 55.9 seconds.

Table 2 gives for a sequence of reduction parameter δ errors e(δ)

e(δ)=max(x1i,x2j,x3k)∈Ωhh|UijkN−UijkN(δ)|

of the reduced dimension model (5)–(11) discrete solution in the maximum norm, where UijkN is the solution of the ADI scheme (12) and UijkN(δ) is the solution of the reduced dimension ADI scheme (15)–(20). CPU times are also presented for the scheme (15)–(20).

Table 2

Errors e(δ) of the discrete solution of the reduced dimension model (5)–(11) for a sequence of truncation parameter δ.

	δ=1	δ=1.5	δ=2	δ=2.8
e(δ)	4.304×10−3	1.748×10−4	7.201×10−6	4.522×10−8
CPU	19.4	29.1	40.8	60.2

It follows from the presented results, that starting from δ=1 the accuracy of the reduced dimension model is sufficient for most real world applications, while CPU time for the reduced dimension model is reduced up to 3 times.

The efficiency of the parallel ADI scheme (15)–(20). In this section we present results of the parallel scalability tests. All parallel numerical tests in this work were performed on the computer cluster “HPC Vanagas” at the High Performance Computing Laboratory of Vilnius Gediminas technical university. We have used up to 8 cores of Intel^® Xeon^® processor E5-2630 with 20 cores (2.20 GHz) and 32 GB DDR4 of RAM. The parallel algorithms have been implemented using MPI library.

Our goal is to investigate the efficiency of the proposed parallel version of the ADI scheme (15)–(20). We have solved two problems with different sizes of the discrete meshes 450×75×75 and 600×100×100. The CPU time Tp, speed-up Sp and efficiency Ep coefficients are presented in Table 3.

Table 3

Scalability analysis of the parallel algorithm. The speed-up Sp and efficiency Ep coefficients for two problems of dimension 450×75×75 and 600×100×100.

	Tp(450)	Sp(450)	Ep(450)	Tp(600)	Sp(600)	Ep(600)
p=1	77.4	–	–	254.1	–	–
p=4	17.5	4.42	1.12	57.79	4.40	1.10
p=8	9.85	7.86	0.97	29.2	8.70	1.08

Two conclusions follow from the presented results of computational experiments. First, the constructed parallel algorithm scales well even for problems of moderate sizes.

Second, MPI library enables efficient usage of the allocated local memory for each process, thus the efficiency of the parallel algorithm Ep over 1 is achieved.

As a future task, it is interesting to consider the scalability of the modified parallel algorithm from Remark 1, when the number of processes is larger than 8.

7Conclusions

In this paper we have applied the new efficient approach how to solve the heat equation in 3D domains. This problem is reduced to a hybrid dimension problem, keeping the initial dimension only in some parts and reducing it to one-dimensional equation within the domains in some distance from the base regions. Such mathematical models are typical in industrial installations such as pipelines. We added two additional improvements into this methodology. First, the economical ADI type finite volume scheme is constructed to solve the non-classical heat conduction problem. A finite volume method is used to approximate space differential operators with non classical conjugation conditions between the 3D and 1D parts. It is proved that the ADI scheme is unconditionally stable.

Second, the parallel factorization algorithm is proposed to solve the obtained systems of discrete equations. The ADI scheme leads to non-iterative implementation algorithm and a set of one dimensional linear systems are solved by using the parallel versions of the factorization algorithm for the parallel solving tridiagonal equations systems in each direction. An efficient modification of the basic factorization algorithm is developed to resolve non-local conjugation conditions.

The presented results of numerical experiments confirm both main theoretical conclusions. Due to both modifications the run-time of computations is reduced essentially. First, the hybrid reduced dimension models can be used to simulate heat conduction models for a broad set of domains and CPU time of the serial algorithm is reduced up to factor 3. Second, the developed parallel version of the ADI algorithm enables to achieve effective acceleration of computations and it scales well even for discrete problems of moderate sizes.

References

1	Amar, H., Givoli, D. ((2018) ). Mixed-dimensional modeling of time-dependent wave problems using the Panasenko construction. Journal of Theoretical and Computational Acoustics, 26: (3), 1850034. https://doi.org/10.1142/S2591728518500342.
2	Amosov, A., Panasenko, G. ((2018) ). Partial dimension reduction for the heat equation in a domain containing thin tubes. Mathematical Methods in the Applied Sciences, 41: (18), 9529–9545. https://doi.org/10.1002/mma.5311.
3	Čiegis, R., Katauskis, P., Skakauskas, V. ((2018) ). Numerical methods for fractional diffusion. Nonlinear Analysis: Modelling and Control, 23: (4), 234–250. https://doi.org/10.15388/NA.2018.2.6.
4	Čiegis, R., Suboč, O., Bugajev, A. ((2014) ). Parallel numerical algorithms for three-dimensional parabolic and pseudoparabolic problems with different boundary conditions. Nonlinear Analysis: Modelling and Control, 19: (3), 382–395. https://doi.org/10.15388/NA.2014.3.5.
5	Čiegis, R., Panasenko, G., Pileckas, K., Šumskas, V. ((2020) ). ADI scheme for partially dimension reduced heat conduction models. Computers & Mathematics with Applications, 80: (5), 1275–1286. https://doi.org/10.1016/j.camwa.2020.06.012.
6	Eymard, R., Gallouët, T., Herbin, R. ((2000) ). Finite volume methods. Handbook of Numerical Analysis, 7: , 713–1018. https://doi.org/10.15388/NA.2018.2.6.
7	Faille, I. ((1992) ). A control volume method to solve an elliptic equation on a two-dimensional irregular mesh. Computer Methods in Applied Mechanics and Engineering, 100: (2), 275–290. https://doi.org/10.1016/0045-7825(92)90186-N.
8	Guo, G., Lu, S. ((2016) ). Unconditional stability of alternating difference schemes with intrinsic parallelism for two-dimensional fourth-order diffusion equation. Computers & Mathematics with Applications, 71: , 1944–1959.
9	Hundsdorfer, W., Verwer, J. ((2003) ). Numerical Solution of Time-Dependent Advection-Diffusion-Reaction Equations. Springer Series in Computational Mathematics, Vol. 33: . Springer, Berlin, Heidelberg, New York, Tokyo. https://doi.org/10.1007/978-3-662-09017-6.
10	Imankulov, T., Daribayev, B., Mukhambetzhanov, S. ((2021) ). Comparative analysis of parallel algorithms for solving oil recovery problem using CUDA and OpenCL. International Journal of Nonlinear Analysis and Applications, 12: (1), 351–364.
11	Kumar, V., Grama, A., Gupta, A., Karypis, G. ((1994) ). Introduction to Parallel Computing: Design and Analysis of Algorithms. Benjamin/Cummings, Redwood City, CA.
12	Otero, B., Rojas, O., Moya, F., Castillo, J.E. ((2020) ). Alternating direction implicit time integrations for finite difference acoustic wave propagation: parallelization and convergence. Computers and Fluids, 205: , 104584.
13	Panasenko, G. ((1998) ). Method of asymptotic partial decomposition of domain. Mathematical Models and Methods in Applied Sciences, 8: (1), 139–156. https://doi.org/10.1142/S021820259800007X.
14	Panasenko, G. ((2005) ). Multi-scale Modeling for Structures and Composites. Springer Netherlands, Dordrecht. https://doi.org/10.1007/1-4020-2982-9.
15	Panasenko, G., Viallon, M.C. ((2015) ). Finite volume implementation of the method of asymptotic partial domain decomposition for the heat equation on a thin structure. Russian Journal of Mathematical Physics, 22: (2), 237–263. https://doi.org/10.1134/S1061920815020107.
16	Samarskii, A.A. ((2001) ). The Theory of Difference Schemes. Marcel Dekker, New York. https://doi.org/10.1007/1-4020-2982-9.
17	Skakauskas, V., Katauskis, P., Čiegis, R. ((2018) ). Modelling of the NO + CO reaction over inhomogeneous surfaces. Journal of Mathematical Chemistry, 56: (9), 2626–2642. https://doi.org/10.1007/s10910-018-0908-3.
18	Viallon, M.C. ((2013) ). Error estimate for a 1D–2D finite volume scheme. Comparison with a standard scheme on a 2D non-admissible mesh. Comptes Rendus Mathematique, 351: (1), 47–51. https://doi.org/10.1016/j.crma.2013.01.011.
19	Xue, G., Feng, H. (2018). A new parallel algorithm for solving parabolic equations. Advances in Difference Equations. 174: . https://doi.org/10.1186/s13662-018-1617-8.

Abstract

1Introduction

2Problem Formulation

(1)

(2)

(3)

(4)

(5)

(6)

(7)

(8)

(9)

(10)

(11)

3ADI Scheme for the 3D Differential Problem

(12)

Lemma 1.

Proof.

(13)

Lemma 2.

(14)

Lemma 3.

Proof.

4The ADI Scheme for the Partially Dimension Reduced Problem

(15)

(16)

(17)

(18)

(19)

(20)

(21)

(22)

Lemma 4.

Proof.

Lemma 5.

Proof.

(23)

(24)

(25)

(26)

(27)

(28)

(29)

(30)

(31)

(32)

5Parallel Algorithm

(33)

(34)

(35)

(36)

Remark 1.

6Computational Experiments

Table 1

Table 2

Table 3

7Conclusions

References

Share this:

North America

Europe

Asia