On Applying Weighted Seed Techniques To GMRES Algorithm For Solving Multiple Linear Systems

: In the present paper, we are concerned by weighted Arnoldi like meth-ods for solving large and sparse linear systems that have diﬀerent right-hand sides but have the same coeﬃcient matrix. We ﬁrst give detailed descriptions of the weighted Gram-Schmidt process and of a Ruhe variant of the weighted block Arnoldi algorithm. We also establish some theoretical results that links the iterates of the weighted block Arnoldi process to those of the non weighted one. Then, to accelerate the convergence of the classical restarted block and seed GMRES methods, we introduce the weighted restarted block and seed GMRES methods. Numerical experiments are reported at the end of this work in order to compare the performance and show the eﬀectiveness of the proposed methods.


Introduction
In this paper, we are interested in solving multiple linear systems that have the same coefficient matrix and which have the form where A is a non-symmetric real square matrix of size n and B := [b (1) , . . . , b (s) ], X := [x (1) , . . . , x (s) ] are n × s real rectangular matrices, such that the block righthand side B is full rank and s ≪ n.
Block linear systems of kind (1.1) are encountered in many problems of scientific computing and engineering applications. Such block equations are posed, for example, in wave scattering problems, time-dependent incompressible Navier-Stokes equations in computational fluid dynamics, structural mechanics computations based on finite element analysis and in many other areas [18]. When the coefficient matrix A is a large and sparse, several iterative methods have been developed during the last three decades [2,5,8,11,16,21,22]. Generally, the iterative methods described in the literature are projection methods on some Krylov subspace and can be divided into three classes methods.
Block solvers are the first popular algorithms that were used for solving the linear system (1.1). Generally, block Krylov methods are more efficient when they are applied to relatively dense linear systems and combined with some preconditioning techniques [9]. In this first class, one can cite the block conjugate gradient (Bl-CG) for symmetric definite matrices, the block Bi-conjugate Gradient (Bl-BCG) and the block Bi-Conjugate Gradient stabilized (Bl-BiCGSTAB) methods [4,16]. Other popular methods can be enumerated like the block-quasi-minimal residual (Bl-QMR) algorithm [5,14], the block-generalized minimum residual (Bl-GMRES) algorithm [20,22]. In practice, the block methods require a deflation procedure to detect and delete linearly or almost linearly dependent vectors in the block Krylov subspaces generated during the iterations [15,17].
The seed methods form another family that can be applied to the solution of multiple linear systems. The first works in this class appeared in [21], [13] and [2], respectively. These first methods were based on the Conjugate Gradient algorithm. In seed methods, we have to choose one right-hand system as a seed system and use the corresponding Krylov subspace as a projection subspace for the remaining right-hand systems. After solving the seed system, a better initial guess for the remaining systems is obtained. This procedure is repeated with another seed system until all the systems are solved. Improvements of seed techniques were also applied to the GMRES method and the obtained seed GMRES (SGMRES) method was compared with block and classical solvers in [19]. Note that the efficiency of seed methods depends on how closely related the right-hand sides are [1,2].
Matrix Krylov subspace methods -also called global methods-can be seen as an alternative to block Krylov methods. These class of methods is particularly suitable for sparse multiple linear systems [8,9,11,12]. The principal difference between classical single Krylov solvers and global methods appears in the different processes that are used for constructing the Krylov basis. More precisely, and for example, the global Arnoldi process uses the Frobenius scalar product instead of the euclidean scalar product used in the classical Arnoldi process. Moreover, it was proved in [9] that solving a block linear system of the form (1.1) with a global method is equivalent to applying the corresponding classical method to the linear system (I ⊗ A) vecX = vecB, where C ⊗ D := (c i,j D) ∈ R m p×n q for every C ∈ R m×n and D ∈ R p×q and vecB : In this work, we focus mainly on weighted Arnoldi like methods. We recall that the weighting technique, initially introduced in [3] for single linear systems and recently investigated in [7], was applied to the block case in [10]. As the description of the weighted block Arnoldi process given in [10] is brief, we give in this work a detailed description of a Ruhe variant of the weighted block Arnoldi process. Then, we introduce the restarted weighted block GMRES method in a similar fashion to that of GMRES(m). The numerical experiments we have conducted show that the convergence of the weighted Bl-GMRES method may be affected by the occurrence of a linear dependency between the columns of the Krylov matrix. This led us to apply the weighting technique to the seed GMRES algorithm and to propose the weighted seed GMRES method as an improvement of the classical seed GMRES algorithm.
The remainder of this work is organized as follows. In section 2, we began with a detailed description of the weighted Gram-Schmidt process which is used to build a sequence of D orthonormal vectors where D is a diagonal matrix with positive entries. Then, in the second part of section 2, we introduce a Ruhe variant of the weighted block Arnoldi process. Section 2 is ended by establishing some theoretical results that links the iterates of the weighted block Arnoldi process to those of the non weighted one. These results generalize to the block case the results obtained in [3] for the case of a single linear system. In section 3, we first describe the weighted block GMRES. The derivation of the weighted seed GMRES method is also given in section 3. Section 4 is devoted to some numerical examples that show and compare the effectiveness of the new proposed methods.
Throughout this paper, the following notations are used. The zero and identity matrices are denoted 0 n×m ∈ R n×m , (0 n , if n = m) and I n ∈ R n respectively. The Frobenius inner product of two matrices X, Y ∈ R n×s is defined by < X, Y > F := tr(X T Y ) where X T is the transpose matrix of X and tr(Z) denotes the trace of the square matrix Z. The associated norm is the Frobenius norm denoted by X F := tr(X T X). Moreover, if D ∈ R n×n is a symmetric and positive definite matrix, then X⊥ D Y ⇐⇒ X T D Y = 0 s and the D-inner product of x, y ∈ R n and of X, Y are respectively defined by < x, y > D := x T D y and < X, Y > D := tr(X T D Y ).
We also define the D-norm of X by which is associated to the D-inner product < X, Y > D .

The weighted Gram-Schmidt and weighted block Arnoldi processes
Let D ∈ R n×n be a positive diagonal matrix whose diagonal entries are Let also v be a n-dimensional real vector and K m (A, v) = span{v, A v, . . . , A m−1 v} the corresponding Krylov subspace. We recall that, based on the modified Gram-Schmidt procedure, the weighted Arnoldi process constructs a D-orthonormal basis of K m (A, v), see [3] for instance. Thus, before deriving the weighted block Arnoldi process, we have to describe a Gram-Schmidt procedure for building a set of Dorthonormal vectors.

The weighted Gram-Schmidt process
GivenÃ = [ a 1 , a 2 , . . . , a p ] ∈ R n×p , the weighted Gram-Schmidt process (here after denoted by W-GS) applied to A is described by Algorithm 1.
end For 8.
If r j,j = 0   If the matrix A is full rank, then it is easy to verify that p iterations of Algorithm 1 can be carried out without encountering any breakdown. Moreover, the W-GS computes an n × p D-orthonormal matrix Q = [q 1 , . . . ,q p ] with respect to the inner product ., . D and a p × p upper triangular matrix R = (r i,j ), such that A = Q R, and Q T D Q = I p .
In the rest of this paper we refer to the above decomposition as the w Q R factorization of the matrix A and we use the Matlab like notation [ Q, R] = w QR( A).

The weighted block Arnoldi process
we have to generalize the results obtained in [3] to the block case. This task will be achieved by first introducing a Ruhe variant of the weighted block Arnoldi process which is described by Algorithm 2.
We observe that the particular case s = 1 coincides with the classical weighted Arnoldi process [3]. For i ≥ s, we denote byṼ (i) the n × i matrix whose columns areṽ 1 , . . . ,ṽ i and by H (i) the (i + s) × i matrix whose non-zero entries are theh j,k defined in lines 5 and 8 of Algorithm 2. Then assuming exact arithmetic, the above process givesṼ (i) and H (i) that satisfy and that every vectorṽ k+s (k = 1, . . . , m s) satisfies . . , m and suppose that ms iterations of Algorithm 2 are performed with exact arithmetic and without any breakdown. Then, we also have following relation where H m , H m are respectively the (m + 1)s × ms and ms × ms block upper Hessenberg matrices whose non-zero block entries areH i,j := (h l,q ) ∈ R s×s with l = (i − 1)s + 1, . . . , is, q = (j − 1)s + 1, . . . , js and E m := (0 s , . . . , 0 s , I s ) T is the ms × s rectangular matrix whose m-th block element is I s the identity matrix of size s. Note also that, the obtained block column matrix V m is D orthonormal, which means that Before ending this subsection, we note that the numerical tests we have conducted suggest that the weighted Block Arnoldi process can suffer from a loss of linear independence between the vectors of the Krylov matrix V m . This problem can be corrected by a similar deflation procedure to those proposed in [15] and references therein.

Theoretical results
Now, let us establish some relations between the bases and matrices generated by the classical block Arnoldi and the weighted block Arnoldi processes. The results that are given in this part are generalizations of those given in [3]. In the sequel, we suppose that ms iterations of the two processes are applied to the pair (A, V and On Applying Weighted Seed ... For Solving Multiple Linear Systems 161 Proof: The proof of this result is obtained analogously to that of Proposition 1 in [3]. ✷ Now, since U m+1 is an invertible upper triangular matrix, we can partition U m+1 and its inverse as following where U m+1 , G m+1 ∈ R ms×s are formed by the last s columns and the first ms rows of U m+1 and U −1 m+1 respectively, U m+1,m+1 , G m+1,m+1 ∈ R s×s are formed by the last s rows and last s columns of U m+1 and U −1 m+1 respectively.
The following result gives two other relations that are satisfied by the Hessenberg matrices H m and H m .
Proof: The proof of this result is obtained analogously to that of Proposition 2 in [3]. ✷

Solving multiple linear systems with weighted Arnoldi based methods
In this section, we will introduce two methods for solving linear systems with several right-hand sides of the form (1.1). The first one belongs to the block Krylov subspace family methods and is based on the use of the weighted block Arnoldi process. The second method is a seed Krylov method and is based on the use of the weighted Arnoldi process.

The weighted block GMRES method
The derivation of the weighted block GMRES (in short WBl-GMRES) method is similar to that of the classical unweighted block GMRES method (in short Bl-GMRES). More precisely, given an initial guess X 0 ∈ R n×s , the WBl-GMRES computes successive iterates X k , k = 1, 2, . . . , such that where V k is the D-orthonormal basis constructed by the weighted Block Arnoldi process (Algorithm 2), and the matrix Z k ∈ R ks×s satisfies the minimizing norm condition Z k = arg min R k D = arg min Z∈R ks×s Using (3.1) and the formula (2.2) where the index m is replaced by k, we can rewrite the k-th residual as where E 1 = (I s , 0 s , . . . , 0 s ) T ∈ R (k+1)×s and H 1,0 ∈ R s×s is the upper triangular matrix obtained when computing the w QR decomposition of R 0 , i.e., and then Z k is the solution of the ks × s least-squares problem min Z∈R ks×s Note that since WBl-GMRES uses the weighted block Arnoldi process, its computational cost and required memory increase with each iteration. To overcome these problems, an alternative is to restart WBl-GMRES after m iterations, taking the last computed residual as the next initial residual. This strategy is called the restarted WBl-GMRES and denoted by WBl-GMRES(m).
We also point out that the use of D-inner product instead of the Euclidean scalar product in the block GMRES method does nothing to improve the convergence of the latter. However and as shown by Essai in [3] the use of a D-inner product that change at each cycle of the restarted WBl-GMRES method often improves the convergence.
Next, we describe the restarted WBl-GMRES method.
1. Choose X 0 ∈ R n×s ; a tolerance ε; a positive integer m; Define a diagonal matrix D with a positive diagonal. 4. Apply Algorithm 2 to (A, R 0 ) to compute V m+1 and H m ; 5. Determine Z m the solution of the least-squares problem min Z∈R ms×s Stop; 10. else 11. Regarding the choice of weighting matrix D, we do not yet know if there is, theoretically, an optimal matrix. However we propose in the sequel four choices that were made heuristically. We note that these choices generalize the choice given by Essai in [3] for the case s = 1 and that the coefficients of the matrix D are given as function of R 0 the residual obtained at the end of each cycle of the previous algorithm. We also note that the choices given below are different from the one proposed in [10].
Let us write R 0 as ..,nj=1,...,s and the weighting matrix D as We propose four choices which are

The weighted seed GMRES method
Before introducing the weighted seed GMRES algorithm, we recall that the computational costs of block methods rapidly increase with s the number of right-hand sides in (1.1), and so the block methods lose their advantage. Moreover, in many practical and real applications, the right-hand sides are not arbitrary and have something in common or are very close. In this case, an alternative to block methods is the use of seed projection techniques for solving (1.1). The idea used in the seed Krylov methods is to select a single "seed" system and some Krylov method as a generator of approximations for multiple right-hand sides [2,19,21]. In the Krylov seed algorithm a single Krylov subspace -corresponding to the seed system-is generated, then the residuals of the non-seed systems are projected orthogonally onto this generated Krylov subspace in order to get the approximate solutions. The whole process is repeated with an other seed system until all the systems are solved.
Seed techniques were first proposed for symmetric and positive definite systems by Smith et al [21]. In [2], the authors show that a better convergence behaviour of the seed CG process when compared to the classical CG process. For unsymmetric systems, a seed GMRES method was proposed in [19] and a Morgan's Krylov subspace method augmented with eigenvectors was presented in [6].
The weighted seed GMRES method starts by choosing the first seed system Then, at each restart of the current solve, the initial guesses x of the other systems on the current basis V σ m+1 . More precisely, we look for initial guessesx we finally have to solve at each restart the least squares problem arg min On Applying Weighted Seed ... For Solving Multiple Linear Systems 165 Summarizing the above ideas and following those given in [19], we describe the weighted seed GMRES method as follows Algorithm 4: The weighted seed GMRES method (SWGMRES(m)).
1. Choose X 0 ∈ R n×s ; a tolerance ε; a positive integer m; Apply Algorithm 2 to (A, r (σ) ) to compute the basis V m+1 and the block upper Hessenberg matrix H m ; , for j = 1, . . . , s and j = σ;

10.
Determine z (j) the solution of the least squares problem min end For

20.
End for

Numerical experiments
In this section, we examine three numerical examples in order to compare the weighted Arnoldi based methods with their non weighted corresponding versions.
All of the reported experiments were performed on a 32-bit CORE I7 processor at 2. For all the methods, the starting guess was taken to be zero and the righthand side B is such that B = A E where E is an n × s random matrix with entries uniformly distributed in the interval [0 1]. The stopping criterion used for the block methods was R k F / R 0 F ≤ ε = 10 −10 while in the seed methods the iterations were stopped for the i-th system when r Moreover, a maximum of 351 and 351 × s restarts was allowed for the non seed methods and the seed methods respectively.
Before describing and commenting on various numerical tests, we specify that all the matrices tested here, except from those of experiment 4, are coming from the Matrix Market web server 1 , or from the University of Florida Sparse Matrix collection 2 . Note also that the presence of a ⋆ in a table of results signifies that the maximum allowed number of restarts was reached before convergence.

Experiment 1
We recall that at the end of subsection 3.1, we proposed four choices for the weighting matrix D. Thus, to compare these choices, we report in Fig 1 and Table 1 the results obtained by the restarted Bl-GMRES(m) -denoted in this first numerical example by BG-and the restarted WBl-GMRES(m) -denoted in this experiment by WBG-i. Note that i = 1, 2, 3 or 4 and stands for the i-th choice of the weighting matrix D. The plots given in Figure 1 show the evolution of the norm of the residual according to the number of restarts. As the WBG-1 has not converged for the matrices A=rdb3200l and A=memplus, the corresponding curves were not plotted. In view of the results listed in Table 1 and of the plots given in Figure 1, we find that the choice 2, 3 or 4 give better results than those given by the choice 1. Specifically, we noticed that when the first choice is made, the column vectors of the matrix V (i) built by the Arnoldi process become numerically dependent as the number of iterations increases. This dependence leads to obtain deficient least squares problems. The numerical results reported in Table 1 also indicate that the CPU time obtained with the choice 2 are relatively better than those obtained with choices 3 or 4.
In addition, analysis of the results of this first numerical example clearly shows that the weighting strategy succeeds in improving convergence. Indeed, the number of iterations and the CPU time needed for convergence of the WBl-GMRES methods are better compared to those needed by the Bl-GMRES method.

Experiment 2
In this set of experiments, we give the results obtained when comparing the classical restarted Bl-GMRES(m), the restarted WBl-GMRES(m) described by Algorithm 3 and where the chosen weighting matrix D is the one given in the choice 2 and the WSGMRES(m) described by Algorithm 4. Comparing the results coming from this second set of numerical testing, we notice that this time, there is not really a superiority of the weighted block GMRES method compared to the unweighted one. However, it is clear that the restarted weighted seed GMRES method works best and outperforms the two other methods.

Experiment 3
To compare the performance of the proposed methods when they are combined with a preconditioning strategy, we report in Table 3 the results obtained when comparing the preconditioned restarted block GMRES (denoted here by PBl-GMRES(m)), the preconditioned WBl-GMRES method (denoted here by PWBl-GMRES(m)) and the preconditioned WSGMRES method (denoted here by PWSGMRES(m)). As in the previous experiment, the chosen weighting matrix D is the one given in the choice 2. In all this set of experiments, we used the ILU(0) preconditioning [18].
Once again, the various results obtained in the numerical tests of the third experiment show that the CPU time required for the convergence of the PWS-GMRES method is much better compared to those provided at the end of the convergence of the PWBl-GMRES and the PBl-GMRES methods.
On Applying Weighted Seed ... For Solving Multiple Linear Systems 169 In this last experiment, we want to illustrate the effectiveness of the weighted seed GMRES (WS-GMRES) method when the right-hand sides are close. The test matrix A is obtained from the centred finite difference discretization of the operator on the unit square [0, 1] × [0, 1] with homogeneous Dirichlet boundary conditions. The dimension of the matrix A is n = n 2 0 where n 0 is the number of inner grid points in each direction. The right-hand side B = (B i,j ) is such that So the i-th column b (i) of the right-hand side B is obtained by shifting the components of the column b (i−1) by one position and the first component is replaced by the last one (see [2] for a detailed explanation of this choice). Again and as in the previous experiment, the chosen weighting matrix D is the one given in the choice 2. Different values of n 0 , m and s are used and the obtained results are reported in Table 4. Since, in this experiment, the right-hand sides are close, near-linear dependence may arise among the columns of the right-hand side B. In this case, performance of block methods is poor. We recall that for some similar situations, block methods may even suffer from a near breakdown problem and to handle this situation, we have to use a deflation procedure [15,17]. We note also that clearly, the weighted seed method is very effective. As explained in [2], when the the right-hand sides share the same information, it usually takes only a few restarts for seed methods to solve all the systems.

Conclusion
In this work, we first described in detail the weighting technique -originally introduced by Essai in [3]-applied to the block Arnoldi process by Imakura et al in [10]. We also proposed four choices to set the weighting matrix and which also allow to generalize the weighting matrix heuristically proposed in [3] and investigated in [7]. Similarly, we have combined the weighting strategy with the seed strategy -first introduced by Smith et al [21] and by Joly in [13]-to introduce the weighted seed GMRES method. The numerical tests we have obtained show that in some cases the weighting strategy improves the convergence of the block GM-RES algorithm. However, we noticed a potential loss of linear independence when computing the Krylov-Arnoldi basis by the weighted block Arnoldi process. This issue is the subject of a work in progress. In contrast, we note that the weighted seed GMRES method does not suffer from a loss of linear independence. Moreover, it gives better results since the CPU time and the iterations number required for convergence are largely reduced compared to those of the weighted block GMRES and to those of the classical block GMRES algorithms.