
Structure of the Eigenvalue Problem

A brief discussion of the mathematical structure of the eigenvalue problem is necessary to fix notation and introduce ideas that lead to an understanding of the behavior, strengths, and limitations of the algorithm. In this discussion, the real and complex number fields are denoted by ${\bf R}$ and ${\bf C}$, respectively. The spaces of n-dimensional real and complex vectors are denoted by ${\bf R}^n$ and ${\bf C}^n$, and the symbols ${\bf R}^{m \times n}$ and ${\bf C}^{m \times n}$ will denote the real and complex matrices with m rows and n columns. Scalars are denoted by lower case Greek letters, vectors by lower case Latin letters, and matrices by capital Latin letters. The transpose of a matrix ${\bf A}$ is denoted by ${\bf A}^T$ and the conjugate-transpose by ${\bf A}^H$. The symbol $\| \cdot \|$ will denote the Euclidean or 2-norm of a vector. The standard basis for ${\bf C}^n$ is denoted by the set $\{ {\bf e}_j \}_{j=1}^{n}$.

The set $\sigma({\bf A}) = \{ \lambda \in {\bf C} : \det({\bf A} - \lambda {\bf I}) = 0 \}$ is called the spectrum of ${\bf A}$. The elements of this discrete set are the eigenvalues of ${\bf A}$, and they may be characterized as the n roots of the characteristic polynomial $p_{\bf A}(\lambda) = \det(\lambda {\bf I} - {\bf A})$. Corresponding to each distinct eigenvalue $\lambda \in \sigma({\bf A})$ is at least one nonzero vector ${\bf x}$ such that ${\bf A}{\bf x} = {\bf x}\lambda$. This vector is called a right eigenvector of ${\bf A}$ corresponding to the eigenvalue $\lambda$. The pair $(\lambda, {\bf x})$ is called an eigenpair. A nonzero vector ${\bf y}$ such that ${\bf y}^H {\bf A} = \lambda {\bf y}^H$ is called a left eigenvector. The multiplicity $n_a(\lambda)$ of $\lambda$ as a root of the characteristic polynomial is the algebraic multiplicity, and the dimension $n_g(\lambda)$ of ${\rm Null}({\bf A} - \lambda {\bf I})$ is the geometric multiplicity of $\lambda$. The matrix ${\bf A}$ is defective if $n_g(\lambda) < n_a(\lambda)$ for some eigenvalue $\lambda$, and otherwise it is non-defective. The eigenvalue $\lambda$ is simple if $n_a(\lambda) = 1$.
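As an illustration (a sketch using NumPy; the example matrix is arbitrary), a dense eigensolver returns eigenpairs satisfying these definitions, and its eigenvalues agree with the roots of the characteristic polynomial:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))        # arbitrary example matrix

# Eigenvalues and right eigenvectors: A x = x lambda.
lam, X = np.linalg.eig(A)

# The eigenvalues are the n roots of the characteristic polynomial
# det(lambda I - A); np.poly(A) returns its coefficients.
roots = np.roots(np.poly(A))
print(np.sort_complex(lam))
print(np.sort_complex(roots))

# Residual of one computed eigenpair: ~ machine precision.
print(np.linalg.norm(A @ X[:, 0] - X[:, 0] * lam[0]))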

A subspace ${\cal S}$ of ${\bf C}^n$ is called an invariant subspace of ${\bf A}$ if ${\bf A}{\cal S} \subset {\cal S}$. It is straightforward to show that if ${\bf A} \in {\bf C}^{n \times n}$, ${\bf X} \in {\bf C}^{n \times k}$, and ${\bf B} \in {\bf C}^{k \times k}$ satisfy

${\bf A}{\bf X} = {\bf X}{\bf B} ,$

then ${\cal S} \equiv {\rm Range}({\bf X})$ is an invariant subspace of ${\bf A}$. Moreover, if ${\bf X}$ has full column rank k, then the columns of ${\bf X}$ form a basis for this subspace and $\sigma({\bf B}) \subset \sigma({\bf A})$. If, in addition, k=n, then $\sigma({\bf B}) = \sigma({\bf A})$ and ${\bf A}$ is said to be similar to ${\bf B}$ under the similarity transformation ${\bf X}$. ${\bf A}$ is diagonalizable if it is similar to a diagonal matrix, and this property is equivalent to ${\bf A}$ being non-defective.

Invariant subspaces are central to the methods developed here. Invariant subspaces generalize the notion of eigenvectors. If ${\bf A}{\bf x} = {\bf x}\lambda$, then ${\bf A}({\bf x}\gamma) = ({\bf x}\gamma)\lambda$ for any scalar $\gamma$, and ${\cal S} = {\rm Span}\{{\bf x}\}$ is a one dimensional invariant subspace of ${\bf A}$. More generally, if ${\bf A}{\bf x}_j = {\bf x}_j \lambda_j$ for $1 \le j \le k$ and we put ${\bf X} = [{\bf x}_1, {\bf x}_2, \ldots, {\bf x}_k]$, then ${\cal S} = {\rm Range}({\bf X})$ is an invariant subspace of ${\bf A}$ and indeed ${\bf A}{\bf X} = {\bf X}{\bf B}$, where ${\bf B} = {\rm diag}(\lambda_1, \lambda_2, \ldots, \lambda_k)$. If ${\bf X} = {\bf Q}{\bf R}$, where ${\bf Q}$ has orthonormal columns and ${\bf R}$ is upper triangular (the standard QR-factorization), then ${\bf A}{\bf Q} = {\bf Q}{\bf T}$, where ${\bf T} = {\bf R}{\bf B}{\bf R}^{-1}$ is an upper triangular matrix with the eigenvalues $\lambda_j$ on its diagonal. The columns of ${\bf Q}$ provide an orthonormal basis for the invariant subspace ${\cal S}$.
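These relations are easy to check numerically. The following sketch (NumPy, with an arbitrary random matrix) forms ${\bf X}$ from k computed eigenvectors, verifies ${\bf A}{\bf X} = {\bf X}{\bf B}$, and then orthonormalizes through a QR factorization to obtain ${\bf A}{\bf Q} = {\bf Q}{\bf T}$ with ${\bf T}$ upper triangular:

import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 3
A = rng.standard_normal((n, n))

# X = [x_1, ..., x_k] holds k eigenvectors, B = diag(lambda_1, ..., lambda_k).
lam, V = np.linalg.eig(A)
X, B = V[:, :k], np.diag(lam[:k])
print(np.linalg.norm(A @ X - X @ B))           # A X = X B, so Range(X) is invariant

# Orthonormalize: X = Q R.  Then A Q = Q T with T = R B R^{-1} upper triangular.
Q, R = np.linalg.qr(X)
T = R @ B @ np.linalg.inv(R)
print(np.linalg.norm(A @ Q - Q @ T))           # ~ machine precision
print(np.allclose(np.tril(T, -1), 0))          # T is upper triangular
print(np.diag(T) - lam[:k])                    # same eigenvalues on the diagonal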

A large condition number of ${\bf X}$, and hence of ${\bf R}$, will indicate that these eigenvalues and this invariant subspace are sensitive to perturbations in ${\bf A}$ (such as those introduced by roundoff error in a finite precision computation). But this is not the entire story. Separation of eigenvalues and angles between invariant subspaces also come into play. In the symmetric (Hermitian) case, invariant subspaces corresponding to distinct eigenvalues are orthogonal to each other and completely decouple the action of the matrix (as an operator on ${\bf C}^n$). In the non-symmetric case, such a decoupling is generally not possible. The nature of the coupling is completely described by the Jordan canonical form, but this form is usually extremely sensitive to perturbations and hence unsuitable as the basis for a computational algorithm.

The Schur decomposition [42] does provide a means to express this coupling and provides a foundation for the development of stable numerical algorithms. In particular, the implicitly shifted QR algorithm computes a Schur decomposition. Schur's result states that every square matrix is unitarily similar to an upper triangular matrix. In other words, for any linear operator on ${\bf C}^n$, there is an orthonormal basis in which the operator has an upper triangular matrix representation. The following result may be found in [16].

Theorem (Schur Decomposition)

Let ${\bf A} \in {\bf C}^{n \times n}$. Then there is a unitary matrix ${\bf Q}$ and an upper triangular matrix ${\bf R}$ such that

${\bf A}{\bf Q} = {\bf Q}{\bf R} .$

The diagonal elements of ${\bf R}$ are the eigenvalues of ${\bf A}$.
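
In practice a Schur decomposition is computed with library routines; the following sketch (SciPy, with an arbitrary example matrix) simply verifies the statement of the theorem numerically:

import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))

# Complex Schur form: A Q = Q R with Q unitary and R upper triangular.
R, Q = schur(A, output='complex')
print(np.linalg.norm(A @ Q - Q @ R))                 # ~ machine precision
print(np.linalg.norm(Q.conj().T @ Q - np.eye(5)))    # Q is unitary
print(np.allclose(np.tril(R, -1), 0))                # R is upper triangular

# The diagonal of R carries the eigenvalues of A.
print(np.sort_complex(np.diag(R)))
print(np.sort_complex(np.linalg.eigvals(A)))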




A Schur decomposition is not unique. The eigenvalues of ${\bf A}$ may appear on the diagonal of ${\bf R}$ in any specified order. Thus, for any specified set of k eigenvalues of ${\bf A}$, there is a Schur decomposition such that these k eigenvalues appear as the diagonal elements of the leading principal $k \times k$ submatrix ${\bf R}_k$ of the upper triangular matrix ${\bf R}$. If ${\bf Q}_k$ denotes the leading k columns of the corresponding unitary matrix ${\bf Q}$, then

${\bf A}{\bf Q}_k = {\bf Q}_k {\bf R}_k$   (1)

is obtained by equating the leading k columns on both sides of the Schur decomposition ${\bf A}{\bf Q} = {\bf Q}{\bf R}$. We shall call this a partial Schur decomposition. Note that ${\rm Range}({\bf Q}_k)$ is an invariant subspace of ${\bf A}$ corresponding to the k specified eigenvalues and that the columns of ${\bf Q}_k$ form an orthonormal basis for this space. This is called a Schur basis for the invariant subspace. The Implicitly Restarted Arnoldi Method is designed to compute a partial Schur form corresponding to selected eigenvalues of ${\bf A}$.
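A partial Schur decomposition for a chosen group of eigenvalues can be obtained by reordering a computed Schur form so that those eigenvalues appear first on the diagonal. The sketch below uses SciPy's sorted Schur factorization; the selection criterion (largest modulus) and the example matrix are illustrative choices only:

import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(3)
n, k = 8, 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Reorder the Schur form so the k eigenvalues of largest modulus come first.
cut = np.sort(np.abs(np.linalg.eigvals(A)))[-k]
R, Q, sdim = schur(A, output='complex', sort=lambda lam: abs(lam) >= cut)

# Leading sdim columns / principal submatrix give a partial Schur decomposition
# A Q_k = Q_k R_k, so Range(Q_k) is an invariant subspace of A.
Qk, Rk = Q[:, :sdim], R[:sdim, :sdim]
print(sdim)                                # number of selected eigenvalues (= k here)
print(np.linalg.norm(A @ Qk - Qk @ Rk))    # ~ machine precision
print(np.sort_complex(np.diag(Rk)))        # the selected eigenvalues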

The fundamental structure of Hermitian and normal matrices is easily derived from the Schur decomposition.  

Lemma

A matrix ${\bf A} \in {\bf C}^{n \times n}$ is normal ( ${\bf A}{\bf A}^H = {\bf A}^H{\bf A}$ ) if and only if ${\bf A} = {\bf Q}{\bf \Lambda}{\bf Q}^H$ with ${\bf Q}$ unitary and ${\bf \Lambda}$ diagonal. Moreover, ${\bf A}$ is Hermitian ( ${\bf A} = {\bf A}^H$ ) if and only if ${\bf \Lambda}$ is diagonal with real entries. In either case, the diagonal entries of ${\bf \Lambda}$ are the eigenvalues of ${\bf A}$ and the columns of ${\bf Q}$ are the corresponding eigenvectors.
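For a Hermitian matrix this structure is exactly what a symmetric eigensolver returns; a brief sketch (NumPy, with an arbitrary Hermitian example matrix):

import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = B + B.conj().T                     # Hermitian example: A = A^H

# eigh returns real eigenvalues and a unitary Q with A = Q diag(lam) Q^H.
lam, Q = np.linalg.eigh(A)
print(np.isrealobj(lam))                                   # real spectrum
print(np.linalg.norm(A - Q @ np.diag(lam) @ Q.conj().T))   # ~ machine precision
print(np.linalg.norm(Q.conj().T @ Q - np.eye(5)))          # Q is unitary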

For purposes of algorithmic development this structure is fundamental. In fact, the well known Implicitly Shifted QR-Algorithm [14,15] is designed to produce a sequence of unitary similarity transformations ${\bf A} \leftarrow {\bf Q}^H{\bf A}{\bf Q}$ that iteratively reduce ${\bf A}$ to upper triangular form. This algorithm begins with an initial unitary similarity transformation of ${\bf A}$ to the condensed form ${\bf V}^H{\bf A}{\bf V} = {\bf H}$, where ${\bf H}$ is an upper Hessenberg matrix and ${\bf V}$ is unitary. An upper Hessenberg matrix has zero elements below its first subdiagonal, so it is almost upper triangular. When ${\bf A}$ is Hermitian, ${\bf H}$ is a real symmetric tridiagonal matrix, in which case all the elements are zero except for those on the main, sub-, and super-diagonals.


  
Input: ( ${\bf A}, {\bf V}, {\bf H}$ ) with ${\bf A}{\bf V} = {\bf V}{\bf H}$, ${\bf V}^H {\bf V} = {\bf I}$, ${\bf H}$ upper Hessenberg;
For $j=1,2,3,\ldots$ until convergence,
    (a1.1) Select a shift $\mu \leftarrow \mu_j$
    (a1.2) Factor $[{\bf Q},{\bf R}] = {\rm qr}({\bf H} - \mu {\bf I})$;
    (a1.3) ${\bf H} \leftarrow {\bf Q}^H {\bf H} {\bf Q}$; ${\bf V} \leftarrow {\bf V}{\bf Q}$;
End_For
Figure 4.2: Algorithm 1: Shifted QR-iteration.

The QR-iteration is shown in Figure 4.2. The QR factorization of ${\bf H} - \mu{\bf I}$ is given by the unitary matrix ${\bf Q}$ and the upper triangular matrix ${\bf R}$. It is easy to see that ${\bf H}$ remains unitarily similar to ${\bf A}$ throughout the course of this iteration. The iteration is continued until the subdiagonal elements of ${\bf H}$ converge to zero, i.e. until a Schur decomposition has been (approximately) obtained. In the standard implicitly shifted QR-iteration, the unitary matrix ${\bf Q}$ is never actually formed. It is computed indirectly as a product of Givens or Householder transformations through a ``bulge chase'' process. The elegant details of an efficient and stable implementation would be too much of a digression here. They may be found in [16,28,48]. The convergence behavior of this iteration is fascinating. The columns of ${\bf V}$ converge to Schur vectors at various rates. These rates are fundamentally linked to the simple power method and its rapidly convergent variant, inverse iteration. See [47] for further information and references.
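For concreteness, a naive dense transcription of Algorithm 1 might look as follows. This is a sketch only: the shift choice and the simple deflation test are illustrative, and ${\bf Q}$ is formed explicitly rather than applied implicitly through a bulge chase.

import numpy as np
from scipy.linalg import hessenberg, qr

def shifted_qr(A, tol=1e-12, maxiter=1000):
    """Naive dense shifted QR iteration in the spirit of Algorithm 1.
    Returns (T, V) with A V ~= V T and T numerically upper triangular."""
    n = A.shape[0]
    # Initial reduction to upper Hessenberg form: V^H A V = H.
    H, V = hessenberg(np.asarray(A, dtype=complex), calc_q=True)
    m = n                                       # size of the active block
    for _ in range(maxiter):
        if m == 1:
            break
        # Deflate when the trailing subdiagonal entry of the active block is negligible.
        if abs(H[m-1, m-2]) < tol * (abs(H[m-1, m-1]) + abs(H[m-2, m-2])):
            H[m-1, m-2] = 0.0
            m -= 1
            continue
        mu = H[m-1, m-1]                        # simple (Rayleigh-quotient) shift
        Q, R = qr(H[:m, :m] - mu * np.eye(m))
        H[:m, :m] = Q.conj().T @ H[:m, :m] @ Q  # unitary similarity on the block
        H[:m, m:] = Q.conj().T @ H[:m, m:]      # keep the full similarity consistent
        V[:, :m] = V[:, :m] @ Q
    return H, V

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
T, V = shifted_qr(A)
print(np.linalg.norm(A @ V - V @ T))             # similarity maintained throughout
print(np.sort_complex(np.diag(T)))               # converged eigenvalues
print(np.sort_complex(np.linalg.eigvals(A)))     # reference values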

Despite the extremely fast rate of convergence and the efficient use of storage, the implicitly shifted QR method is not suitable for large scale problems, and it has proven to be extremely difficult to parallelize. Large scale problems are typically sparse or structured, so that a matrix-vector product ${\bf w}\leftarrow{\bf A}{\bf v}$ may be computed with time and storage proportional to $n$ rather than $n^2$. A method based upon full similarity transformations quickly destroys this structure, and storage and operation counts per iteration become order $n^2$. Hence, there is considerable motivation for methods that only require matrix-vector products with the original matrix ${\bf A}$.
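Such a product can be wrapped as a matrix-free operator; an iterative method then needs only this action. A sketch using SciPy's LinearOperator interface, with the 1-D discrete Laplacian as an arbitrary structured example:

import numpy as np
from scipy.sparse.linalg import LinearOperator

n = 100_000

def matvec(v):
    """w <- A v for the 1-D discrete Laplacian (tridiagonal):
    O(n) work and O(n) storage; the n x n matrix is never formed."""
    v = np.asarray(v).ravel()
    w = 2.0 * v
    w[:-1] -= v[1:]
    w[1:] -= v[:-1]
    return w

A = LinearOperator((n, n), matvec=matvec, dtype=float)
v = np.random.default_rng(0).standard_normal(n)
w = A.matvec(v)                # only the action of A is ever required
print(w.shape, np.linalg.norm(w))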

The power method provides a simple means to find the dominant eigenvalue (the one of largest magnitude) of a matrix without performing matrix factorizations and dense similarity transformations. It has the drawbacks that only one eigenpair may be found and that convergence may be slow or non-existent. Deflation schemes and/or block variants may be employed to overcome the first problem, and spectral transformations may be used to accelerate convergence and focus on selected eigenvalues. However, there is another very elegant way to extract additional eigenvalue information from the power method sequence.
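The basic iteration requires only the matrix-vector product and a normalization; a sketch (NumPy, with an arbitrary symmetric example matrix, a random starting vector, and a simple relative-change stopping test):

import numpy as np

def power_method(matvec, n, tol=1e-10, maxiter=20000, seed=0):
    """Power method: approximate dominant eigenpair of A, using only w <- A v."""
    v = np.random.default_rng(seed).standard_normal(n)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(maxiter):
        w = matvec(v)
        lam_new = v @ w                     # Rayleigh-quotient estimate
        v = w / np.linalg.norm(w)
        if abs(lam_new - lam) < tol * abs(lam_new):
            break
        lam = lam_new
    return lam_new, v

rng = np.random.default_rng(6)
B = rng.standard_normal((100, 100))
A = B + B.T                                  # symmetric example matrix
lam, x = power_method(lambda v: A @ v, 100)
evals = np.linalg.eigvalsh(A)
print(lam, evals[np.argmax(np.abs(evals))])  # dominant eigenvalue for comparison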

