SPIKE algorithm

The SPIKE algorithm is a hybrid parallel solver for banded linear systems developed by Eric Polizzi and Ahmed Sameh.

Overview

The SPIKE algorithm deals with a linear system $AX = F$, where $A$ is a banded $n \times n$ matrix of bandwidth much less than $n$, and $F$ is an $n \times s$ matrix containing $s$ right-hand sides. It is divided into a preprocessing stage and a postprocessing stage.

Preprocessing stage

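In the preprocessing stage, the linear system $AX = F$ is partitioned into a block tridiagonal form
\[
\begin{bmatrix}
A_1 & B_1 \\
C_2 & A_2 & B_2 \\
& \ddots & \ddots & \ddots \\
& & C_{p-1} & A_{p-1} & B_{p-1} \\
& & & C_p & A_p
\end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_{p-1} \\ X_p \end{bmatrix}
=
\begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_{p-1} \\ F_p \end{bmatrix}.
\]
Assume, for the time being, that the diagonal blocks $A_j$ ($j = 1, \ldots, p$, with $p \ge 2$) are nonsingular. Define a block diagonal matrix
\[
D = \operatorname{diag}(A_1, \ldots, A_p),
\]
then $D$ is also nonsingular. Left-multiplying both sides of the system by $D^{-1}$ gives
\[
\begin{bmatrix}
I & V_1 \\
W_2 & I & V_2 \\
& \ddots & \ddots & \ddots \\
& & W_{p-1} & I & V_{p-1} \\
& & & W_p & I
\end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_{p-1} \\ X_p \end{bmatrix}
=
\begin{bmatrix} G_1 \\ G_2 \\ \vdots \\ G_{p-1} \\ G_p \end{bmatrix},
\]
which is to be solved in the postprocessing stage. Left-multiplication by $D^{-1}$ is equivalent to solving $p$ systems of the form
\[
A_j \begin{bmatrix} V_j & W_j & G_j \end{bmatrix} = \begin{bmatrix} B_j & C_j & F_j \end{bmatrix}
\]
(omitting $W_1$ and $C_1$ for $j = 1$, and $V_p$ and $B_p$ for $j = p$), which can be carried out in parallel.
Due to the banded nature of $A$, only a few leftmost columns of each $V_j$ and a few rightmost columns of each $W_j$ can be nonzero. These columns are called the spikes.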
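The preprocessing step can be sketched in a few lines of NumPy. This is an illustrative sketch only, not a reference implementation: the function name `spike_preprocess`, the use of dense arrays, and the assumptions that the number of partitions divides the matrix order and that the bandwidth does not exceed the partition size are all choices made here. A dense solve stands in for a banded factorization of each diagonal block.

```python
import numpy as np

def spike_preprocess(A, F, p):
    """Return lists (V, W, G) with A_j [V_j W_j G_j] = [B_j C_j F_j] for each partition."""
    n = A.shape[0]
    k = n // p                      # partition size; assumes p divides n
    V, W, G = [], [], []
    for j in range(p):              # each iteration is independent, hence parallelizable
        lo, hi = j * k, (j + 1) * k
        A_j = A[lo:hi, lo:hi]
        # B_j couples partition j to the next one, C_j to the previous one.
        B_j = A[lo:hi, hi:hi + k] if j < p - 1 else np.zeros((k, k))
        C_j = A[lo:hi, lo - k:lo] if j > 0 else np.zeros((k, k))
        sol = np.linalg.solve(A_j, np.hstack([B_j, C_j, F[lo:hi]]))
        V.append(sol[:, :k])
        W.append(sol[:, k:2 * k])
        G.append(sol[:, 2 * k:])
    return V, W, G

# Usage on a small diagonally dominant banded matrix (half-bandwidth b = 2).
n, p, b = 12, 3, 2
rng = np.random.default_rng(0)
i, j = np.indices((n, n))
A = np.where(np.abs(i - j) <= b, rng.uniform(-1, 1, (n, n)), 0.0) + 6 * np.eye(n)
F = np.ones((n, 1))
V, W, G = spike_preprocess(A, F, p)
```

In this example only the first `b` columns of each `V[j]` and the last `b` columns of each `W[j]` are nonzero, which is the spike structure described above.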

Postprocessing stage

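Without loss of generality, assume that each spike contains exactly $m$ columns, where $m$ is much less than $n$ (pad the spike with columns of zeros if necessary). Partition the spikes in all $V_j$ and $W_j$ into
\[
\begin{bmatrix} V_j^{(t)} \\ V_j' \\ V_j^{(b)} \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} W_j^{(t)} \\ W_j' \\ W_j^{(b)} \end{bmatrix},
\]
where $V_j^{(t)}$, $V_j^{(b)}$, $W_j^{(t)}$, and $W_j^{(b)}$ are of dimensions $m \times m$. Partition similarly all $X_j$ and $G_j$ into
\[
\begin{bmatrix} X_j^{(t)} \\ X_j' \\ X_j^{(b)} \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} G_j^{(t)} \\ G_j' \\ G_j^{(b)} \end{bmatrix}.
\]
Notice that the system produced by the preprocessing stage can be reduced to a block pentadiagonal system of much smaller size (recall that $m$ is much less than $n$), formed by the top and bottom block rows of every partition:
\[
\begin{aligned}
X_j^{(t)} + V_j^{(t)} X_{j+1}^{(t)} + W_j^{(t)} X_{j-1}^{(b)} &= G_j^{(t)}, \\
X_j^{(b)} + V_j^{(b)} X_{j+1}^{(t)} + W_j^{(b)} X_{j-1}^{(b)} &= G_j^{(b)},
\end{aligned}
\qquad j = 1, \ldots, p,
\]
where the terms involving $X_0^{(b)}$ and $X_{p+1}^{(t)}$ are absent. We call this the reduced system and denote it by $\tilde{S}\tilde{X} = \tilde{G}$.
Once all $X_j^{(t)}$ and $X_j^{(b)}$ are found, all $X_j'$ can be recovered with perfect parallelism via
\[
\begin{aligned}
X_1' &= G_1' - V_1' X_2^{(t)}, \\
X_j' &= G_j' - V_j' X_{j+1}^{(t)} - W_j' X_{j-1}^{(b)}, \qquad j = 2, \ldots, p-1, \\
X_p' &= G_p' - W_p' X_{p-1}^{(b)}.
\end{aligned}
\]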
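As a rough illustration of the postprocessing stage, the sketch below assembles the reduced system from the top and bottom tips of the spikes, solves it densely, and then recovers every partition of the solution. The function name `spike_postprocess` and its argument layout are assumptions made for this example: each `Vs[j]` and `Ws[j]` holds only the spike columns of partition `j` (zeros where a partition has no neighbour), each `G[j]` is the modified right-hand side, and each partition is assumed to have at least twice as many rows as a spike has columns.

```python
import numpy as np

def spike_postprocess(Vs, Ws, G, m):
    """Solve the reduced system, then recover the full solution.

    Vs[j], Ws[j]: k x m spike columns of partition j (zeros if no neighbour).
    G[j]:         k x s modified right-hand side. Assumes k >= 2*m.
    """
    p, s = len(G), G[0].shape[1]
    top = lambda j: slice(2 * m * j, 2 * m * j + m)        # position of X_j^(t)
    bot = lambda j: slice(2 * m * j + m, 2 * m * (j + 1))  # position of X_j^(b)

    S = np.eye(2 * m * p)
    g = np.zeros((2 * m * p, s))
    for j in range(p):
        g[top(j)], g[bot(j)] = G[j][:m], G[j][-m:]
        if j < p - 1:                   # right spike multiplies X_{j+1}^(t)
            S[top(j), top(j + 1)] = Vs[j][:m]
            S[bot(j), top(j + 1)] = Vs[j][-m:]
        if j > 0:                       # left spike multiplies X_{j-1}^(b)
            S[top(j), bot(j - 1)] = Ws[j][:m]
            S[bot(j), bot(j - 1)] = Ws[j][-m:]
    x = np.linalg.solve(S, g)           # dense solve; a real code exploits the structure

    X = []
    for j in range(p):                  # independent per partition
        Xj = G[j].astype(float)         # astype returns a copy
        if j < p - 1:
            Xj -= Vs[j] @ x[top(j + 1)]
        if j > 0:
            Xj -= Ws[j] @ x[bot(j - 1)]
        X.append(Xj)
    return np.vstack(X)
```

For brevity the recovery loop recomputes all rows of each partition, including the top and bottom rows already obtained from the reduced system; the formula is the same for every row.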

SPIKE as a polyalgorithmic banded linear system solver

Despite being logically divided into two stages, computationally, the SPIKE algorithm comprises three stages:

factorizing the diagonal blocks,
computing the spikes,
solving the reduced system.

Each of these stages can be accomplished in several ways, allowing a multitude of variants. Two notable variants are the recursive SPIKE algorithm for non-diagonally-dominant cases and the truncated SPIKE algorithm for diagonally-dominant cases. Depending on the variant, a system can be solved either exactly or approximately. In the latter case, SPIKE is used as a preconditioner for iterative schemes like Krylov subspace methods and iterative refinement.

Recursive SPIKE

Preprocessing stage

The first step of the preprocessing stage is to factorize the diagonal blocks. For numerical stability, one can use LAPACK's XGBTRF routines to LU factorize them with partial pivoting. Alternatively, one can also factorize them without partial pivoting but with a "diagonal boosting" strategy. The latter method tackles the issue of singular diagonal blocks.
In concrete terms, the diagonal boosting strategy is as follows. Let $0_\epsilon$ denote a configurable "machine zero". In each step of LU factorization, we require that the pivot satisfy the condition
\[
|\text{pivot}| > 0_\epsilon \|A\|_1.
\]
If the pivot does not satisfy the condition, it is then boosted by
\[
\text{pivot} =
\begin{cases}
\text{pivot} + \epsilon \|A_j\|_1 & \text{if } \text{pivot} \ge 0, \\
\text{pivot} - \epsilon \|A_j\|_1 & \text{if } \text{pivot} < 0,
\end{cases}
\]
where $\epsilon$ is a positive parameter depending on the machine's unit roundoff, and the factorization continues with the boosted pivot. This can be achieved by modified versions of ScaLAPACK's XDBTRF routines. After the diagonal blocks are factorized, the spikes are computed and passed on to the postprocessing stage.
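A minimal sketch of LU factorization without pivoting but with diagonal boosting is given below. It is dense rather than banded, and it is not the modified ScaLAPACK routine mentioned above. The function name and the default values of `zero_eps` (the "machine zero") and `boost_eps` (the boosting parameter) are hypothetical, and the 1-norm of the block being factorized is used in both the test and the boost as a simplification.

```python
import numpy as np

def lu_diagonal_boosting(A, zero_eps=1e-13, boost_eps=1e-8):
    """LU without pivoting; pivots that are too small are boosted.

    Returns (L, U, boosts). L @ U equals A only up to the perturbation
    introduced by the boosted pivots.
    """
    U = np.array(A, dtype=float)        # working copy, becomes the upper factor
    n = U.shape[0]
    L = np.eye(n)
    norm1 = np.linalg.norm(U, 1)        # 1-norm of the block being factorized
    boosts = 0
    for i in range(n):
        if abs(U[i, i]) <= zero_eps * norm1:
            # Push the pivot away from zero, keeping its sign.
            U[i, i] += boost_eps * norm1 if U[i, i] >= 0 else -boost_eps * norm1
            boosts += 1
        L[i + 1:, i] = U[i + 1:, i] / U[i, i]
        U[i + 1:, i:] -= np.outer(L[i + 1:, i], U[i, i:])
    return L, U, boosts
```

Because a boosted factorization is that of a slightly perturbed matrix, it is naturally paired with an outer iterative scheme that corrects the resulting error.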

Postprocessing stage

The two-partition case

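In the two-partition case, i.e., when $p = 2$, the reduced system $\tilde{S}\tilde{X} = \tilde{G}$ has the form
\[
\begin{bmatrix}
I_m & 0 & V_1^{(t)} & 0 \\
0 & I_m & V_1^{(b)} & 0 \\
0 & W_2^{(t)} & I_m & 0 \\
0 & W_2^{(b)} & 0 & I_m
\end{bmatrix}
\begin{bmatrix} X_1^{(t)} \\ X_1^{(b)} \\ X_2^{(t)} \\ X_2^{(b)} \end{bmatrix}
=
\begin{bmatrix} G_1^{(t)} \\ G_1^{(b)} \\ G_2^{(t)} \\ G_2^{(b)} \end{bmatrix}.
\]
An even smaller system can be extracted from the center:
\[
\begin{bmatrix} I_m & V_1^{(b)} \\ W_2^{(t)} & I_m \end{bmatrix}
\begin{bmatrix} X_1^{(b)} \\ X_2^{(t)} \end{bmatrix}
=
\begin{bmatrix} G_1^{(b)} \\ G_2^{(t)} \end{bmatrix},
\]
which can be solved using the block LU factorization
\[
\begin{bmatrix} I_m & V_1^{(b)} \\ W_2^{(t)} & I_m \end{bmatrix}
=
\begin{bmatrix} I_m & 0 \\ W_2^{(t)} & I_m \end{bmatrix}
\begin{bmatrix} I_m & V_1^{(b)} \\ 0 & I_m - W_2^{(t)} V_1^{(b)} \end{bmatrix}.
\]
Once $X_1^{(b)}$ and $X_2^{(t)}$ are found, $X_1^{(t)}$ and $X_2^{(b)}$ can be computed via
\[
\begin{aligned}
X_1^{(t)} &= G_1^{(t)} - V_1^{(t)} X_2^{(t)}, \\
X_2^{(b)} &= G_2^{(b)} - W_2^{(b)} X_1^{(b)}.
\end{aligned}
\]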
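The two-partition solve can be written out directly from the block LU factorization. The sketch below is illustrative; the function name and argument order are choices made here, with each argument standing for one of the eight blocks of the reduced system (the top and bottom tips of the two spikes and of the two right-hand sides).

```python
import numpy as np

def solve_two_partition(V1t, V1b, W2t, W2b, G1t, G1b, G2t, G2b):
    """Solve the 4m x 4m two-partition reduced system via its 2m x 2m center."""
    m = V1b.shape[0]
    # Forward substitution with the lower factor [[I, 0], [W2t, I]].
    Y2 = G2t - W2t @ G1b
    # Back substitution with the upper factor [[I, V1b], [0, I - W2t V1b]].
    X2t = np.linalg.solve(np.eye(m) - W2t @ V1b, Y2)
    X1b = G1b - V1b @ X2t
    # The remaining unknowns follow by direct substitution.
    X1t = G1t - V1t @ X2t
    X2b = G2b - W2b @ X1b
    return X1t, X1b, X2t, X2b
```

The only nontrivial linear solve involves a single dense block of order equal to the spike width, which is why this step is cheap compared with the preprocessing stage.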

The multiple-partition case

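Assume that $p$ is a power of two, i.e., $p = 2^d$. Consider a block diagonal matrix
\[
\tilde{D}_1 = \operatorname{diag}\left(\tilde{D}_1^{[1]}, \ldots, \tilde{D}_{p/2}^{[1]}\right),
\]
where
\[
\tilde{D}_k^{[1]} =
\begin{bmatrix}
I_m & 0 & V_{2k-1}^{(t)} & 0 \\
0 & I_m & V_{2k-1}^{(b)} & 0 \\
0 & W_{2k}^{(t)} & I_m & 0 \\
0 & W_{2k}^{(b)} & 0 & I_m
\end{bmatrix}
\]
for $k = 1, \ldots, p/2$. Notice that $\tilde{D}_1$ essentially consists of diagonal blocks of order $4m$ extracted from $\tilde{S}$. Now we factorize $\tilde{S}$ as
\[
\tilde{S} = \tilde{D}_1 \tilde{S}_2.
\]
The new matrix $\tilde{S}_2$ has $p/2$ identity diagonal blocks of order $4m$, coupled by spikes of width $m$ and height $4m$. Its structure is very similar to that of $\tilde{S}$, only differing in the number of spikes and their height (their width stays the same at $m$). Thus, a similar factorization step can be performed on $\tilde{S}_2$ to produce
\[
\tilde{S}_2 = \tilde{D}_2 \tilde{S}_3
\]
and
\[
\tilde{S} = \tilde{D}_1 \tilde{D}_2 \tilde{S}_3.
\]
Such factorization steps can be performed recursively. After $d - 1$ steps, we obtain the factorization
\[
\tilde{S} = \tilde{D}_1 \cdots \tilde{D}_{d-1} \tilde{S}_d,
\]
where $\tilde{S}_d$ has only two spikes. The reduced system will then be solved via
\[
\tilde{X} = \tilde{S}_d^{-1} \tilde{D}_{d-1}^{-1} \cdots \tilde{D}_1^{-1} \tilde{G}.
\]
The block LU factorization technique in the two-partition case can be used to handle the solving steps involving $\tilde{D}_1$, ..., $\tilde{D}_{d-1}$ and $\tilde{S}_d$, for they essentially solve multiple independent systems of generalized two-partition forms.
Generalization to cases where $p$ is not a power of two is almost trivial.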

Truncated SPIKE

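When $A$ is diagonally dominant, in the reduced system $\tilde{S}\tilde{X} = \tilde{G}$ the blocks $V_j^{(t)}$ and $W_j^{(b)}$ are often negligible. With them omitted, the reduced system becomes block diagonal, consisting of the independent systems
\[
\begin{bmatrix} I_m & V_j^{(b)} \\ W_{j+1}^{(t)} & I_m \end{bmatrix}
\begin{bmatrix} X_j^{(b)} \\ X_{j+1}^{(t)} \end{bmatrix}
=
\begin{bmatrix} G_j^{(b)} \\ G_{j+1}^{(t)} \end{bmatrix},
\qquad j = 1, \ldots, p-1,
\]
together with $X_1^{(t)} = G_1^{(t)}$ and $X_p^{(b)} = G_p^{(b)}$, and can be easily solved in parallel.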
The truncated SPIKE algorithm can be wrapped inside some outer iterative scheme to improve the accuracy of the solution.
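The following end-to-end sketch illustrates the truncated variant on dense arrays. It is a simplification made for this example, not a tuned implementation: the function name `truncated_spike_solve` is hypothetical, the number of partitions is assumed to divide the matrix order, the spike width `m` is assumed to be at least the half-bandwidth, and the full spikes are formed even though a practical code would compute only the tips that the truncated reduced system needs.

```python
import numpy as np

def truncated_spike_solve(A, F, p, m):
    """Approximate solve of A X = F, dropping the V^(t) and W^(b) spike tips."""
    n, s = F.shape
    k = n // p                                  # partition size; assumes p divides n
    Vs, Ws, G = [], [], []
    for j in range(p):                          # preprocessing, one solve per partition
        lo, hi = j * k, (j + 1) * k
        B = A[lo:hi, hi:hi + m] if j < p - 1 else np.zeros((k, m))
        C = A[lo:hi, lo - m:lo] if j > 0 else np.zeros((k, m))
        sol = np.linalg.solve(A[lo:hi, lo:hi], np.hstack([B, C, F[lo:hi]]))
        Vs.append(sol[:, :m]); Ws.append(sol[:, m:2 * m]); G.append(sol[:, 2 * m:])

    Xt, Xb = [None] * p, [None] * p
    I = np.eye(m)
    for j in range(p - 1):                      # one independent 2m x 2m system per interface
        Vb, Wt = Vs[j][-m:], Ws[j + 1][:m]
        Xt[j + 1] = np.linalg.solve(I - Wt @ Vb, G[j + 1][:m] - Wt @ G[j][-m:])
        Xb[j] = G[j][-m:] - Vb @ Xt[j + 1]

    X = np.empty((n, s))
    for j in range(p):                          # recovery, again independent per partition
        Xj = G[j].copy()
        if j < p - 1:
            Xj -= Vs[j] @ Xt[j + 1]
        if j > 0:
            Xj -= Ws[j] @ Xb[j - 1]
        X[j * k:(j + 1) * k] = Xj
    return X
```

The accuracy of the result depends on how fast the spikes decay across a partition, so the approximation is best for strongly diagonally dominant matrices and reasonably large partitions.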

SPIKE for tridiagonal systems

The first SPIKE partitioning and algorithm was designed as a means to improve the stability properties of a parallel Givens rotations-based solver for tridiagonal systems. A version of the algorithm, termed g-Spike, that is based on serial Givens rotations applied independently on each block was designed for the NVIDIA GPU. A SPIKE-based algorithm for the GPU that is based on a special block diagonal pivoting strategy has also been described.

SPIKE as a preconditioner

The SPIKE algorithm can also function as a preconditioner for iterative methods for solving linear systems. To solve a linear system $Ax = b$ using a SPIKE-preconditioned iterative solver, one extracts center bands from $A$ to form a banded preconditioner $M$ and solves linear systems involving $M$ in each iteration with the SPIKE algorithm.
In order for the preconditioner to be effective, row and/or column permutation is usually necessary to move "heavy" elements of $A$ close to the diagonal so that they are covered by the preconditioner. This can be accomplished by computing the weighted spectral reordering of $A$.
The SPIKE algorithm can be generalized by not restricting the preconditioner to be strictly banded. In particular, the diagonal block in each partition can be a general matrix and thus handled by a direct general linear system solver rather than a banded solver. This strengthens the preconditioner, which improves the chance of convergence and reduces the number of iterations.
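The idea of a banded preconditioner can be illustrated with a simple preconditioned iterative refinement loop. This is a sketch under stated assumptions: the function names and default tolerances are hypothetical, no reordering is performed, and a dense `np.linalg.solve` stands in for the SPIKE solve with the banded matrix. Any Krylov method could replace the simple refinement loop used here.

```python
import numpy as np

def extract_band(A, b):
    """Keep only the entries of A within b diagonals of the main diagonal."""
    i, j = np.indices(A.shape)
    return np.where(np.abs(i - j) <= b, A, 0.0)

def preconditioned_refinement(A, rhs, b, tol=1e-10, max_iter=100):
    """Iterate x <- x + M^{-1} (rhs - A x), with M the center band of A."""
    M = extract_band(A, b)
    x = np.zeros_like(rhs, dtype=float)
    for it in range(max_iter):
        r = rhs - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(rhs):
            return x, it
        x = x + np.linalg.solve(M, r)   # stand-in for a SPIKE solve with M
    return x, max_iter
```

The loop converges only when the band captures enough of the matrix, which is the reason a reordering that moves heavy elements toward the diagonal matters in practice.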

Implementations

Intel offers an implementation of the SPIKE algorithm under the name Intel Adaptive Spike-Based Solver. Tridiagonal solvers have also been developed for the NVIDIA GPU and the Xeon Phi co-processors. One of the GPU methods is the basis for a tridiagonal solver in the cuSPARSE library. The Givens rotations-based solver was also implemented for the GPU and the Intel Xeon Phi.
