The critical bottlenecks in the implementation of the conjugate gradient algorithm on distributed memory computers are the communication requirements of the sparse matrix-vector multiply and of the vector recurrences. We describe the data distribution and communication patterns of five general implementations, whose realizations demonstrate that the cost of communication can be overcome to a much larger extent than is often assumed. Our results also apply to more general settings for matrix-vector products, both sparse and dense.
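To make the two communication-critical kernels concrete, the sketch below gives a plain, sequential conjugate gradient iteration and marks where a distributed-memory implementation would need data exchange: the matrix-vector product requires neighboring processors to exchange boundary entries of the vector, and the inner products that determine the recurrence coefficients require global reductions. This is only an illustrative sketch (a dense test matrix and arbitrary tolerances are assumed); it is not one of the five implementations described in the paper.

```python
# Minimal, sequential conjugate gradient sketch used only to locate the
# communication-bound kernels discussed in the abstract. The matrix,
# tolerance, and iteration limit are illustrative assumptions.
import numpy as np

def cg(A, b, tol=1e-8, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x            # matrix-vector product (sparse in the paper's setting):
                             # distributed versions exchange boundary vector entries
    p = r.copy()
    rho = r @ r              # inner product: needs a global reduction across processors
    for _ in range(max_iter):
        q = A @ p            # the communication-bound matrix-vector product
        alpha = rho / (p @ q)     # inner product -> global reduction
        x += alpha * p       # vector recurrences (axpy updates): local work, but
        r -= alpha * q       # their coefficients depend on the reductions above
        rho_new = r @ r      # inner product -> global reduction
        if np.sqrt(rho_new) < tol * np.linalg.norm(b):
            break
        p = r + (rho_new / rho) * p
        rho = rho_new
    return x

if __name__ == "__main__":
    # Small symmetric positive definite test system (1D Laplacian), for illustration.
    n = 100
    A = (np.diag(2.0 * np.ones(n))
         + np.diag(-np.ones(n - 1), 1)
         + np.diag(-np.ones(n - 1), -1))
    b = np.ones(n)
    x = cg(A, b)
    print("residual norm:", np.linalg.norm(A @ x - b))
```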