ABSTRACT

In this paper we present an efficient dense matrix multiplication algorithm for distributed memory computers with a hypercube topology. The proposed algorithm performs better than all previously proposed algorithms for a wide range of matrix sizes and numbers of processors, especially for large matrices. We analyze the performance of the algorithms for two types of hypercube architectures: one in which each node can use (to send and receive) at most one communication link at a time, and the other in which each node can use all communication links simultaneously.

Keywords: Matrix multiplication, distributed algorithms, interprocessor communication, hypercubes, 3-D grids.