MPI Communications Benchmarks

This page contains the results of a series of MPI communications benchmarks that are regularly run on the MHPCC 512-CPU Linux Cluster and 768-CPU IBM SP3. The benchmarks test the communication performance of the machine with 5 commonly used MPI functions:

The benchmarks repeatedly call the above MPI routines with messages of varying size (1024 bytes, 2048 bytes, 4096 bytes, ..., 4 Mbytes) on processor partitions of varying size (2, 4, 8, ..., 256 processors). Each MPI call is performed 20 times for each message size on each processor partition, and the maximum communication bandwidth is plotted in the graphs below. The source code for the benchmarks is:
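For illustration only, the parameter sweep described above can be sketched in a few lines of Python. This is not the benchmark source. The function names (message_sizes, partition_sizes, max_bandwidth) are hypothetical, 4 Mbytes is assumed to mean 4 * 1024 * 1024 bytes, and bandwidth is assumed to be bytes transferred divided by elapsed seconds, which may differ from how the plotted values are actually computed.

```python
# Hypothetical sketch of the benchmark's parameter sweep; not the actual source.

REPETITIONS = 20  # each MPI call is performed 20 times per configuration


def message_sizes(start=1024, stop=4 * 1024 * 1024):
    """Message sizes in bytes, doubling from start up to stop (inclusive).

    Assumes "4 Mbytes" means 4 * 1024 * 1024 bytes.
    """
    sizes = []
    size = start
    while size <= stop:
        sizes.append(size)
        size *= 2
    return sizes


def partition_sizes(start=2, stop=256):
    """Processor counts, doubling from start up to stop (inclusive)."""
    counts = []
    n = start
    while n <= stop:
        counts.append(n)
        n *= 2
    return counts


def max_bandwidth(nbytes, elapsed_seconds):
    """Best bandwidth (bytes/second) over a list of per-call timings.

    Assumes bandwidth = bytes transferred / elapsed time; non-positive
    timings are ignored to avoid division by zero.
    """
    return max(nbytes / t for t in elapsed_seconds if t > 0)
```

Under these assumptions, each MPI function is exercised with 13 message sizes on 8 partition sizes, and only the best of the 20 timings per configuration contributes to a plotted point.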

Maximum tasks per node: these results display communication bandwidth when both CPUs on each Linux Cluster node and all 16 CPUs on each IBM SP3 node are utilized:

1 task per node: only a single task is scheduled on each node on both the Linux Cluster and the IBM SP3. These numbers are unrealistic from a practical standpoint; we don't encourage our users to use the machines in such an inefficient manner. However, the results might better evaluate the underlying network, since each task has dedicated use of the network adaptor on its node (in the maximum-tasks-per-node tests above, many tasks shared a single network adaptor):

Disclaimer: it's likely that the numbers reported here are inaccurate. Some of the reported bandwidth results exceed the capability of the underlying switch by a factor of 2 or more. These results need to be verified.
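One simple way to pick out the results that need verification is to compare each reported bandwidth against the rated capability of the switch. The sketch below is a hypothetical helper, not part of the benchmark suite. The name flag_suspect_results, the layout of the results dictionary, and the numbers in the example are all placeholders, not measured data or actual switch specifications.

```python
# Hypothetical sanity check; names and values are illustrative placeholders.

def flag_suspect_results(results, switch_peak, factor=1.0):
    """Return the entries whose bandwidth exceeds factor * switch_peak.

    results maps (num_tasks, message_bytes) to a bandwidth value expressed
    in the same units as switch_peak.
    """
    limit = factor * switch_peak
    return {key: bw for key, bw in results.items() if bw > limit}


# Placeholder values only (not real measurements or a real switch rating).
example_results = {(2, 1024): 50.0, (4, 1024): 250.0}
suspect = flag_suspect_results(example_results, switch_peak=100.0, factor=2.0)
```

With factor=2.0, only entries more than twice the assumed switch peak are flagged, which corresponds to the "factor of 2 or more" case mentioned in the disclaimer.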