| SP Parallel Programming Workshop |
| Message Passing Interface (MPI) |
| What Is MPI? |
|
| The Message Passing Paradigm |
|
Note: This section can be skipped if the reader is already familiar with message passing concepts.
|
|
|
It is not safe to modify or reuse the application buffer until a non-blocking send has completed. It is the programmer's responsibility to ensure that the application buffer is free for reuse, for example by waiting on or testing the associated request.
Non-blocking communications are primarily used to overlap computation with communication to effect performance gains.
Communicators and groups will be covered in more detail later. For now, simply use MPI_COMM_WORLD whenever a communicator is required - it is the predefined communicator which includes all of your MPI processes.
Ranks are used by the programmer to specify the source and destination of messages. They are often used conditionally by the application to control program execution (if rank=0 do this / if rank=1 do that).
| Getting Started |
|
| C include file | Fortran include file |
|---|---|
| #include "mpi.h" | include 'mpif.h' |
| C Binding | |
|---|---|
| Format: | |
| Example: | |
| Error code: | Returned as "rc". MPI_SUCCESS if successful |
| Fortran Binding | |
|---|---|
| Format: | call mpi_xxxxx(parameter,..., ierr) |
| Example: | |
| Error code: | Returned as "ierr" parameter. MPI_SUCCESS if successful |
| MPI include file |
| . . . |
| Initialize MPI environment |
| . . . |
| Do work and make message passing calls |
| . . . |
| Terminate MPI Environment |
| Environment Management Routines |
|
Several of the more commonly used MPI environment management routines are described below.
MPI_INIT (ierr)
MPI_COMM_SIZE (comm,size,ierr)
MPI_COMM_RANK (comm,rank,ierr)
MPI_ABORT (comm,errorcode,ierr)
MPI_GET_PROCESSOR_NAME (name,resultlength,ierr)
MPI_INITIALIZED (flag,ierr)
MPI_WTIME ()
MPI_WTICK ()
MPI_FINALIZE (ierr)
Examples: Environment Management Routines
|
|
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[]) {
int numtasks, rank, rc;
rc = MPI_Init(&argc,&argv);
if (rc != MPI_SUCCESS) {
printf ("Error starting MPI program. Terminating.\n");
MPI_Abort(MPI_COMM_WORLD, rc);
}
MPI_Comm_size(MPI_COMM_WORLD,&numtasks);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
printf ("Number of tasks= %d My rank= %d\n", numtasks,rank);
/******* do some work *******/
MPI_Finalize();
}
|
|
|
program simple
include 'mpif.h'
integer numtasks, rank, ierr, rc
call MPI_INIT(ierr)
if (ierr .ne. MPI_SUCCESS) then
print *,'Error starting MPI program. Terminating.'
rc = ierr
call MPI_ABORT(MPI_COMM_WORLD, rc, ierr)
end if
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
print *, 'Number of tasks=',numtasks,' My rank=',rank
C ****** do some work ******
call MPI_FINALIZE(ierr)
end
|
| Point to Point Communication Routines |
|
| MPI Message Passing Routine Arguments | |
MPI point-to-point communication routines generally have an argument list which takes one of the following formats:
| Blocking sends | |
| Non-blocking sends | |
| Blocking receive | |
| Non-blocking receive |
| MPI C data type | C type | MPI Fortran data type | Fortran type |
|---|---|---|---|
| MPI_CHAR | signed char | MPI_CHARACTER | character(1) |
| MPI_SHORT | signed short int | | |
| MPI_INT | signed int | MPI_INTEGER | integer |
| MPI_LONG | signed long int | | |
| MPI_UNSIGNED_CHAR | unsigned char | | |
| MPI_UNSIGNED_SHORT | unsigned short int | | |
| MPI_UNSIGNED | unsigned int | | |
| MPI_UNSIGNED_LONG | unsigned long int | | |
| MPI_FLOAT | float | MPI_REAL | real |
| MPI_DOUBLE | double | MPI_DOUBLE_PRECISION | double precision |
| MPI_LONG_DOUBLE | long double | | |
| | | MPI_COMPLEX | complex |
| | | MPI_LOGICAL | logical |
| MPI_BYTE | 8 binary digits | MPI_BYTE | 8 binary digits |
| MPI_PACKED | data packed or unpacked with MPI_Pack()/MPI_Unpack | MPI_PACKED | data packed or unpacked with MPI_Pack()/MPI_Unpack |
| Point to Point Communication Routines |
|
| Blocking Message Passing Routines | |
The more commonly used MPI blocking message passing routines are described below.
MPI_SEND (buf,count,datatype,dest,tag,comm,ierr)
MPI_RECV (buf,count,datatype,source,tag,comm,status,ierr)
MPI_SSEND (buf,count,datatype,dest,tag,comm,ierr)
MPI_BSEND (buf,count,datatype,dest,tag,comm,ierr)
MPI_Buffer_detach (*buffer,size)
MPI_BUFFER_ATTACH (buffer,size,ierr)
MPI_BUFFER_DETACH (buffer,size,ierr)
MPI_RSEND (buf,count,datatype,dest,tag,comm,ierr)
MPI_Sendrecv (*sendbuf,sendcount,sendtype,dest,sendtag,*recvbuf,recvcount,recvtype,source,recvtag,comm,*status)
MPI_SENDRECV (sendbuf,sendcount,sendtype,dest,sendtag,recvbuf,recvcount,recvtype,source,recvtag,comm,status,ierr)
MPI_PROBE (source,tag,comm,status,ierr)
Examples: Blocking Message Passing Routines
Task 0 pings task 1 and awaits return ping
|
|
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[]) {
int numtasks, rank, dest, source, rc, tag=1;
char inmsg, outmsg='x';
MPI_Status Stat;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0) {
dest = 1;
source = 1;
rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
}
else if (rank == 1) {
dest = 0;
source = 0;
rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
}
MPI_Finalize();
}
|
|
|
program ping
include 'mpif.h'
integer numtasks, rank, dest, source, tag, ierr
integer stat(MPI_STATUS_SIZE)
character inmsg, outmsg
tag = 1
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
if (rank .eq. 0) then
dest = 1
source = 1
outmsg = 'x'
call MPI_SEND(outmsg, 1, MPI_CHARACTER, dest, tag,
& MPI_COMM_WORLD, ierr)
call MPI_RECV(inmsg, 1, MPI_CHARACTER, source, tag,
& MPI_COMM_WORLD, stat, ierr)
else if (rank .eq. 1) then
dest = 0
source = 0
outmsg = 'x'
call MPI_RECV(inmsg, 1, MPI_CHARACTER, source, tag,
& MPI_COMM_WORLD, stat, ierr)
call MPI_SEND(outmsg, 1, MPI_CHARACTER, dest, tag,
& MPI_COMM_WORLD, ierr)
endif
call MPI_FINALIZE(ierr)
end
|
| Point to Point Communication Routines |
|
| Non-Blocking Message Passing Routines | |
The more commonly used MPI non-blocking message passing routines are described below.
MPI_ISEND (buf,count,datatype,dest,tag,comm,request,ierr)
MPI_IRECV (buf,count,datatype,source,tag,comm,request,ierr)
MPI_ISSEND (buf,count,datatype,dest,tag,comm,request,ierr)
MPI_IBSEND (buf,count,datatype,dest,tag,comm,request,ierr)
MPI_IRSEND (buf,count,datatype,dest,tag,comm,request,ierr)
MPI_Testany (count,*array_of_requests,*index,*flag,*status)
MPI_Testall (count,*array_of_requests,*flag,*array_of_statuses)
MPI_Testsome (incount,*array_of_requests,*outcount,*array_of_offsets,*array_of_statuses)
MPI_TEST (request,flag,status,ierr)
MPI_TESTANY (count,array_of_requests,index,flag,status,ierr)
MPI_TESTALL (count,array_of_requests,flag,array_of_statuses,ierr)
MPI_TESTSOME (incount,array_of_requests,outcount,array_of_offsets,array_of_statuses,ierr)
MPI_Waitany (count,*array_of_requests,*index,*status)
MPI_Waitall (count,*array_of_requests,*array_of_statuses)
MPI_Waitsome (incount,*array_of_requests,*outcount,*array_of_offsets,*array_of_statuses)
MPI_WAIT (request,status,ierr)
MPI_WAITANY (count,array_of_requests,index,status,ierr)
MPI_WAITALL (count,array_of_requests,array_of_statuses,ierr)
MPI_WAITSOME (incount,array_of_requests,outcount,array_of_offsets,array_of_statuses,ierr)
MPI_IPROBE (source,tag,comm,flag,status,ierr)
Examples: Non-Blocking Message Passing Routines
Nearest neighbor exchange in ring topology
|
|
#include "mpi.h"
#include <stdio.h>
int main(int argc, char *argv[]) {
int numtasks, rank, next, prev, buf[2], tag1=1, tag2=2;
MPI_Request reqs[4];
MPI_Status stats[4];
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
prev = rank-1;
next = rank+1;
if (rank == 0) prev = numtasks - 1;
if (rank == (numtasks - 1)) next = 0;
MPI_Irecv(&buf[0], 1, MPI_INT, prev, tag1, MPI_COMM_WORLD, &reqs[0]);
MPI_Irecv(&buf[1], 1, MPI_INT, next, tag2, MPI_COMM_WORLD, &reqs[1]);
MPI_Isend(&rank, 1, MPI_INT, prev, tag2, MPI_COMM_WORLD, &reqs[2]);
MPI_Isend(&rank, 1, MPI_INT, next, tag1, MPI_COMM_WORLD, &reqs[3]);
MPI_Waitall(4, reqs, stats);
MPI_Finalize();
}
|
|
|
program ringtopo
include 'mpif.h'
integer numtasks, rank, next, prev, buf(2), tag1, tag2, ierr
integer stats(MPI_STATUS_SIZE,4), reqs(4)
tag1 = 1
tag2 = 2
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
prev = rank - 1
next = rank + 1
if (rank .eq. 0) then
prev = numtasks - 1
endif
if (rank .eq. numtasks - 1) then
next = 0
endif
call MPI_IRECV(buf(1), 1, MPI_INTEGER, prev, tag1,
& MPI_COMM_WORLD, reqs(1), ierr)
call MPI_IRECV(buf(2), 1, MPI_INTEGER, next, tag2,
& MPI_COMM_WORLD, reqs(2), ierr)
call MPI_ISEND(rank, 1, MPI_INTEGER, prev, tag2,
& MPI_COMM_WORLD, reqs(3), ierr)
call MPI_ISEND(rank, 1, MPI_INTEGER, next, tag1,
& MPI_COMM_WORLD, reqs(4), ierr)
call MPI_WAITALL(4, reqs, stats, ierr)
call MPI_FINALIZE(ierr)
end
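The wrap-around tests on `prev` and `next` in the ring examples above can also be written with modular arithmetic. The following is a minimal sketch in plain C with no MPI calls; the helper names `ring_prev` and `ring_next` are hypothetical and not part of MPI.

```c
#include <assert.h>

/* Left-hand neighbor of `rank` in a ring of `numtasks` tasks.
   Rank 0 wraps around to numtasks - 1. */
int ring_prev(int rank, int numtasks)
{
    assert(numtasks > 0 && rank >= 0 && rank < numtasks);
    return (rank - 1 + numtasks) % numtasks;
}

/* Right-hand neighbor of `rank`. Rank numtasks - 1 wraps around to 0. */
int ring_next(int rank, int numtasks)
{
    assert(numtasks > 0 && rank >= 0 && rank < numtasks);
    return (rank + 1) % numtasks;
}
```

With four tasks, `ring_prev(0, 4)` gives 3 and `ring_next(3, 4)` gives 0, the same values the `if` tests in the examples assign.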
|
| Collective Communication Routines |
|
Collective Communication Routines
MPI_BARRIER (comm,ierr)
MPI_BCAST (buffer,count,datatype,root,comm,ierr)
MPI_Scatter (*sendbuf,sendcnt,sendtype,*recvbuf,recvcnt,recvtype,root,comm)
MPI_SCATTER (sendbuf,sendcnt,sendtype,recvbuf,recvcnt,recvtype,root,comm,ierr)
MPI_Gather (*sendbuf,sendcnt,sendtype,*recvbuf,recvcount,recvtype,root,comm)
MPI_GATHER (sendbuf,sendcnt,sendtype,recvbuf,recvcount,recvtype,root,comm,ierr)
MPI_Allgather (*sendbuf,sendcount,sendtype,*recvbuf,recvcount,recvtype,comm)
MPI_ALLGATHER (sendbuf,sendcount,sendtype,recvbuf,recvcount,recvtype,comm,ierr)
MPI_REDUCE (sendbuf,recvbuf,count,datatype,op,root,comm,ierr)
The predefined MPI reduction operations appear below. Users can also define their own reduction functions by using the MPI_Op_create routine.
| MPI Reduction Operation | Description | C data types | Fortran data types |
|---|---|---|---|
| MPI_MAX | maximum | integer, float | integer, real, complex |
| MPI_MIN | minimum | integer, float | integer, real, complex |
| MPI_SUM | sum | integer, float | integer, real, complex |
| MPI_PROD | product | integer, float | integer, real, complex |
| MPI_LAND | logical AND | integer | logical |
| MPI_BAND | bit-wise AND | integer, MPI_BYTE | integer, MPI_BYTE |
| MPI_LOR | logical OR | integer | logical |
| MPI_BOR | bit-wise OR | integer, MPI_BYTE | integer, MPI_BYTE |
| MPI_LXOR | logical XOR | integer | logical |
| MPI_BXOR | bit-wise XOR | integer, MPI_BYTE | integer, MPI_BYTE |
| MPI_MAXLOC | max value and location | float, double and long double | real, complex, double precision |
| MPI_MINLOC | min value and location | float, double and long double | real, complex, double precision |
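To make the table concrete, here is a rough serial sketch in plain C of what a reduction with MPI_SUM or MPI_MAXLOC computes. It contains no MPI calls: the array `contrib` stands in for the one value each task would contribute, indexed by rank, and the names `ValueLoc`, `reduce_sum` and `reduce_maxloc` are hypothetical. It assumes each task's "location" is simply its rank.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value/location pair, similar in spirit to the pairs
   that MPI_MAXLOC and MPI_MINLOC operate on. */
typedef struct {
    float value;
    int rank;
} ValueLoc;

/* MPI_SUM: the result is the sum of every task's contribution. */
float reduce_sum(const float *contrib, size_t ntasks)
{
    assert(contrib != NULL);
    float sum = 0.0f;
    for (size_t r = 0; r < ntasks; r++) {
        sum += contrib[r];
    }
    return sum;
}

/* MPI_MAXLOC: the largest value together with the rank that holds it.
   On a tie the lowest rank is kept, because only a strictly larger
   value replaces the current best. */
ValueLoc reduce_maxloc(const float *contrib, size_t ntasks)
{
    assert(contrib != NULL && ntasks > 0);
    ValueLoc best = { contrib[0], 0 };
    for (size_t r = 1; r < ntasks; r++) {
        if (contrib[r] > best.value) {
            best.value = contrib[r];
            best.rank = (int)r;
        }
    }
    return best;
}
```

In a real program the library combines the contributions across tasks; the loops here only illustrate the arithmetic of the two operations.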
MPI_ALLREDUCE (sendbuf,recvbuf,count,datatype,op,comm,ierr)
MPI_Reduce_scatter (*sendbuf,*recvbuf,recvcount,datatype,op,comm)
MPI_REDUCE_SCATTER (sendbuf,recvbuf,recvcount,datatype,op,comm,ierr)
MPI_Alltoall (*sendbuf,sendcount,sendtype,*recvbuf,recvcnt,recvtype,comm)
MPI_ALLTOALL (sendbuf,sendcount,sendtype,recvbuf,recvcnt,recvtype,comm,ierr)
MPI_SCAN (sendbuf,recvbuf,count,datatype,op,comm,ierr)
Examples: Collective Communications
Perform a scatter operation on the rows of an array
|
|
#include "mpi.h"
#include <stdio.h>
#define SIZE 4
int main(int argc, char *argv[]) {
int numtasks, rank, sendcount, recvcount, source;
float sendbuf[SIZE][SIZE] = {
{1.0, 2.0, 3.0, 4.0},
{5.0, 6.0, 7.0, 8.0},
{9.0, 10.0, 11.0, 12.0},
{13.0, 14.0, 15.0, 16.0} };
float recvbuf[SIZE];
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
if (numtasks == SIZE) {
source = 1;
sendcount = SIZE;
recvcount = SIZE;
MPI_Scatter(sendbuf,sendcount,MPI_FLOAT,recvbuf,recvcount,
MPI_FLOAT,source,MPI_COMM_WORLD);
printf("rank= %d Results: %f %f %f %f\n",rank,recvbuf[0],
recvbuf[1],recvbuf[2],recvbuf[3]);
}
else
printf("Must specify %d processors. Terminating.\n",SIZE);
MPI_Finalize();
}
|
|
|
program scatter
include 'mpif.h'
integer SIZE
parameter(SIZE=4)
integer numtasks, rank, sendcount, recvcount, source, ierr
real*4 sendbuf(SIZE,SIZE), recvbuf(SIZE)
C Fortran stores this array in column major order, so the
C scatter will actually scatter columns, not rows.
data sendbuf /1.0, 2.0, 3.0, 4.0,
& 5.0, 6.0, 7.0, 8.0,
& 9.0, 10.0, 11.0, 12.0,
& 13.0, 14.0, 15.0, 16.0 /
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
if (numtasks .eq. SIZE) then
source = 1
sendcount = SIZE
recvcount = SIZE
call MPI_SCATTER(sendbuf, sendcount, MPI_REAL, recvbuf,
& recvcount, MPI_REAL, source, MPI_COMM_WORLD, ierr)
print *, 'rank= ',rank,' Results: ',recvbuf
else
print *, 'Must specify',SIZE,' processors. Terminating.'
endif
call MPI_FINALIZE(ierr)
end
|
rank= 0 Results: 1.000000 2.000000 3.000000 4.000000
rank= 1 Results: 5.000000 6.000000 7.000000 8.000000
rank= 2 Results: 9.000000 10.000000 11.000000 12.000000
rank= 3 Results: 13.000000 14.000000 15.000000 16.000000
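The output above follows from how a scatter divides the root's send buffer: task `rank` receives the `sendcount` elements that start at element `rank * sendcount`. A minimal plain-C sketch of that indexing, with no MPI calls and a hypothetical helper name `scatter_chunk`:

```c
#include <assert.h>
#include <stddef.h>

/* Pointer to the first element of the root's send buffer that task
   `rank` receives when `sendcount` elements are sent to each task. */
const float *scatter_chunk(const float *sendbuf, int sendcount, int rank)
{
    assert(sendbuf != NULL && sendcount >= 0 && rank >= 0);
    return sendbuf + (size_t)rank * (size_t)sendcount;
}
```

For the 4 x 4 array in the C example, the chunk for rank 2 starts at the ninth element, which is why rank 2 prints 9.0 through 12.0.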
| Derived Data Types |
|
| MPI C data types | MPI Fortran data types |
|---|---|
| MPI_CHAR | MPI_CHARACTER |
| MPI_SHORT | |
| MPI_INT | MPI_INTEGER |
| MPI_LONG | |
| MPI_UNSIGNED_CHAR | |
| MPI_UNSIGNED_SHORT | |
| MPI_UNSIGNED | |
| MPI_UNSIGNED_LONG | |
| MPI_FLOAT | MPI_REAL |
| MPI_DOUBLE | MPI_DOUBLE_PRECISION |
| MPI_LONG_DOUBLE | |
| | MPI_COMPLEX |
| | MPI_LOGICAL |
| MPI_BYTE | MPI_BYTE |
| MPI_PACKED | MPI_PACKED |
Derived Data Type Routines
MPI_TYPE_CONTIGUOUS (count,oldtype,newtype,ierr)
MPI_TYPE_VECTOR (count,blocklength,stride,oldtype,newtype,ierr)
MPI_TYPE_INDEXED (count,blocklens(),offsets(),old_type,newtype,ierr)
MPI_TYPE_STRUCT (count,blocklens(),offsets(),old_types,newtype,ierr)
MPI_TYPE_EXTENT (datatype,extent,ierr)
MPI_TYPE_COMMIT (datatype,ierr)
Examples: Contiguous Derived Data Type
Create a data type representing a row of an array and distribute a different row to all processes.
|
|
#include "mpi.h"
#include <stdio.h>
#define SIZE 4
int main(int argc, char *argv[]) {
int numtasks, rank, source=0, dest, tag=1, i;
float a[SIZE][SIZE] =
{1.0, 2.0, 3.0, 4.0,
5.0, 6.0, 7.0, 8.0,
9.0, 10.0, 11.0, 12.0,
13.0, 14.0, 15.0, 16.0};
float b[SIZE];
MPI_Status stat;
MPI_Datatype rowtype;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Type_contiguous(SIZE, MPI_FLOAT, &rowtype);
MPI_Type_commit(&rowtype);
if (numtasks == SIZE) {
if (rank == 0) {
for (i=0; i<numtasks; i++)
MPI_Send(&a[i][0], 1, rowtype, i, tag, MPI_COMM_WORLD);
}
MPI_Recv(b, SIZE, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &stat);
printf("rank= %d b= %3.1f %3.1f %3.1f %3.1f\n",
rank,b[0],b[1],b[2],b[3]);
}
else
printf("Must specify %d processors. Terminating.\n",SIZE);
MPI_Finalize();
}
|
|
|
program contiguous
include 'mpif.h'
integer SIZE
parameter(SIZE=4)
integer numtasks, rank, source, dest, tag, i, ierr
real*4 a(0:SIZE-1,0:SIZE-1), b(0:SIZE-1)
integer stat(MPI_STATUS_SIZE), columntype
C Fortran stores this array in column major order
data a /1.0, 2.0, 3.0, 4.0,
& 5.0, 6.0, 7.0, 8.0,
& 9.0, 10.0, 11.0, 12.0,
& 13.0, 14.0, 15.0, 16.0 /
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
call MPI_TYPE_CONTIGUOUS(SIZE, MPI_REAL, columntype, ierr)
call MPI_TYPE_COMMIT(columntype, ierr)
tag = 1
if (numtasks .eq. SIZE) then
if (rank .eq. 0) then
do 10 i=0, numtasks-1
call MPI_SEND(a(0,i), 1, columntype, i, tag,
& MPI_COMM_WORLD,ierr)
10 continue
endif
source = 0
call MPI_RECV(b, SIZE, MPI_REAL, source, tag,
& MPI_COMM_WORLD, stat, ierr)
print *, 'rank= ',rank,' b= ',b
else
print *, 'Must specify',SIZE,' processors. Terminating.'
endif
call MPI_FINALIZE(ierr)
end
|
rank= 0 b= 1.0 2.0 3.0 4.0
rank= 1 b= 5.0 6.0 7.0 8.0
rank= 2 b= 9.0 10.0 11.0 12.0
rank= 3 b= 13.0 14.0 15.0 16.0
Examples: Vector Derived Data Type
Create a data type representing a column of an array and distribute different columns to all processes.
|
|
#include "mpi.h"
#include <stdio.h>
#define SIZE 4
int main(int argc, char *argv[]) {
int numtasks, rank, source=0, dest, tag=1, i;
float a[SIZE][SIZE] =
{1.0, 2.0, 3.0, 4.0,
5.0, 6.0, 7.0, 8.0,
9.0, 10.0, 11.0, 12.0,
13.0, 14.0, 15.0, 16.0};
float b[SIZE];
MPI_Status stat;
MPI_Datatype columntype;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
MPI_Type_vector(SIZE, 1, SIZE, MPI_FLOAT, &columntype);
MPI_Type_commit(&columntype);
if (numtasks == SIZE) {
if (rank == 0) {
for (i=0; i<numtasks; i++)
MPI_Send(&a[0][i], 1, columntype, i, tag, MPI_COMM_WORLD);
}
MPI_Recv(b, SIZE, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &stat);
printf("rank= %d b= %3.1f %3.1f %3.1f %3.1f\n",
rank,b[0],b[1],b[2],b[3]);
}
else
printf("Must specify %d processors. Terminating.\n",SIZE);
MPI_Finalize();
}
|
|
|
program vector
include 'mpif.h'
integer SIZE
parameter(SIZE=4)
integer numtasks, rank, source, dest, tag, i, ierr
real*4 a(0:SIZE-1,0:SIZE-1), b(0:SIZE-1)
integer stat(MPI_STATUS_SIZE), rowtype
C Fortran stores this array in column major order
data a /1.0, 2.0, 3.0, 4.0,
& 5.0, 6.0, 7.0, 8.0,
& 9.0, 10.0, 11.0, 12.0,
& 13.0, 14.0, 15.0, 16.0 /
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
call MPI_TYPE_VECTOR(SIZE, 1, SIZE, MPI_REAL, rowtype, ierr)
call MPI_TYPE_COMMIT(rowtype, ierr)
tag = 1
if (numtasks .eq. SIZE) then
if (rank .eq. 0) then
do 10 i=0, numtasks-1
call MPI_SEND(a(i,0), 1, rowtype, i, tag,
& MPI_COMM_WORLD, ierr)
10 continue
endif
source = 0
call MPI_RECV(b, SIZE, MPI_REAL, source, tag,
& MPI_COMM_WORLD, stat, ierr)
print *, 'rank= ',rank,' b= ',b
else
print *, 'Must specify',SIZE,' processors. Terminating.'
endif
call MPI_FINALIZE(ierr)
end
|
rank= 0 b= 1.0 5.0 9.0 13.0
rank= 1 b= 2.0 6.0 10.0 14.0
rank= 2 b= 3.0 7.0 11.0 15.0
rank= 3 b= 4.0 8.0 12.0 16.0
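The element selection behind a vector type can be sketched in plain C. The function below is an illustration only, not how an MPI library implements it: it copies the elements that a (count, blocklength, stride) layout would select, with the stride measured in elements as in the C example's `MPI_Type_vector(SIZE, 1, SIZE, ...)` call. The name `vector_gather` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the elements selected by a vector layout of `count` blocks,
   each `blocklength` elements long, whose starts are `stride` elements
   apart, from `src` into the contiguous buffer `dst`.
   Returns the number of elements copied. */
size_t vector_gather(const float *src, int count, int blocklength,
                     int stride, float *dst)
{
    assert(src != NULL && dst != NULL);
    assert(count >= 0 && blocklength >= 0);
    size_t n = 0;
    for (int block = 0; block < count; block++) {
        for (int j = 0; j < blocklength; j++) {
            dst[n++] = src[(ptrdiff_t)block * stride + j];
        }
    }
    return n;
}
```

Starting at the second element of the 4 x 4 array with count 4, block length 1 and stride 4 selects 2.0, 6.0, 10.0 and 14.0, which is the column that rank 1 prints above.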
Examples: Indexed Derived Data Type
Create a datatype by extracting variable portions of an array and distribute to all tasks.
|
|
#include "mpi.h"
#include <stdio.h>
#define NELEMENTS 6
int main(int argc, char *argv[]) {
int numtasks, rank, source=0, dest, tag=1, i;
int blocklengths[2], displacements[2];
float a[16] =
{1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0,
9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
float b[NELEMENTS];
MPI_Status stat;
MPI_Datatype indextype;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
blocklengths[0] = 4;
blocklengths[1] = 2;
displacements[0] = 5;
displacements[1] = 12;
MPI_Type_indexed(2, blocklengths, displacements, MPI_FLOAT, &indextype);
MPI_Type_commit(&indextype);
if (rank == 0) {
for (i=0; i<numtasks; i++)
MPI_Send(a, 1, indextype, i, tag, MPI_COMM_WORLD);
}
MPI_Recv(b, NELEMENTS, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &stat);
printf("rank= %d b= %3.1f %3.1f %3.1f %3.1f %3.1f %3.1f\n",
rank,b[0],b[1],b[2],b[3],b[4],b[5]);
MPI_Finalize();
}
|
|
|
program indexed
include 'mpif.h'
integer NELEMENTS
parameter(NELEMENTS=6)
integer numtasks, rank, source, dest, tag, i, ierr
integer blocklengths(0:1), displacements(0:1)
real*4 a(0:15), b(0:NELEMENTS-1)
integer stat(MPI_STATUS_SIZE), indextype
data a /1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0,
& 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0 /
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
blocklengths(0) = 4
blocklengths(1) = 2
displacements(0) = 5
displacements(1) = 12
call MPI_TYPE_INDEXED(2, blocklengths, displacements, MPI_REAL,
& indextype, ierr)
call MPI_TYPE_COMMIT(indextype, ierr)
tag = 1
if (rank .eq. 0) then
do 10 i=0, numtasks-1
call MPI_SEND(a, 1, indextype, i, tag, MPI_COMM_WORLD, ierr)
10 continue
endif
source = 0
call MPI_RECV(b, NELEMENTS, MPI_REAL, source, tag, MPI_COMM_WORLD,
& stat, ierr)
print *, 'rank= ',rank,' b= ',b
call MPI_FINALIZE(ierr)
end
|
rank= 0 b= 6.0 7.0 8.0 9.0 13.0 14.0
rank= 1 b= 6.0 7.0 8.0 9.0 13.0 14.0
rank= 2 b= 6.0 7.0 8.0 9.0 13.0 14.0
rank= 3 b= 6.0 7.0 8.0 9.0 13.0 14.0
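An indexed type differs from a vector type in that each block has its own length and its own displacement. As a rough plain-C illustration with no MPI calls (the helper name `indexed_gather` is hypothetical), the selection can be written as:

```c
#include <assert.h>
#include <stddef.h>

/* Copy `nblocks` blocks from `src` into the contiguous buffer `dst`.
   Block i is blocklengths[i] elements long and starts displacements[i]
   elements from the beginning of `src`.
   Returns the number of elements copied. */
size_t indexed_gather(const float *src, int nblocks,
                      const int *blocklengths, const int *displacements,
                      float *dst)
{
    assert(src != NULL && dst != NULL);
    assert(blocklengths != NULL && displacements != NULL);
    size_t n = 0;
    for (int i = 0; i < nblocks; i++) {
        for (int j = 0; j < blocklengths[i]; j++) {
            dst[n++] = src[displacements[i] + j];
        }
    }
    return n;
}
```

With block lengths {4, 2} and displacements {5, 12}, as in the examples, the six selected elements are 6.0, 7.0, 8.0, 9.0, 13.0 and 14.0.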
Examples: Struct Derived Data Type
Create a data type which represents a particle and distribute an array of such particles to all processes.
|
|
#include "mpi.h"
#include <stdio.h>
#define NELEM 25
int main(int argc, char *argv[]) {
int numtasks, rank, source=0, dest, tag=1, i;
typedef struct {
float x, y, z;
float velocity;
int n, type;
} Particle;
Particle p[NELEM], particles[NELEM];
MPI_Datatype particletype, oldtypes[2];
int blockcounts[2];
/* MPI_Aint type used to be consistent with syntax of */
/* MPI_Type_extent routine */
MPI_Aint offsets[2], extent;
MPI_Status stat;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
/* Setup description of the 4 MPI_FLOAT fields x, y, z, velocity */
offsets[0] = 0;
oldtypes[0] = MPI_FLOAT;
blockcounts[0] = 4;
/* Setup description of the 2 MPI_INT fields n, type */
/* Need to first figure offset by getting size of MPI_FLOAT */
MPI_Type_extent(MPI_FLOAT, &extent);
offsets[1] = 4 * extent;
oldtypes[1] = MPI_INT;
blockcounts[1] = 2;
/* Now define structured type and commit it */
MPI_Type_struct(2, blockcounts, offsets, oldtypes, &particletype);
MPI_Type_commit(&particletype);
/* Initialize the particle array and then send it to each task */
if (rank == 0) {
for (i=0; i<NELEM; i++) {
particles[i].x = i * 1.0;
particles[i].y = i * -1.0;
particles[i].z = i * 1.0;
particles[i].velocity = 0.25;
particles[i].n = i;
particles[i].type = i % 2;
}
for (i=0; i<numtasks; i++)
MPI_Send(particles, NELEM, particletype, i, tag, MPI_COMM_WORLD);
}
MPI_Recv(p, NELEM, particletype, source, tag, MPI_COMM_WORLD, &stat);
/* Print a sample of what was received */
printf("rank= %d %3.2f %3.2f %3.2f %3.2f %d %d\n", rank,p[3].x,
p[3].y,p[3].z,p[3].velocity,p[3].n,p[3].type);
MPI_Finalize();
}
|
|
|
program struct
include 'mpif.h'
integer NELEM
parameter(NELEM=25)
integer numtasks, rank, source, dest, tag, i, ierr
integer stat(MPI_STATUS_SIZE)
type Particle
sequence
real*4 x, y, z, velocity
integer n, type
end type Particle
type (Particle) p(0:NELEM-1), particles(0:NELEM-1)
integer particletype, oldtypes(0:1), blockcounts(0:1),
& offsets(0:1), extent
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
C Setup description of the 4 MPI_REAL fields x, y, z, velocity
offsets(0) = 0
oldtypes(0) = MPI_REAL
blockcounts(0) = 4
C Setup description of the 2 MPI_INTEGER fields n, type
C Need to first figure offset by getting size of MPI_REAL
call MPI_TYPE_EXTENT(MPI_REAL, extent, ierr)
offsets(1) = 4 * extent
oldtypes(1) = MPI_INTEGER
blockcounts(1) = 2
C Now define structured type and commit it
call MPI_TYPE_STRUCT(2, blockcounts, offsets, oldtypes,
& particletype, ierr)
call MPI_TYPE_COMMIT(particletype, ierr)
C Initialize the particle array and then send it to each task
tag = 1
if (rank .eq. 0) then
do 10 i=0, NELEM-1
particles(i) = Particle ( 1.0*i, -1.0*i, 1.0*i,
& 0.25, i, mod(i,2) )
10 continue
do 20 i=0, numtasks-1
call MPI_SEND(particles, NELEM, particletype, i, tag,
& MPI_COMM_WORLD, ierr)
20 continue
endif
source = 0
call MPI_RECV(p, NELEM, particletype, source, tag,
& MPI_COMM_WORLD, stat, ierr)
print *, 'rank= ',rank,' p(3)= ',p(3)
call MPI_FINALIZE(ierr)
end
|
rank= 0 3.00 -3.00 3.00 0.25 3 1
rank= 2 3.00 -3.00 3.00 0.25 3 1
rank= 1 3.00 -3.00 3.00 0.25 3 1
rank= 3 3.00 -3.00 3.00 0.25 3 1
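The struct examples compute the offset of the integer fields as four times the extent of the floating-point type. That is correct only if the compiler inserts no padding between the fields. One way to check the assumption, sketched here in plain C with no MPI calls, is to compare the hand-computed value with what `offsetof` reports. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the Particle struct in the C example above. */
typedef struct {
    float x, y, z;
    float velocity;
    int n, type;
} Particle;

/* Byte offset of the first integer field, as laid out by the compiler. */
size_t particle_int_offset(void)
{
    return offsetof(Particle, n);
}

/* Returns 1 if the hand-computed offset (four floats) agrees with the
   compiler's layout, 0 if padding makes them differ. */
int manual_offset_matches(void)
{
    assert(offsetof(Particle, x) == 0); /* first member starts at 0 */
    return particle_int_offset() == 4 * sizeof(float);
}
```

On platforms where `float` and `int` share the same size and alignment the two values agree; a struct that mixes, say, `char` and `double` fields would generally not pass this check.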
| Group and Communicator Management Routines |
|
|
Group and Communicator Management Routines
MPI_COMM_GROUP (comm,group,ierr)
MPI_GROUP_RANK (group,rank,ierr)
MPI_GROUP_SIZE (group,size,ierr)
MPI_GROUP_EXCL (group,n,ranks,newgroup,ierr)
MPI_GROUP_INCL (group,n,ranks,newgroup,ierr)
MPI_GROUP_INTERSECTION (group1,group2,newgroup,ierr)
MPI_GROUP_UNION (group1,group2,newgroup,ierr)
MPI_GROUP_DIFFERENCE (group1,group2,newgroup,ierr)
MPI_GROUP_COMPARE (group1,group2,result,ierr)
MPI_GROUP_FREE (group,ierr)
MPI_COMM_CREATE (comm,group,newcomm,ierr)
MPI_COMM_DUP (comm,newcomm,ierr)
MPI_COMM_COMPARE (comm1,comm2,result,ierr)
MPI_COMM_FREE (comm,ierr)
Examples: Group and Communicator Routines
Create two different process groups for separate collective communications exchange. Requires creating new communicators also.
|
|
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#define NPROCS 8
int main(int argc, char *argv[]) {
int rank, new_rank, sendbuf, recvbuf, numtasks,
ranks1[4]={0,1,2,3}, ranks2[4]={4,5,6,7};
MPI_Group orig_group, new_group;
MPI_Comm new_comm;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
if (numtasks != NPROCS) {
printf("Must specify MP_PROCS= %d. Terminating.\n",NPROCS);
MPI_Finalize();
exit(0);
}
sendbuf = rank;
/* Extract the original group handle */
MPI_Comm_group(MPI_COMM_WORLD, &orig_group);
/* Divide tasks into two distinct groups based upon rank */
if (rank < NPROCS/2) {
MPI_Group_incl(orig_group, NPROCS/2, ranks1, &new_group);
}
else {
MPI_Group_incl(orig_group, NPROCS/2, ranks2, &new_group);
}
/* Create new communicator and then perform collective communications */
MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm);
MPI_Allreduce(&sendbuf, &recvbuf, 1, MPI_INT, MPI_SUM, new_comm);
MPI_Group_rank (new_group, &new_rank);
printf("rank= %d newrank= %d recvbuf= %d\n",rank,new_rank,recvbuf);
MPI_Finalize();
}
|
|
|
program group
include 'mpif.h'
integer NPROCS
parameter(NPROCS=8)
integer rank, new_rank, sendbuf, recvbuf, numtasks
integer ranks1(4), ranks2(4), ierr
integer orig_group, new_group, new_comm
data ranks1 /0, 1, 2, 3/, ranks2 /4, 5, 6, 7/
call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
if (numtasks .ne. NPROCS) then
print *, 'Must specify MP_PROCS= ',NPROCS,' Terminating.'
call MPI_FINALIZE(ierr)
stop
endif
sendbuf = rank
C Extract the original group handle
call MPI_COMM_GROUP(MPI_COMM_WORLD, orig_group, ierr)
C Divide tasks into two distinct groups based upon rank
if (rank .lt. NPROCS/2) then
call MPI_GROUP_INCL(orig_group, NPROCS/2, ranks1,
& new_group, ierr)
else
call MPI_GROUP_INCL(orig_group, NPROCS/2, ranks2,
& new_group, ierr)
endif
call MPI_COMM_CREATE(MPI_COMM_WORLD, new_group,
& new_comm, ierr)
call MPI_ALLREDUCE(sendbuf, recvbuf, 1, MPI_INTEGER,
& MPI_SUM, new_comm, ierr)
call MPI_GROUP_RANK(new_group, new_rank, ierr)
print *, 'rank= ',rank,' newrank= ',new_rank,' recvbuf= ',
& recvbuf
call MPI_FINALIZE(ierr)
end
|
rank= 7 newrank= 3 recvbuf= 22
rank= 0 newrank= 0 recvbuf= 6
rank= 1 newrank= 1 recvbuf= 6
rank= 2 newrank= 2 recvbuf= 6
rank= 6 newrank= 2 recvbuf= 22
rank= 3 newrank= 3 recvbuf= 6
rank= 4 newrank= 0 recvbuf= 22
rank= 5 newrank= 1 recvbuf= 22
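The values in this output can be checked by hand. Each task contributes its original rank, and the sum is taken only over the four tasks in its own group. The plain-C sketch below reproduces that arithmetic without any MPI calls; it assumes the same split into a lower and an upper half as the examples, and the helper names are hypothetical.

```c
#include <assert.h>

/* Rank of a task inside its half-sized group, assuming the lower half
   of the original ranks forms one group and the upper half the other. */
int new_rank_in_half(int rank, int nprocs)
{
    assert(nprocs > 0 && nprocs % 2 == 0);
    assert(rank >= 0 && rank < nprocs);
    return rank % (nprocs / 2);
}

/* Sum of the original ranks in the half that contains `rank`. This is
   the value a sum reduction over that group yields when every task
   contributes its original rank. */
int half_group_sum(int rank, int nprocs)
{
    assert(nprocs > 0 && nprocs % 2 == 0);
    assert(rank >= 0 && rank < nprocs);
    int half = nprocs / 2;
    int first = (rank < half) ? 0 : half;
    int sum = 0;
    for (int r = first; r < first + half; r++) {
        sum += r;
    }
    return sum;
}
```

For eight tasks the lower group sums to 0+1+2+3 = 6 and the upper group to 4+5+6+7 = 22, and original rank 7 becomes rank 3 in its group.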
| Virtual Topologies |
|
|
Virtual Topology Routines
MPI_CART_COORDS (comm,rank,maxdims,coords(),ierr)
MPI_Cart_create (comm_old,ndims,*dims,*periods,reorder,*comm_cart)
MPI_CART_CREATE (comm_old,ndims,dims(),periods,reorder,comm_cart,ierr)
MPI_CART_GET (comm,maxdims,dims,periods,coords(),ierr)
MPI_CART_MAP (comm_old,ndims,dims(),periods(),newrank,ierr)
MPI_CART_RANK (comm,coords(),rank,ierr)
MPI_CART_SHIFT (comm,direction,displ,source,dest,ierr)
MPI_CART_SUB (comm,remain_dims(),comm_new,ierr)
MPI_CARTDIM_GET (comm,ndims,ierr)
MPI_DIMS_CREATE (nnodes,ndims,dims(),ierr)
MPI_Graph_create (comm_old,nnodes,*index,*edges,reorder,*comm_graph)
MPI_GRAPH_CREATE (comm_old,nnodes,index(),edges(),reorder,comm_graph,ierr)
MPI_GRAPH_GET (comm,maxindex,maxedges,index(),edges(),ierr)
MPI_GRAPH_MAP (comm_old,nnodes,index(),edges(),newrank,ierr)
MPI_GRAPH_NEIGHBORS (comm,rank,maxneighbors,neighbors(),ierr)
MPI_GRAPHDIMS_GET (comm,nnodes,nedges,ierr)
MPI_TOPO_TEST (comm,top_type,ierr)
Examples: Cartesian Virtual Topology Example
Create a 4 x 4 Cartesian topology from 16 processors and have each process exchange its rank with four neighbors.
|
|
#include "mpi.h"
#include <stdio.h>
#define SIZE 16
#define UP 0
#define DOWN 1
#define LEFT 2
#define RIGHT 3
int main(int argc, char *argv[]) {
int numtasks, rank, source, dest, outbuf, i, tag=1,
inbuf[4]={MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,},
nbrs[4], dims[2]={4,4},
periods[2]={0,0}, reorder=0, coords[2];
MPI_Request reqs[8];
MPI_Status stats[8];
MPI_Comm cartcomm;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
if (numtasks == SIZE) {
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, reorder, &cartcomm);
MPI_Comm_rank(cartcomm, &rank);
MPI_Cart_coords(cartcomm, rank, 2, coords);
MPI_Cart_shift(cartcomm, 0, 1, &nbrs[UP], &nbrs[DOWN]);
MPI_Cart_shift(cartcomm, 1, 1, &nbrs[LEFT], &nbrs[RIGHT]);
outbuf = rank;
for (i=0; i<4; i++) {
dest = nbrs[i];
source = nbrs[i];
MPI_Isend(&outbuf, 1, MPI_INT, dest, tag,
MPI_COMM_WORLD, &reqs[i]);
MPI_Irecv(&inbuf[i], 1, MPI_INT, source, tag,
MPI_COMM_WORLD, &reqs[i+4]);
}
MPI_Waitall(8, reqs, stats);
printf("rank= %d coords= %d %d neighbors(u,d,l,r)= %d %d %d %d\n",
rank,coords[0],coords[1],nbrs[UP],nbrs[DOWN],nbrs[LEFT],
nbrs[RIGHT]);
printf("rank= %d inbuf(u,d,l,r)= %d %d %d %d\n",
rank,inbuf[UP],inbuf[DOWN],inbuf[LEFT],inbuf[RIGHT]);
}
else
printf("Must specify %d processors. Terminating.\n",SIZE);
MPI_Finalize();
}
|
|
|
program cartesian
include 'mpif.h'
integer SIZE, UP, DOWN, LEFT, RIGHT
parameter(SIZE=16)
parameter(UP=1)
parameter(DOWN=2)
parameter(LEFT=3)
parameter(RIGHT=4)
integer numtasks, rank, source, dest, outbuf, i, tag, ierr,
& inbuf(4), nbrs(4), dims(2), coords(2),
& stats(MPI_STATUS_SIZE, 8), reqs(8), cartcomm,
& periods(2), reorder
data inbuf /MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,
& MPI_PROC_NULL/, dims /4,4/, tag /1/,
& periods /0,0/, reorder /0/
call MPI_INIT(ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numtasks, ierr)
if (numtasks .eq. SIZE) then
call MPI_CART_CREATE(MPI_COMM_WORLD, 2, dims, periods, reorder,
& cartcomm, ierr)
call MPI_COMM_RANK(cartcomm, rank, ierr)
call MPI_CART_COORDS(cartcomm, rank, 2, coords, ierr)
print *,'rank= ',rank,'coords= ',coords
call MPI_CART_SHIFT(cartcomm, 0, 1, nbrs(UP), nbrs(DOWN), ierr)
call MPI_CART_SHIFT(cartcomm, 1, 1, nbrs(LEFT), nbrs(RIGHT),
& ierr)
outbuf = rank
do i=1,4
dest = nbrs(i)
source = nbrs(i)
call MPI_ISEND(outbuf, 1, MPI_INTEGER, dest, tag,
& MPI_COMM_WORLD, reqs(i), ierr)
call MPI_IRECV(inbuf(i), 1, MPI_INTEGER, source, tag,
& MPI_COMM_WORLD, reqs(i+4), ierr)
enddo
call MPI_WAITALL(8, reqs, stats, ierr)
print *,'rank= ',rank,' coords= ',coords,
& ' neighbors(u,d,l,r)= ',nbrs
print *,'rank= ',rank,' ',
& ' inbuf(u,d,l,r)= ',inbuf
else
print *, 'Must specify',SIZE,' processors. Terminating.'
endif
call MPI_FINALIZE(ierr)
end
|
rank= 0 coords= 0 0 neighbors(u,d,l,r)= -3 4 -3 1
rank= 0 inbuf(u,d,l,r)= -3 4 -3 1
rank= 1 coords= 0 1 neighbors(u,d,l,r)= -3 5 0 2
rank= 1 inbuf(u,d,l,r)= -3 5 0 2
rank= 2 coords= 0 2 neighbors(u,d,l,r)= -3 6 1 3
rank= 2 inbuf(u,d,l,r)= -3 6 1 3
. . . . .
rank= 14 coords= 3 2 neighbors(u,d,l,r)= 10 -3 13 15
rank= 14 inbuf(u,d,l,r)= 10 -3 13 15
rank= 15 coords= 3 3 neighbors(u,d,l,r)= 11 -3 14 -3
rank= 15 inbuf(u,d,l,r)= 11 -3 14 -3
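The coordinates and neighbors in this output follow from row-major numbering of a non-periodic grid. The sketch below reproduces that mapping in plain C with no MPI calls. The names `Coords`, `cart_coords`, `cart_neighbor` and the constant `NO_NEIGHBOR` are hypothetical; `NO_NEIGHBOR` stands in for MPI_PROC_NULL, whose numeric value depends on the implementation (it prints as -3 in the output above).

```c
#include <assert.h>

#define NO_NEIGHBOR (-1) /* hypothetical stand-in for MPI_PROC_NULL */

typedef struct {
    int row, col;
} Coords;

/* Row-major coordinates of `rank` on a grid with `ncols` columns. */
Coords cart_coords(int rank, int ncols)
{
    assert(ncols > 0 && rank >= 0);
    Coords c = { rank / ncols, rank % ncols };
    return c;
}

/* Rank of the neighbor displaced by (drow, dcol) on a non-periodic
   nrows x ncols grid, or NO_NEIGHBOR if that position is off the grid. */
int cart_neighbor(int rank, int nrows, int ncols, int drow, int dcol)
{
    assert(nrows > 0 && ncols > 0);
    assert(rank >= 0 && rank < nrows * ncols);
    int row = rank / ncols + drow;
    int col = rank % ncols + dcol;
    if (row < 0 || row >= nrows || col < 0 || col >= ncols) {
        return NO_NEIGHBOR;
    }
    return row * ncols + col;
}
```

For rank 14 on the 4 x 4 grid this gives coordinates (3, 2), an upper neighbor of 10, no lower neighbor, a left neighbor of 13 and a right neighbor of 15, in agreement with the output above.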
| A Look Into The Future: MPI-2 |
|
These enhancements have now become part of the MPI-2 standard.
| References and More Information |
|
www.tc.cornell.edu/Edu/Talks/MPI/toc.html
ftp://math.usfca.edu/pub/MPI/mpi.guide.ps
www.mcs.anl.gov/mpi
www.tc.cornell.edu/Edu
www.mhpcc.edu/doc/mpi/mpi.html
WWW.ERC.MsState.Edu/mpi
www.epm.ornl.gov/~walker/mpi
www.osc.edu/lam.html#MPI
| Appendix A: MPI Routine Index |
|
The following table provides a comprehensive list of all MPI routines, each linked to its corresponding man page. Man pages were derived from the MPICH implementation of MPI and may differ from man pages of other implementations.