Message Passing Overview
Note: This tutorial is no longer being maintained. Some information
contained in it may be out of date and/or no longer valid.
Hyperlinks may also be invalid.
Table of Contents
-
Message Passing Model
-
Message Passing Libraries
-
Terminology
-
Point-to-Point Communication
-
Collective Communication
-
Performance Guidelines
-
Library Comparison
-
SP2 Performance Results
-
Recommendations
-
Acknowledgements and References
The message passing model is one of several computational models for
conceptualizing program operations. The message passing model is defined
as:
- set of processes having only local memory
- processes communicate by sending and receiving messages
- the transfer of data between processes requires
cooperative operations
to be performed by each process (a send operation must
have a matching receive)
Other models include:
- data parallelism
data partitioning determines parallelism
- shared memory
multiple processes sharing common memory space
- remote memory operation
set of processes in which a process can access the memory of
another process without its participation
- threads
a single process having multiple (concurrent) execution paths
- combined models composed of two or more of the above
Note: these models are machine/architecture independent; any of the
models can be implemented on any hardware given appropriate operating
system support. An effective implementation is one which closely
matches its target hardware.
Message Passing Model (Cont.)
The message passing model has gained wide use in the field of
parallel computing due to advantages that include:
- Hardware match - The message passing model fits well on
parallel supercomputers and clusters of workstations which are
composed of separate processors connected by a communications network.
- Functionality - Message passing offers a full set of functions
for expressing parallel algorithms, providing the control not found in
data-parallel and compiler-based models.
- Performance - Effective use of modern CPUs requires management
of their memory hierarchy, especially their caches. Message passing
achieves this by giving programmer explicit control of data locality.
The principle drawback of message passing is the responsiblity it places
on the programmer. The programmer must explicitly implement a data
distribution scheme and all interprocess communication and synchronization.
In so doing, it is the programmer's responsibility to resolve data dependencies
and avoid deadlock and race conditions.
The set of communication operations that are allowed by an implementation
of the message passing model form the components of a message passing
library. Examples of message passing libraries include public domain
packages that do not target a specific machine (PICL, PVM, PARMACS, P4,
MPI, etc.) as well as machine dependent vendor implementations (MPL,
NX, CMMD, etc.).
The common components of message passing libraries include:
- process management routines
(initialize and finalize processes, determine number of processes
and process identifiers)
- point-to-point communication routines
(basic sends and receives between any two processes)
- process group/collective communication routines
(broadcast/gather/scatter operations amongst a set of processes,
synchronization of processes)
Until recently, users of message passing libraries had to choose between
using public domain packages for improved code portability, and vendor
implementations for improved performance on a given machine. The MPI (Message
Passing Interface) Library has been developed to meet the dual goals of
portability and performance on a wide range of machines.
The terminology used by the various message passing libraries has not
been standardized. Occasionally, communications routines from different
libraries having the same semantics are described in conflicting terms.
(Example: pvm_send() from the PVM library and csend() from the NX library.)
As a promotion of standardization, the terminology used by the MPI standard
is presented here.
-
Buffering
-
Temporary copying of messages performed by system as a part of its
transmission protocols. Copying occurs between user buffer space
(defined by process) and system buffer space (defined by library).
-
Blocking Communication
-
A communication routine is blocking if the completion of the call is
dependent on certain "events". For sends, the data must be
successfully sent or safely copied so that the buffer that contained the
data is available for reuse. For receives, the data must be safely stored
in the receive buffer so that it is ready for use.
-
Nonblocking Communication
-
A communication routine is nonblocking if the call returns without
waiting for any communications events to complete (such as copying
of message from user memory to system memory or arrival of message).
-
Synchronous Communication
-
Communication in which the sender does not return until the matching
receive has been posted on the destination process.
-
Asynchronous Communication
-
Communication in which the sender and receiver place no constraints
on each other in terms of completion.
Note: MPI uses nonblocking routines to provide this capability.
The basic components of any message passing library are its point-to-point
communications routines for data transfer between two processes, the
send and receive operations.
The basic send and receive operations look like:
send(address, length, destination, tag)
receive(address, max_length, source, tag, rec_length)
where
- address = memory location of the beginning of the buffer containing
the data to be sent or received
- length = length in bytes of message being sent
- max_length = length of the buffer into which received data is placed
- rec_length = number of bytes of data actually received
- destination = identifier of the process to which message is sent
- source is handled in one of the following ways by a given message
passing library
- as an output argument indicating where the message originated
- as an input argument specifying where the message is to originate,
messages not originating at the specified source must be queued
- tag = arbitrary (user defined) nonegative integer restricting receipt
of messages
Typical message passing libraries subdivide the basic sends and receives
into two types:
Point-to-Point Communication (Cont.)
Blocking Routines
- Send - completes when send buffer ready for reuse,
depends on system buffering:
- no system buffering - returns after message received
- system buffers messages - returns after message safely copied
- Receive - completes when receive buffer can be safely used
Blocking routines are the simplest but can be "unsafe":
Process 0 Process 1
--------- ---------
bsend(1) bsend(0)
brecv(1) brecv(0)
Completion depends on size of message and amount of system buffering. If
the message size exceeds the system buffer space, the sends may not complete
and the receives will not be reached. This situation is called deadlock,
where two or more processes cannot proceed with computation because they
depend on each other for a result they cannot get.
Note: Unsafe programs should be viewed as incorrect and steps
taken to insure program correctness.
Point-to-Point Communication (Cont.)
Possible solutions to unsafe programs
- reorder operations
Process 0 Process 1
--------- ---------
bsend(1) brecv(0)
brecv(1) bsend(0)
- if library contains it, use routine that supplies receive buffer
at same time as send, e.g. sendrecv()
Process 0 Process 1
--------- ---------
sendrecv(1) sendrecv(0)
- use nonblocking operations (discussed below)
Process 0 Process 1
--------- ---------
nbsend(1) nbsend(0)
nbrecv(1) nbrecv(0)
waitall waitall
Point-to-Point Communication (Cont.)
Nonblocking sends and receives:
- return immediately
- allow overlapping of communication and computation,
buffers must not be used until communication completes
- wait routines insure communication has completed
- usually have an argument to return a message identifier
that is used with the wait routines
- nbsend(address, length, destination, tag, msg_id)
- nbrecv(address, max_length, source, tag, msg_id)
- wait(msg_id)
Point-to-Point Communication (Cont.)
Some libraries have additional send operations, such as
- Synchronous send - the send does not complete until a matching
receive has been posted
- Buffered send - the user supples a buffer to the system for its
use, allows user to supply enough memory to insure an unsafe
program becomes safe
These may be available in both blocking and nonblocking forms.
The synchronous and buffered send routines can have a negative impact on
performance and should generally be avoided:
- Synchronous send - may cause added wait/idle time for sending
process, will cause unsafe programs to become incorrect since
matching recieves will not be reached
- Buffered send - forces buffering, message copying can decrease
bandwidth between processes
The basic point-to-point communication routines in many libraries are designed
to handle data stored contiguously in memory. In this case, there are
usually additional routines to handle non-contiguous data.
Collective operations are coordinated among a "group" of processes
- library routines are used to construct a group from a subset
of processes
- collective operations are blocking
- three types of collective operations
- synchronization - processes wait until all members of the group
have reached the synchronization point
- data movement
- collective computation (reductions) - one member of the
group collects data from the other members and performs
and operation (min, max, add, multiply, etc.) on that data
There are some basic steps that can be taken improve the performance of
programs written using a message passing library.
- Start with tuned serial program
Often, the most substantial performance gains can be achieved
by tuning the single node operation of a program.
- Control process granularity
Increased granularity, the ratio of the amount of computation to the
amount of communication, may reduce communications cost and improve
performance.
- Overlap communication and computation
Use of nonblocking communication should be used when it can to hide
the communication costs by permitting simultaneous computation.
- Avoid unneccessary synchronization
Synchronizing processes may induce idle time as some processes wait
for others to catch up.
- Avoid buffering when possible
Extra copying of messages decreases bandwidth.
- Keep data local
When possible, data should be aligned with processes to decrease
communication.
The three main criteria for choosing a message passing library are:
- performance - latency and bandwidth
- portability
- functionality
Latency (time to transmit 0 length message) and bandwidth vary greatly from
machine to machine. The best performance on a specific machine is
typically obtained from the native message passing library written
specifically for that machine.
Measured latency and bandwidth for the MPI, MPL, PVMe and PVM libraries on
the SP2.
| Message Passing Library
| Latency (usecs)
| Bandwidth (MB/sec)
| Portable?
|
| MPI
| 43
| 34
| yes
|
| MPL
| 45
| 34
| no
|
| MPICH
| 58
| 33
| yes
|
| PVMe (interrupts off>)
| 83
| 31
| yes
|
| PVMe
| 220
| 27
| yes
|
| PVM (RouteDirect w/in place packing)
| 642
| 12
| yes
|
| PVM (DontRoute w/in place packing)
| 1450
| 3
| yes
|
The following recommendations concern the choice of MPL, MPI on the IBM SP2.
MPL
====
Message Passing Library - SP2 native message passing library
- advantages
- supported by IBM (for now)
- best performance (that is currently available)
- good functionality
- disadvantages
- not portable
- IBM will be dropping support in favor of MPI
- recommendation
- acceptable if user needs performance and is willing to give up
portability
Library Comparison:
Recommendations
PVM
====
Parallel Virtual Machine - public domain package available from Oak
Ridge National Lab
- advantages
- portable
- has some functionality not found in other packages (yet) -
e.g. dynamic task control
- disadvantages
- poor performance
- lacks functionality (except dynamic control)
- recommendation
- use only for existing PVM applications or those specifically
requiring dynamic control features
Library Comparison:
Recommendations
MPI
====
Message Passing Interface
- advantages
- portability - a standard for message passing
- good performance on many platforms
- good functionality
- vendor support promised
- MPI-2 may add dynamic task control and parallel I/O
- recommendation
- strongly recommended for new applications
- W. Gropp, E. Lusk and A. Skjellum, "Using MPI", MIT Press, Cambridge,
Massachusetts.
- We gratefully acknowledge William Saphir of NAS for permission to
make use of his workshop materials.
Maui High Performance Computing Center. All rights reserved.
Revised: 20 August 1997 webmaster@mhpcc.hpc.mil