SP Parallel Programming II Workshop
MPI Performance Topics



  Table of Contents
  1. Review of MPI Message Passing
    1. Terminology
    2. MPI Communication Routines
  2. Factors Affecting MPI Performance
  3. Message Buffering
  4. MPI Message Passing Protocols
    1. Eager Protocol
    2. Rendezvous Protocol
    3. Eager Protocol vs. Rendezvous Protocol
  5. Message Size
  6. Point-to-Point Communications
  7. Persistent Communications
  8. Collective Communications
  9. Derived Datatypes
  10. Network Contention
  11. References and More Information
  12. Exercise


 
Review of MPI Message Passing
Terminology

Latency

The overhead associated with sending a zero-byte message between two MPI tasks. Total latency is a combination of both hardware and software factors, with the software contribution generally being much greater than that of the hardware. It is usually measured in milliseconds or microseconds.

Bandwidth

The rate at which data can be transmitted between two MPI tasks. Like latency, bandwidth is a combination of both hardware and software factors. It is usually measured in bytes or megabytes per second. Preliminary communication benchmark results can be obtained here.
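The two definitions above are often combined into a first-order cost model in which the time to move a message is the latency plus the message size divided by the bandwidth. The sketch below illustrates that assumption; it is not a formula taken from this workshop, and the function name transfer_time_us is hypothetical.

```c
/* First-order estimate of point-to-point transfer time, in microseconds.
 *   latency_us:     time for a zero-byte message, in microseconds
 *   bandwidth_mb_s: sustained bandwidth, in megabytes (10^6 bytes) per second
 *   bytes:          message size in bytes
 * Assumption: time = latency + size / bandwidth. Real networks deviate from
 * this model (protocol switches, buffering, contention). */
double transfer_time_us(double latency_us, double bandwidth_mb_s, double bytes)
{
    /* 1 MB/s equals 1 byte per microsecond, so bytes / (MB/s) is already
     * expressed in microseconds. */
    return latency_us + bytes / bandwidth_mb_s;
}
```

With made-up figures of 30 microseconds latency and 100 MB/s bandwidth, transfer_time_us(30.0, 100.0, 1000.0) gives 40 microseconds, so latency accounts for most of the cost of a small message, while for a 1,000,000-byte message (10,030 microseconds) the bandwidth term dominates.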

Application Buffer

The user program address space which holds the data that is to be sent or received. For example, suppose your program uses a variable called "inmsg". This variable is clearly visible in the program text and can be managed by the programmer. The application buffer for inmsg is the program memory location where the value of inmsg resides. It is within the user address space and can be "debugged".

System Buffer

System address space for storing messages, which is not visible to the programmer. Depending upon the type of communication operation, data in the application buffer may be required to be copied to/from system buffer space. The primary purpose of system buffer space is to enable asynchronous communications.

Blocking Communication

A communication routine is blocking if the completion of the call is dependent on certain "events". For sends, the data must be successfully sent or safely copied to system buffer space so that the application buffer that contained the data is available for reuse. For receives, the data must be safely stored in the receive buffer so that it is ready for use.

Non-blocking Communication

A communication routine is non-blocking if the call returns without waiting for any communication events to complete (such as the copying of a message from user memory to system memory or the arrival of a message).

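It is not safe to modify or reuse the application buffer after a non-blocking send call returns until the operation is known to have completed, for example through a call to MPI_Wait or MPI_Test. It is the programmer's responsibility to ensure that the application buffer is free for reuse.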

Non-blocking communications are primarily used to overlap computation with communication to effect performance gains.

Synchronous / Asynchronous

A synchronous send operation will complete only after acknowledgement that the message was safely received by the receiving process. Asynchronous send operations may "complete" even though the receiving process has not actually received the message.

Ready Communication

Refers to a send operation in which the programmer guarantees that a matching receive has already been posted. It is the programmer's responsibility to ensure correctness.

Message Envelope

MPI messages consist of a "data" portion and an "envelope" portion. The encoding of the envelope portion is implementation dependent, but it typically consists of the message tag, communicator, source, destination, possibly the message length, and other implementation-specific information.
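To make the envelope fields concrete, the following sketch shows one way they could be laid out and used to match an incoming message against a posted receive by source, tag, and communicator. The struct layout, the names envelope_t and envelope_matches, and the wildcard values are all hypothetical; real MPI libraries encode the envelope in their own way and define their own MPI_ANY_SOURCE and MPI_ANY_TAG constants.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wildcard values, for illustration only. */
#define ANY_SOURCE (-1)
#define ANY_TAG    (-2)

/* Hypothetical envelope layout; actual encodings are implementation dependent. */
typedef struct {
    int    tag;           /* user-supplied message tag */
    int    communicator;  /* stand-in for a communicator context id */
    int    source;        /* rank of the sending task */
    int    destination;   /* rank of the receiving task */
    size_t length;        /* message length in bytes, if carried */
} envelope_t;

/* Returns true if the incoming envelope satisfies a receive posted by rank
 * my_rank on communicator want_comm for (want_source, want_tag). */
bool envelope_matches(const envelope_t *env, int my_rank,
                      int want_comm, int want_source, int want_tag)
{
    if (env->destination != my_rank) return false;
    if (env->communicator != want_comm) return false;
    if (want_source != ANY_SOURCE && env->source != want_source) return false;
    if (want_tag != ANY_TAG && env->tag != want_tag) return false;
    return true;
}
```

In this sketch the data portion plays no part in matching; only the envelope is inspected, and the length field is carried along without being compared.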


 
Review of MPI Message Passing
MPI Communication Routines


 
Factors Affecting MPI Performance


 
Message Buffering

For IBM's MPI:

  • The environment variable MP_BUFFER_MEM allows the user to specify how much system buffer space is available for the receive process. The defaults are 2.8 MB for Internet Protocol (IP) communications and 64 MB for User Space (US) communications. Using this variable together with MP_EAGER_LIMIT may improve performance for some programs (discussed in the next section).


 
MPI Message Passing Protocols



 
MPI Message Passing Protocols
Eager Protocol

For IBM's MPI:

  • The maximum size of a message sent with the eager protocol is set with the MP_EAGER_LIMIT environment variable. Its default values appear in the table below. These values guarantee that at least 32 messages can be outstanding between any two tasks.

    POE Version   Number of Tasks                  MP_EAGER_LIMIT (bytes)
    2.3 & 2.4     1 - 16                           4096
                  17 - 32                          2048
                  33 - 64                          1024
                  65 - 128                         512
    2.4           129 - 256                        256
                  256 to implementation maximum    128

  • The maximum user-specified MP_EAGER_LIMIT value is 64K bytes. Setting a large value may require increasing the default receive buffer size with MP_BUFFER_MEM.
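The default limits in the table above, and the reason a larger limit may call for more buffer memory, can be sketched with a little arithmetic. The function names below are hypothetical, and the buffer estimate is illustrative only: it assumes every task has the stated number of maximum-size eager messages outstanding to a single receiver, which is not necessarily how IBM's MPI accounts for buffer space.

```c
#include <stddef.h>

/* Default MP_EAGER_LIMIT in bytes for a given task count, transcribed from
 * the table above (the rows above 128 tasks are listed for POE 2.4 only).
 * The table names 256 tasks in both of its last two rows; this sketch
 * assumes 256 belongs to the "129 - 256" row. */
size_t default_eager_limit(int ntasks)
{
    if (ntasks <= 16)  return 4096;
    if (ntasks <= 32)  return 2048;
    if (ntasks <= 64)  return 1024;
    if (ntasks <= 128) return 512;
    if (ntasks <= 256) return 256;
    return 128;
}

/* Rough estimate of the buffer space, in bytes, one receiver would need if
 * every task had msgs_per_task eager messages of size eager_limit
 * outstanding to it. Illustrative arithmetic, not IBM's accounting. */
size_t eager_buffer_estimate(int ntasks, size_t eager_limit, int msgs_per_task)
{
    return (size_t)ntasks * (size_t)msgs_per_task * eager_limit;
}
```

For 16 tasks at the default 4096-byte limit and 32 messages per task, the estimate is 16 x 32 x 4096 = 2,097,152 bytes, which fits within the 2.8 MB IP default quoted for MP_BUFFER_MEM. Taking 64K as 65,536 bytes, the same estimate grows to 33,554,432 bytes, which suggests why raising the limit may require a larger MP_BUFFER_MEM.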


 
MPI Message Passing Protocols
Rendezvous Protocol



 
MPI Message Passing Protocols
Eager Protocol vs. Rendezvous Protocol


 
Message Size



 
Point-to-Point Communications



 
Persistent Communications



 
Derived Datatypes



 
Network Contention


 
References and More Information

Notes

1 Timing results were obtained on two IBM SP nodes (4-way SMP 332 MHz 604e) configured with 1.5 GB of memory. Unless otherwise indicated, User Space communications were used over the High-Performance Switch. All executions were conducted in a production batch system and used only one processor of a 4-way SMP node.

2 Timing results were obtained on a variable number of IBM SP nodes (4-way SMP 332 MHz 604e) configured with 1.5 GB of memory. Communication protocols used were User Space and Internet Protocol over the High-Performance Switch. All executions were conducted in a production batch system. "Onnode" timings indicate that MPI tasks populated, as fully as possible, all of the available processors on 4-way SMP nodes.