MPI Exercise Code Descriptions


  1. Array Decomposition
  2. Matrix Multiplication
  3. pi Calculation
  4. Concurrent Wave Equation
  5. 2D Heat Equation Domain Decomposition
  6. Round Trip Timing Test
  7. Bandwidth Timing Test
  8. Prime Number Generation
  9. 2D FFT
  10. Image Rotation
  11. Mandelbrot Image
Array Decomposition

This is the "hello world" of parallel programming. It is a simple array decomposition used to demonstrate the distribution of data among multiple tasks and the communications required to accomplish it. In the parallel version, the master task distributes an equal portion of the array to each worker task. Each worker task receives its portion of the array and performs a simple value assignment to each of its elements. Each worker then sends its portion of the array back to the master. As the master receives back each portion of the array selected elements are displayed.

Matrix Multiplication

This example is a simple matrix multiply program. In the parallel version, the data is distributed among the worker tasks who perform the actual multiplication and send back their respective results to the master task.

Note that the C and Fortran versions of this code differ because of the way arrays are stored/passed. C arrays are row-major order but FORTRAN arrays are column-major order.

pi Calculation

This program calculates pi using a "dartboard" algorithm. See Fox et al.(1988) Solving Problems on Concurrent Processors, vol.1 page 207. In the parallel version, all processes contribute to the calculation, with the master averaging the values for pi.

The parallel version has two variations, one which demonstrates point-to-point communications and one that demonstrates collective communications.

Concurrent Wave Equation

This program implements the concurrent wave equation described in Chapter 5 of Fox et al., 1988, Solving Problems on Concurrent Processors, vol 1.

A vibrating string is decomposed into points. In the parallel version, each processor is responsible for updating the amplitude of a number of points over time. At each iteration, each processor exchanges boundary points with their nearest neighbors.

The parallel versions provide an X based display of the final wave.


Sample image

2D Heat Equation

This example is based on a simplified two-dimensional heat equation domain decomposition. The initial temperature is computed to be high in the middle of the domain and zero at the boundaries. The boundaries are held at zero throughout the simulation. During the time-stepping, an array containing two domains is used; these domains alternate between old data and new data.

The parallel versions provide an X based display of the initial and final temperature distributions.

Before image
Before
After image
After

Round Trip Timing Test

This example demonstrates a roundtrip, point-to-point communication timing test. A set number of messages are sent between the master and worker tasks. Before and after timings are made for each message and an average is calculated when completed.

Bandwidth Timing Test

This code demonstrates a point-to-point, bandwidth communications test. It begins by asking the user to supply the following parameters:
  • starting message size (J)
  • finish message size (K)
  • message increment size (I)
  • number of round trips per iteration (N)
It then sends/receives N roundtrips of incrementally sized messages from start size J to finish size K by increment I. The bandwidth for each message size is calculated as the average of N round trips.

Prime Number Generator

This program generates primes. The approach taken is a "brute force" method which requires increasingly greater amounts of cpu as the problem size increases. It lends itself well to an embarassingly parallel solution since each prime can be computed independently of all other primes.

2D FFT

A 512x512 complex matrix is initialized with a point source and then decomposed and distributed to each processor in the partition (by rows for C; by columns for Fortran). Each processor then performs a one-dimensional FFT on it's portion of the matrix which is stored locally. Each processor transposes it's portion of the matrix and performs an "All to all" distribution to the other processors in the partition. This now partitions the intermediate matrix by columns for C; rows for Fortran. Each processor then performs a one-dimensional FFT on it's portion of the matrix. Finally, the columns/rows of the matrix are gathered back at the destination processor and timing and Mflop results are displayed. C programs perform an additional test for correctness.

Note: A straightforward unsophisticated 1D FFT kernel is used. It is sufficient to convey the general idea, but be aware that there are better 1D FFTs available on many systems.

Image Rotation

This example is written in C using MPI. An image is read from disk and repeatedly rotated by one degree increments. After each image rotation the screen is updated.

A single processor reads the input image from disk and broadcasts to all the other processors in the system. Each processor allocates space for a contiguous swath of rows in the output image. Processors then calculate the coordinates of the source input image pixel for each pixel in this output image swath. Individual swaths of the output image are then collected at the destination processor. The destination processor then displays the image on the local X server.

Note: When running this program make sure that your DISPLAY environment variable is set and that the MP_PROCS environment variable is set to 4.


Sample image

Mandelbrot Image

This program generates and Mandelbrot image and displays it using X Windows. It is written in C using MPI.

Each processor calculates the image pixels for a swath of rows in the output image. These swaths of rows are then collected at a single processor and displayed on the screen.


Sample image