SP Parallel Programming II Workshop
MPI-IO



  Table of Contents

  1. Overview
    1. History
    2. Motivation For Using MPI-IO
    3. Primary Features
  2. Terminology
  3. Basic Usage Example
  4. File Manipulation
  5. File Views
  6. Data Access
    1. Explicit Offsets
    2. Individual File Pointers
    3. Shared File Pointers
    4. Split Collective Data Access
  7. File Interoperability
  8. File Consistency
  9. MPI-IO Implementations
  10. Example Codes
  11. GPFS
  12. References and More Information


 
Overview

History


  Motivation For Using MPI-IO


  Primary Features



 
Terminology
file

An MPI file is an ordered collection of typed data items, called etypes. MPI supports random or sequential access to any integral set of these items. A file is opened collectively by a group of processes. Access to the file data can be collective or non-collective.

etype

An etype (elementary datatype) is the unit of data access and positioning. It can be any MPI predefined or derived datatype. Data access is performed in etype units, reading or writing whole data items of type etype. Offsets are expressed as a count of etypes; file pointers point to the beginning of etypes.

filetype

A filetype is the basis for partitioning a file among processes and defines a template for accessing the file. A filetype is either a single etype or a derived MPI datatype constructed from multiple instances of the same etype.

The diagram below depicts four different filetypes based upon MPI derived type constructors. Each individual block is an etype. Colored blocks (including "holes") comprise the filetype.

[Figure: Filetypes]

view

A view defines the current set of data visible and accessible from an open file as an ordered set of etypes. Each process has its own view of the file, defined by three quantities: a displacement, an etype, and a filetype. The pattern described by a filetype is repeated, beginning at the displacement, to define the view. Views can be changed by the user during program execution. The default view is a linear byte stream.

The diagram below demonstrates how multiple tasks, using different views composed of complementary filetypes, can be used to effect data partitioning. Each individual block is an etype. Each colored group represents a filetype.

[Figure: File Views]

offset

An offset is a position in the file relative to the current view, expressed as a count of etypes. Holes in the view's filetype are skipped when calculating this position. Offset 0 is the location of the first etype visible in the view.
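The relationship between displacement, etype, filetype, and offset can be sketched in plain C, without MPI. The sketch below assumes one particular kind of view: a "tiled" filetype in which each process sees one block of blocklen etypes out of every nprocs * blocklen etypes. The tiled_view struct and the view_offset_to_byte function are hypothetical names used only for illustration; they are not part of MPI.

```c
#include <assert.h>

/* Hypothetical description of a tiled view (illustration only, not an MPI type). */
typedef struct {
    long long disp;        /* displacement: absolute byte position where the view begins */
    long long etype_size;  /* size of one etype in bytes */
    long long blocklen;    /* etypes visible to this process per filetype repetition */
    long long rank;        /* this process's slot within the filetype */
    long long nprocs;      /* number of slots per filetype repetition */
} tiled_view;

/* Map an offset (a count of visible etypes, holes skipped) to an
 * absolute byte position in the file. */
long long view_offset_to_byte(const tiled_view *v, long long offset)
{
    assert(v->etype_size > 0 && v->blocklen > 0 && v->nprocs > 0);
    assert(v->rank >= 0 && v->rank < v->nprocs && offset >= 0);

    long long tile   = offset / v->blocklen;  /* which repetition of the filetype */
    long long within = offset % v->blocklen;  /* position inside the visible block */

    /* etypes between the displacement and the target, holes included */
    long long etypes_from_disp = tile * v->nprocs * v->blocklen
                               + v->rank * v->blocklen
                               + within;

    return v->disp + etypes_from_disp * v->etype_size;
}
```

For rank 1 of 4 processes, with one-byte etypes, blocks of 256 etypes, and displacement 0, offset 0 maps to byte 256 and offset 256 maps to byte 1280: the holes that belong to the other three processes are skipped.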

displacement

A file displacement is an absolute byte position relative to the beginning of a file. The displacement defines the location where a view begins. It is useful for skipping a header or a region of the file previously accessed with different filetypes.

[Figure: File displacement]

file pointer

A file pointer is an implicit offset maintained by MPI. Individual file pointers are file pointers that are local to each process that opened the file. A shared file pointer is a file pointer that is shared by the group of processes that opened the file.
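The difference between individual and shared file pointers can be illustrated with a toy model in plain C. This is only a sketch of the bookkeeping, not of MPI itself: toy_file, toy_open, toy_access_individual, and toy_access_shared are hypothetical names, and positions are counted in etypes. Each access returns the offset at which it starts and then advances the pointer it used.

```c
#include <assert.h>

#define MAX_PROCS 8  /* arbitrary limit for this toy model */

/* Toy model of the file pointers kept for one open file. */
typedef struct {
    int nprocs;
    long long individual[MAX_PROCS];  /* one pointer per process */
    long long shared;                 /* one pointer for the whole group */
} toy_file;

/* All pointers start at offset 0 when the file is opened. */
void toy_open(toy_file *f, int nprocs)
{
    assert(nprocs > 0 && nprocs <= MAX_PROCS);
    f->nprocs = nprocs;
    f->shared = 0;
    for (int i = 0; i < nprocs; i++)
        f->individual[i] = 0;
}

/* Access count etypes through the caller's individual pointer.
 * Only that process's pointer moves. */
long long toy_access_individual(toy_file *f, int rank, long long count)
{
    assert(rank >= 0 && rank < f->nprocs && count >= 0);
    long long start = f->individual[rank];
    f->individual[rank] += count;
    return start;
}

/* Access count etypes through the shared pointer.
 * The single pointer moves for every process in the group. */
long long toy_access_shared(toy_file *f, long long count)
{
    assert(count >= 0);
    long long start = f->shared;
    f->shared += count;
    return start;
}
```

In this model, two processes that each make one access through their individual pointers both start at offset 0, while two successive accesses through the shared pointer start at different offsets, because the second one continues where the first one ended.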

file handle

A file handle is an opaque object created by MPI_FILE_OPEN and freed by MPI_FILE_CLOSE. All operations on an open file reference the file through the file handle.



 
Basic Usage Example

Example Code Fragment


MPI_File fh;
MPI_Datatype filetype;
MPI_Status status;
MPI_Offset offset;
int mode;
float data[100];

/*   other code */
/*   set offset and filetype (covered later) */

mode = MPI_MODE_CREATE|MPI_MODE_RDWR;

MPI_File_open(MPI_COMM_WORLD, "myfile", mode, 
    MPI_INFO_NULL, &fh); 

MPI_File_set_view(fh, offset, MPI_FLOAT, filetype, "native", 
    MPI_INFO_NULL);  

MPI_File_write(fh, data, 100, MPI_FLOAT, &status);

MPI_File_close(&fh);


 
File Manipulation



 
File Views

C Language Example: Setting a File View


static int buf_size = 1024;
static int blocklen = 256; 
static char filename[] = "myfile.out";

/*   other code         */

char *buf, *p;
int myrank, commsize, mode, nbytes;
MPI_Datatype filetype, buftype;
int length[3];
MPI_Aint disp[3];
MPI_Datatype type[3];
MPI_File fh;
MPI_Offset offset;
MPI_Status status;

/* initialize MPI */
MPI_Init( &argc, &argv );
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Comm_size(MPI_COMM_WORLD, &commsize);

/* initialize buffer */
buf = (char *) malloc(buf_size);
memset((void *)buf, '0' + myrank, buf_size);

/* create buftype */
MPI_Type_contiguous(buf_size, MPI_CHAR, &buftype);
MPI_Type_commit(&buftype);

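/* create filetype: one block of blocklen chars in this rank's slot,
   with the extent stretched to blocklen * commsize.
   Note: MPI_LB, MPI_UB and MPI_Type_struct were removed in MPI-3.0;
   current code would use MPI_Type_create_struct and
   MPI_Type_create_resized instead. */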
length[0] = 1;
length[1] = blocklen;
length[2] = 1;
disp[0] = 0;
disp[1] = blocklen * myrank;
disp[2] = blocklen * commsize;
type[0] = MPI_LB;
type[1] = MPI_CHAR;
type[2] = MPI_UB;

MPI_Type_struct(3, length, disp, type, &filetype);
MPI_Type_commit(&filetype);

/* open file */
mode = MPI_MODE_CREATE | MPI_MODE_WRONLY;
MPI_File_open(MPI_COMM_WORLD, filename, mode, 
    MPI_INFO_NULL, &fh);

/* set file view */
offset = 0;
MPI_File_set_view(fh, offset, MPI_CHAR, 
    filetype, "native", MPI_INFO_NULL);

Fortran Example: Setting a File View


      integer buf_size
      parameter (buf_size = 1024)
      integer blocklen
      parameter (blocklen = 256)

      character*255 filename
      character buf(buf_size)

      integer myrank, commsize, i, filetype, buftype
      integer mode, nbytes, ierr, fh
      integer length(3), disp(3), type(3)
      integer status(MPI_STATUS_SIZE)
      integer(kind=MPI_OFFSET_KIND) offset

C     initialize file name
      filename = 'myfile.out'

C     zero out buffer
      do i = 1, buf_size
        buf(i) = char(0)
      enddo

C     initialize MPI
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, commsize, ierr)

C     create buftype
      call MPI_TYPE_CONTIGUOUS(buf_size, MPI_CHARACTER,
     +     buftype, ierr)
      call MPI_TYPE_COMMIT(buftype, ierr)

C     create filetype
      length(1) = 1
      length(2) = blocklen
      length(3) = 1
      disp(1) = 0
      disp(2) = blocklen * myrank
      disp(3) = blocklen * commsize
      type(1) = MPI_LB
      type(2) = MPI_CHARACTER
      type(3) = MPI_UB

      call MPI_TYPE_STRUCT(3, length, disp, type, 
     +    filetype, ierr)
      call MPI_TYPE_COMMIT(filetype, ierr)

C     open file
      mode = MPI_MODE_RDONLY
      call MPI_FILE_OPEN(MPI_COMM_WORLD, filename,
     +    mode, MPI_INFO_NULL, fh, ierr)

C     set file view
      offset = 0
      call MPI_FILE_SET_VIEW(fh, offset, MPI_CHARACTER, filetype,
     +    'native',  MPI_INFO_NULL, ierr)


 
Data Access


  Data Access With Explicit Offsets

C Language Example: Explicit Blocking Collective Access


/* open file */
mode = MPI_MODE_CREATE | MPI_MODE_WRONLY;

MPI_File_open(MPI_COMM_WORLD, filename, mode, 
    MPI_INFO_NULL, &fh);

/* set file view */
offset = 0;
MPI_File_set_view(fh, offset, MPI_CHAR, filetype, 
    "native", MPI_INFO_NULL);

/*  write buffer to file */
MPI_File_write_at_all(fh, offset, (void *)buf, 1, 
    buftype, &status);

/* print out number of bytes written */
MPI_Get_elements(&status, MPI_CHAR, &nbytes);
printf("TASK %d Number of bytes written = %d\n", myrank, nbytes);

/* close file */
MPI_File_close(&fh);

Fortran Example: Explicit Blocking Collective Access


C     open file
      mode = MPI_MODE_RDONLY
      call MPI_FILE_OPEN(MPI_COMM_WORLD, filename, 
     +    mode, MPI_INFO_NULL, fh, ierr)

C     set file view
      offset = 0
      call MPI_FILE_SET_VIEW(fh, offset, MPI_CHARACTER, 
     +    filetype, 'native', MPI_INFO_NULL, ierr)

C     read data and fill up buffer
      call MPI_FILE_READ_AT_ALL(fh, offset, buf, 1, buftype, 
     +     status, ierr)

C     print out number of bytes read
      call MPI_GET_ELEMENTS(status, MPI_CHARACTER, nbytes, ierr)
      write( 0, * ) 'TASK ', myrank,
     +    ' number of bytes read = ', nbytes

C     close file
      call MPI_FILE_CLOSE(fh, ierr)


  Data Access With Individual File Pointers

C Language Example: Individual Pointer Nonblocking Access


MPI_Request request;
MPI_Status status;
MPI_File fh;
float data[100];

/* assumes an open file and set view */

MPI_File_iwrite (fh, data, 100, MPI_FLOAT, &request);

/* do work */

MPI_Wait (&request, &status);

/* now safe to use data[100] */


  Data Access With Shared File Pointers


  Split Collective Data Access

C Language Example: Split Collective


MPI_File fh;
MPI_Status status;
float data[100];

/* assumes an open file and set view */

MPI_File_write_all_begin (fh, data, 100, MPI_FLOAT);

/* do work */
/* no other collective operations on this file */
/* no use of data[100] */

MPI_File_write_all_end (fh, data, &status);

/* now safe to use data[100] */



 
File Interoperability



 
File Consistency


 
MPI-IO Implementations



 
Example Codes

Gather / Scatter (from Jean-Pierre Prost, IBM Corporation)

In this example, the C program performs the scatter. Each MPI task initializes its 1024-character buffer by filling it with the character corresponding to its task id. Each task then creates a filetype and sets a file view that is complementary to those of the other MPI tasks.

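Because each filetype contains only 256 (blocklen) characters, the result is a "tiled" output file: 256 characters from task 0 are followed by 256 characters from task 1, and so on through the last task, after which the pattern repeats until every task's 1024-character buffer has been written.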
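The tiled layout can be reproduced without MPI by a short plain C sketch. It assumes the same constants as the scatter program (a 1024-character buffer per task and a blocklen of 256) and simply computes where each task's blocks would land; simulate_scatter is a hypothetical helper written for illustration, not part of the original codes.

```c
#include <assert.h>
#include <string.h>

enum { BUF_SIZE = 1024, BLOCKLEN = 256 };

/* Fill file (at least commsize * BUF_SIZE bytes) with the layout that the
 * scatter is expected to produce: task r writes its BUF_SIZE characters
 * ('0' + r) in blocks of BLOCKLEN, one block per repetition of the filetype. */
void simulate_scatter(int commsize, char *file)
{
    assert(commsize > 0 && commsize <= 10);  /* keeps '0' + rank a digit */

    for (int rank = 0; rank < commsize; rank++) {
        for (int tile = 0; tile < BUF_SIZE / BLOCKLEN; tile++) {
            /* Each filetype repetition spans commsize blocks;
             * this rank owns the block in slot number rank. */
            size_t pos = ((size_t)tile * commsize + rank) * BLOCKLEN;
            memset(file + pos, '0' + rank, BLOCKLEN);
        }
    }
}
```

With four tasks, bytes 0 to 255 hold '0', bytes 256 to 511 hold '1', bytes 768 to 1023 hold '3', and the pattern starts over with '0' at byte 1024.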

The Fortran program performs the gather. Each MPI task creates a filetype and sets a view that is complementary to those of the other tasks and matches the layout written by the C program.

The complete codes are shown below.

C Example: Scatter

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mpi.h"

static int buf_size = 1024;
static int blocklen = 256;
static char filename[] = "scatter.out";

int main(int argc, char **argv)
{
    char *buf;
    char *p;
    int myrank;
    int commsize;
    MPI_Datatype filetype;
    MPI_Datatype buftype;
    int length[3];
    MPI_Aint disp[3];
    MPI_Datatype type[3];
    MPI_File fh;
    int mode;
    MPI_Offset offset;
    MPI_Status status;
    int nbytes;

    /* initialize MPI */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &commsize);

    /* initialize buffer */
    buf = (char *) malloc(buf_size);
    memset(( void *)buf, '0' + myrank, buf_size);

    /* create and commit buftype */
    MPI_Type_contiguous(buf_size, MPI_CHAR, &buftype);
    MPI_Type_commit(&buftype);

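    /* create and commit filetype: one block of blocklen chars in this
       rank's slot, with the extent stretched to blocklen * commsize.
       Note: MPI_LB, MPI_UB and MPI_Type_struct were removed in MPI-3.0;
       current code would use MPI_Type_create_struct and
       MPI_Type_create_resized instead. */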
    length[0] = 1;
    length[1] = blocklen;
    length[2] = 1;
    disp[0] = 0;
    disp[1] = blocklen * myrank;
    disp[2] = blocklen * commsize;
    type[0] = MPI_LB;
    type[1] = MPI_CHAR;
    type[2] = MPI_UB;

    MPI_Type_struct(3, length, disp, type, &filetype);
    MPI_Type_commit(&filetype);

    /* open file */
    mode = MPI_MODE_CREATE | MPI_MODE_WRONLY;

    MPI_File_open(MPI_COMM_WORLD, filename, mode, 
        MPI_INFO_NULL, &fh);

    /* set file view */
    offset = 0;
    MPI_File_set_view(fh, offset, MPI_CHAR, filetype, 
        "native", MPI_INFO_NULL);

    /*  write buffer to file */
    MPI_File_write_at_all(fh, offset, (void *)buf, 1, 
        buftype, &status);

    /* print out number of bytes written */
    MPI_Get_elements(&status, MPI_CHAR, &nbytes);
    printf( "TASK %d ====== number of bytes written = %d ======\n", 
        myrank, nbytes);

    /* close file */
    MPI_File_close(&fh);

    /* free datatypes */
    MPI_Type_free(&buftype);
    MPI_Type_free(&filetype);

    /* free buffer */
    free (buf);

    /* finalize MPI */
    MPI_Finalize();
    return 0;
}

Fortran Example: Gather

      program gather

      implicit none

      include 'mpif.h'

      integer buf_size
      parameter (buf_size = 1024)
      integer blocklen
      parameter (blocklen = 256)

      character*255 filename
      character buf(buf_size)

      integer myrank
      integer commsize

      integer i
      integer filetype
      integer buftype
      integer length(3)
      integer disp(3)
      integer type(3)
      integer fh
      integer mode
      integer status(MPI_STATUS_SIZE)
      integer(kind=MPI_OFFSET_KIND) offset
      integer nbytes
      integer ierr

C     initialize file name - this is the output of the 
C     scatter program
      filename = 'scatter.out'

C     zero out buffer
      do i = 1, buf_size
        buf(i) = char(0)
      enddo

C     initialize MPI
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, commsize, ierr)

C     create buftype
      call MPI_TYPE_CONTIGUOUS(buf_size, MPI_CHARACTER, buftype, ierr)
      call MPI_TYPE_COMMIT(buftype, ierr)

C     create filetype
      length( 1 ) = 1
      length( 2 ) = blocklen
      length( 3 ) = 1
      disp( 1 ) = 0
      disp( 2 ) = blocklen * myrank
      disp( 3 ) = blocklen * commsize
      type( 1 ) = MPI_LB
      type( 2 ) = MPI_CHARACTER
      type( 3 ) = MPI_UB

      call MPI_TYPE_STRUCT(3, length, disp, type, filetype, ierr)
      call MPI_TYPE_COMMIT(filetype, ierr)

C     open file
      mode = MPI_MODE_RDONLY
      call MPI_FILE_OPEN(MPI_COMM_WORLD, filename, mode,
     +                    MPI_INFO_NULL, fh, ierr)

C     set file view
      offset = 0
      call MPI_FILE_SET_VIEW(fh, offset, MPI_CHARACTER,
     +                        filetype, 'native', MPI_INFO_NULL, ierr)

C     read data and fill up buffer
      call MPI_FILE_READ_AT_ALL(fh, offset, buf, 1, buftype,
     +                           status, ierr)

C     print out number of bytes read
      call MPI_GET_ELEMENTS(status, MPI_CHARACTER, nbytes, ierr)
      write( 0, * ) 'TASK ', myrank,
     +              ' ====== number of bytes read = ',
     +              nbytes, ' ======'

C     close file
      call MPI_FILE_CLOSE(fh, ierr)

C     free datatypes
      call MPI_TYPE_FREE(buftype, ierr)
      call MPI_TYPE_FREE(filetype, ierr)

C     finalize MPI
      call MPI_FINALIZE(ierr)

      end


Other Example Codes

These example codes are provided on an "as is" basis. Note that running these codes will require an installed MPI-IO system. They are intended to be used here primarily for demonstration purposes.



 
GPFS



 
References and More Information