SP Parallel Programming II Workshop
MPI-IO
Overview
History
Motivation For Using MPI-IO
Primary Features
Terminology
The diagram below depicts four different filetypes based upon MPI derived type constructors. Each individual block is an etype. Colored blocks (including "holes") comprise the filetype.
The diagram below demonstrates how multiple tasks, using different views composed of complementary filetypes, can be used to effect data partitioning. Each individual block is an etype. Each colored group represents a filetype.
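To make the partitioning concrete, the C sketch below computes where the k-th etype visible to a task lands in the file. It assumes that every task's filetype exposes one block of `blocklen` etypes, placed `rank * blocklen` etypes into a tile of `nprocs * blocklen` etypes (the kind of layout a vector or subarray type resized to the full tile extent would describe), and that the filetype is tiled repeatedly from `disp`. The helper `view_byte_offset` is hypothetical, not an MPI routine; in a real program MPI_File_get_byte_offset performs this mapping for the current view.

```c
/* Hypothetical helper (not an MPI routine).
 * Byte position in the file of the k-th etype visible to `rank`, assuming:
 *   - each task's filetype holds `blocklen` etypes of data,
 *   - that block starts rank * blocklen etypes into a tile of
 *     nprocs * blocklen etypes (the rest of the tile is a hole),
 *   - the filetype is tiled repeatedly starting at `disp` bytes. */
long long view_byte_offset(long long disp, int etype_size, int nprocs,
                           int blocklen, int rank, long long k)
{
    long long tile   = k / blocklen;  /* which repetition of the filetype */
    long long within = k % blocklen;  /* position inside this task's block */
    long long etypes = tile * (long long)nprocs * blocklen
                     + (long long)rank * blocklen
                     + within;
    return disp + etypes * etype_size;
}
```

With 4 tasks, `blocklen = 2` and 4-byte integers as the etype, task 1 starts at byte 8 and task 0's third integer (k = 2) falls at byte 32, right after the first complete tile; the views never overlap and together cover the file.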
Basic Usage Example
File Manipulation
MPI_File_open (comm,filename,amode,info,fh)
MPI_File_close (fh)
MPI_File_delete (filename,info)
MPI_File_set_size (fh,size)
MPI_File_preallocate (fh,size)
MPI_File_get_size (fh,size)
MPI_File_get_group (fh,group)
MPI_File_get_amode (fh,amode)
MPI_File_set_info (fh,info)
MPI_File_get_info (fh,info_used)
File Views
MPI_File_set_view (fh,disp,etype,filetype,datarep,info)
MPI_File_get_view (fh,disp,etype,filetype,datarep)
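disp sets the start of the view and is specified in bytes from the absolute beginning of the file.
etype is the unit of data access and positioning: offsets and file pointer positions within the view are counted in etypes. It can be any MPI predefined or derived datatype; derived etypes can be constructed with any of the MPI datatype constructor routines. All tasks in the group that opened the file must use etypes with the same extent.
filetype specifies which portions of the file are visible to the process. It is either a single etype or a derived datatype built from multiple instances of the etype, and it is tiled repeatedly through the file starting at disp; the "holes" in the filetype are skipped.
datarep defines the representation of data in the file. Valid values are "native", "internal" and "external32". All tasks in the group that opened the file must use the same datarep.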
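The "external32" representation is portable because its layout is fixed by the standard: integers, for example, are stored as big-endian two's complement values, and an MPI_INT occupies 4 bytes. The C sketch below mimics that encoding for a single 32-bit integer so the byte order is visible. In practice MPI performs this conversion itself when a view uses "external32"; the function names here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit integer as "external32" lays it out
 * (big-endian, two's complement), regardless of the host's byte order. */
void to_external32_int32(int32_t value, unsigned char out[4])
{
    uint32_t u = (uint32_t)value;
    out[0] = (unsigned char)(u >> 24);  /* most significant byte first */
    out[1] = (unsigned char)(u >> 16);
    out[2] = (unsigned char)(u >> 8);
    out[3] = (unsigned char)u;
}

/* Hypothetical helper: decode the 4 big-endian bytes back into a host int. */
int32_t from_external32_int32(const unsigned char in[4])
{
    uint32_t u = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
               | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    return (int32_t)u;
}
```

A file written this way reads back correctly on a little-endian and a big-endian machine alike, which is the point of choosing "external32" over "native".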
Data Access
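Data access routines differ along three orthogonal aspects: positioning (explicit offsets, individual file pointers, or a shared file pointer), synchronism (blocking, nonblocking, or split collective), and coordination (noncollective or collective).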
| Positioning | Synchronism | Noncollective | Collective |
|---|---|---|---|
| explicit offsets | blocking | MPI_FILE_READ_AT, MPI_FILE_WRITE_AT | MPI_FILE_READ_AT_ALL, MPI_FILE_WRITE_AT_ALL |
| explicit offsets | nonblocking & split collective | MPI_FILE_IREAD_AT, MPI_FILE_IWRITE_AT | MPI_FILE_READ_AT_ALL_BEGIN, MPI_FILE_READ_AT_ALL_END, MPI_FILE_WRITE_AT_ALL_BEGIN, MPI_FILE_WRITE_AT_ALL_END |
| individual file pointers | blocking | MPI_FILE_READ, MPI_FILE_WRITE | MPI_FILE_READ_ALL, MPI_FILE_WRITE_ALL |
| individual file pointers | nonblocking & split collective | MPI_FILE_IREAD, MPI_FILE_IWRITE | MPI_FILE_READ_ALL_BEGIN, MPI_FILE_READ_ALL_END, MPI_FILE_WRITE_ALL_BEGIN, MPI_FILE_WRITE_ALL_END |
| shared file pointer | blocking | MPI_FILE_READ_SHARED, MPI_FILE_WRITE_SHARED | MPI_FILE_READ_ORDERED, MPI_FILE_WRITE_ORDERED |
| shared file pointer | nonblocking & split collective | MPI_FILE_IREAD_SHARED, MPI_FILE_IWRITE_SHARED | MPI_FILE_READ_ORDERED_BEGIN, MPI_FILE_READ_ORDERED_END, MPI_FILE_WRITE_ORDERED_BEGIN, MPI_FILE_WRITE_ORDERED_END |
Data Access With Explicit Offsets
MPI_File_read_at (fh,offset,buf,count,datatype,status)
MPI_File_read_at_all (fh,offset,buf,count,datatype,status)
MPI_File_write_at (fh,offset,buf,count,datatype,status)
MPI_File_write_at_all (fh,offset,buf,count,datatype,status)
MPI_File_iread_at (fh,offset,buf,count,datatype,request)
MPI_File_iwrite_at (fh,offset,buf,count,datatype,request)
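For the explicit-offset routines, offset is counted in etypes relative to the current view, not in bytes (unless the etype is a byte, as in the default view, whose etype and filetype are both MPI_BYTE). The C sketch below shows the arithmetic for a simple block decomposition, assuming every task accesses the same count of etypes in rank order and the filetype equals the etype (a contiguous view). Both functions are hypothetical helpers, not MPI routines.

```c
/* Hypothetical helper: offset, in etypes, at which `rank` starts when
 * every task accesses `count` etypes laid out in rank order. */
long long block_offset_etypes(int rank, long long count)
{
    return (long long)rank * count;
}

/* Hypothetical helper: absolute byte position of an etype offset.
 * Valid only for a contiguous view (filetype == etype) whose
 * displacement is `disp` bytes. */
long long contiguous_byte_position(long long disp, long long offset,
                                   int etype_size)
{
    return disp + offset * etype_size;
}
```

If a task writes count ints through the default byte view, the offset it would pass to MPI_File_write_at is block_offset_etypes(rank, count) * sizeof(int); after setting a view with MPI_INT as the etype, it would pass block_offset_etypes(rank, count) directly.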
Data Access With Individual File Pointers
MPI_File_read (fh,buf,count,datatype,status)
MPI_File_read_all (fh,buf,count,datatype,status)
MPI_File_write (fh,buf,count,datatype,status)
MPI_File_write_all (fh,buf,count,datatype,status)
MPI_File_iread (fh,buf,count,datatype,request)
MPI_File_iwrite (fh,buf,count,datatype,request)
MPI_File_seek (fh,offset,whence)
MPI_File_get_position (fh,offset)
MPI_File_get_byte_offset (fh,offset,disp)
Data Access With Shared File Pointers
MPI_File_read_shared (fh,buf,count,datatype,status)
MPI_File_read_ordered (fh,buf,count,datatype,status)
MPI_File_write_shared (fh,buf,count,datatype,status)
MPI_File_write_ordered (fh,buf,count,datatype,status)
MPI_File_iread_shared (fh,buf,count,datatype,request)
MPI_File_iwrite_shared (fh,buf,count,datatype,request)
MPI_File_seek_shared (fh,offset,whence)
MPI_File_get_position_shared (fh,offset)
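The ordered routines give a predictable layout: the result is as if the tasks accessed the file one after another in rank order through the shared file pointer, so each task starts where all lower-ranked tasks finished. The C sketch below computes that starting position as an exclusive prefix sum of the per-task counts (in etypes). It assumes the counts of all tasks are known; `ordered_start` is a hypothetical helper, not an MPI routine.

```c
/* Hypothetical helper: position (in etypes) where `rank` begins a
 * collective ordered access, given each task's count and the shared
 * file pointer position before the call. Tasks 0..rank-1 go first. */
long long ordered_start(const long long counts[], int rank,
                        long long shared_pos)
{
    long long pos = shared_pos;
    for (int r = 0; r < rank; ++r)
        pos += counts[r];  /* skip the data of every lower-ranked task */
    return pos;
}
```

With counts {3, 1, 4, 1} and the shared pointer at 0, task 2 starts at etype 4; after the call the shared pointer has advanced past all 9 etypes.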
Split Collective Data Access
MPI_File_read_at_all_begin (fh,offset,buf,count,datatype)
MPI_File_read_at_all_end (fh,buf,status)
MPI_File_write_at_all_begin (fh,offset,buf,count,datatype)
MPI_File_write_at_all_end (fh,buf,status)
MPI_File_read_all_begin (fh,buf,count,datatype)
MPI_File_read_all_end (fh,buf,status)
MPI_File_write_all_begin (fh,buf,count,datatype)
MPI_File_write_all_end (fh,buf,status)
MPI_File_read_ordered_begin (fh,buf,count,datatype)
MPI_File_read_ordered_end (fh,buf,status)
MPI_File_write_ordered_begin (fh,buf,count,datatype)
MPI_File_write_ordered_end (fh,buf,status)
File Interoperability
MPI_File_get_type_extent (fh,datatype,extent)
File Consistency
MPI_File_set_atomicity (fh,flag)
MPI_File_get_atomicity (fh,flag)
MPI_File_sync (fh)
MPI-IO Implementations
ROMIO version 1.0.2 includes everything defined in the MPI-2 I/O chapter except support for file interoperability (Sec. 9.5 of MPI-2), I/O error handling (Sec. 9.7), and I/O error classes (Sec. 9.8).
It runs on at least the following machines: IBM SP; Intel Paragon; HP Exemplar; SGI Origin2000; Cray T3E; NEC SX-4; other symmetric multiprocessors from HP, SGI, DEC, Sun, and IBM; and networks of workstations (Sun, SGI, HP, IBM, DEC, Linux, and FreeBSD). Supported file systems are IBM PIOFS, Intel PFS, HP/Convex HFS, SGI XFS, NEC SFS, PVFS, NFS, and any Unix file system (UFS).
Version 2.4 of IBM's Parallel Environment software includes an MPI library that contains a subset of the MPI-2 I/O routines, listed below. Routines marked with an asterisk are present in the library but are not actually implemented.
| Fortran | C Language |
|---|---|
| MPI_FILE_CLOSE | MPI_File_close |
| MPI_FILE_CREATE_ERRHANDLER | MPI_File_create_errhandler |
| MPI_FILE_DELETE | MPI_File_delete |
| MPI_FILE_GET_AMODE * | MPI_File_get_amode * |
| MPI_FILE_GET_ATOMICITY * | MPI_File_get_atomicity * |
| MPI_FILE_GET_ERRHANDLER | MPI_File_get_errhandler |
| MPI_FILE_GET_GROUP | MPI_File_get_group |
| MPI_FILE_GET_INFO * | MPI_File_get_info * |
| MPI_FILE_GET_SIZE | MPI_File_get_size |
| MPI_FILE_GET_VIEW | MPI_File_get_view |
| MPI_FILE_IREAD_AT | MPI_File_iread_at |
| MPI_FILE_IWRITE_AT | MPI_File_iwrite_at |
| MPI_FILE_OPEN | MPI_File_open |
| MPI_FILE_READ_AT | MPI_File_read_at |
| MPI_FILE_READ_AT_ALL | MPI_File_read_at_all |
| MPI_FILE_SET_ERRHANDLER | MPI_File_set_errhandler |
| MPI_FILE_SET_INFO * | MPI_File_set_info * |
| MPI_FILE_SET_SIZE | MPI_File_set_size |
| MPI_FILE_SET_VIEW | MPI_File_set_view |
| MPI_FILE_SYNC | MPI_File_sync |
| MPI_FILE_WRITE_AT | MPI_File_write_at |
| MPI_FILE_WRITE_AT_ALL | MPI_File_write_at_all |
Documentation for these routines can be found at:
See these documents for specifics and limitations.
Example Codes
Gather / Scatter (from Jean-Pierre Prost, IBM Corporation)
In this example, the C program performs the scatter. Each MPI task initializes its 1024-character buffer by filling it with characters corresponding to its task id. Each task then creates a filetype and sets a file view that is complementary to those of the other MPI tasks, as shown below:
Because each filetype contains only 256 (blocklen) characters, it is tiled repeatedly to hold each task's 1024 characters, and the result is a "tiled" output file, as shown in the diagram below:
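The tiled layout can be checked with a little arithmetic. The C sketch below assumes 4 tasks (as in the diagram), a 1-byte etype, a view displacement of 0, and the blocklen of 256 described above; both helper names are hypothetical and not part of the example programs. Given a byte position i in the output file, it reports which task wrote that byte and where the byte came from in that task's 1024-character buffer.

```c
/* Hypothetical helper: rank whose data lands at byte i of the tiled file,
 * assuming 1-byte etypes, disp 0, and blocks of `blocklen` characters
 * dealt out to `ntasks` tasks in rank order. */
int owner_of_byte(long long i, int ntasks, int blocklen)
{
    return (int)((i / blocklen) % ntasks);
}

/* Hypothetical helper: index into the owning task's buffer of byte i. */
long long buffer_index_of_byte(long long i, int ntasks, int blocklen)
{
    long long tile_bytes = (long long)ntasks * blocklen;  /* one full tile */
    return (i / tile_bytes) * blocklen + i % blocklen;
}
```

With 4 tasks the file is 4096 bytes long, and bytes 0-255, 1024-1279, 2048-2303 and 3072-3327 all come from task 0, holding consecutive 256-character pieces of its buffer.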
The Fortran program performs the gather. Each MPI task creates a filetype and sets a view that is complementary to the other tasks' views and matches the layout of the file written by the C program.
The complete codes are shown below.
Other Example Codes
These example codes are provided on an "as is" basis. Running them requires an installed MPI-IO implementation. They are intended primarily for demonstration purposes.
GPFS
References and More Information