SP Parallel Programming Workshop
Parallel Operating Environment (POE P2SC)



  Table of Contents

  1. What is the Parallel Operating Environment?
  2. POE Definitions
  3. Using POE: Overview
  4. Understanding Your System Configuration
  5. Establishing Authorization
  6. Compiling and Linking a Parallel Program
  7. Setting Up The Execution Environment
    1. Setting POE Environment Variables
    2. Basic POE Environment Variables
    3. Example Basic Environment Variable Settings
    4. Miscellaneous POE Environment Variables
  8. Invoking the Executable
  9. CPU and Communications Adapter Usage
  10. Terminating a POE Job
  11. Run-time Analysis Tools
  12. Parallel File Copy Utilities
  13. Site Specific Information and Recommendations
  14. References and More Information
  15. Appendix A: Programming Considerations
  16. Exercise



 
What is the Parallel Operating Environment?



 
POE Definitions

Before learning how to use POE, understanding some basic definitions may be useful.

Node
For a parallel POE job, a node refers to a single machine, which has a unique network name/address. A variety of IBM machine models can be called a "node", including SMP machines that have multiple CPUs. For SMP machines, the entire machine is considered a node.

Pool
A pool is an arbitrary collection of nodes assigned by the system managers of an SP system. Pools are typically used to separate nodes into disjoint groups, each of which is used for specific purposes. For example, on a given system, some nodes may be designated as "login" nodes, while others are reserved for "batch" or "testing" use only.

Partition
The group of nodes on which you run your parallel program is called your partition. Typically, there are multiple active partitions for multiple users across an SP system. Depending upon how a given SP system is configured, the nodes in your partition may or may not be shared with other users' partitions.

Home Node / Remote Node
For interactive use, your home node is the node you are logged into and from which you start your POE job. A remote node is any other node in your partition. (Technically speaking, the home node may or may not be considered part of your partition.)

Job Manager
The Job Manager in POE version 2.4+ is a function provided by LoadLeveler, IBM's batch scheduling software. When you request nodes to run your parallel job, LoadLeveler will find and allocate nodes for your use. LoadLeveler also enables user jobs to take advantage of multiple CPU SMP nodes, and keeps track of how the switch is used for communications.

Partition Manager
The POE Partition Manager is a daemon process (poe) which is automatically started for you whenever you run an interactive POE job. The Partition Manager process resides on your home node and oversees the parallel execution of your POE job. It generally operates transparently to the user and performs many of the tasks associated with POE.

User Space Protocol
The fastest method for intertask MPI communications. Can only be used with the high performance switch. Often referred to simply as US protocol.

Internet Protocol
A slower, but more flexible method of intertask MPI communications. Can be used with other network adapters besides the high performance switch. Often referred to simply as IP protocol.

Non-Specific Node Allocation
Refers to the Job Manager (LoadLeveler) automatically selecting which nodes will be used to run your parallel job. Non-specific node allocation is usually the recommended (and default) method of node allocation. For batch jobs, this is typically the only method of node allocation available.

Specific Node Allocation
Enables the user to explicitly choose which nodes will be used to run a POE job. Requires the use of a "host list file", which contains the actual names of the nodes that must be used. Specific node allocation is only for interactive use, and is recommended only when there is a reason for selecting specific nodes.
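
As an illustration of specific node allocation, the sketch below writes a host list file and points POE at it. The node names are hypothetical placeholders, and the syntax assumes a Bourne- or Korn-style shell (csh users would use setenv instead). MP_HOSTFILE and MP_PROCS are POE environment variables; whether your site permits specific node allocation is something to confirm locally.

```shell
# Create a host list file naming one node per line.
# These node names are hypothetical; substitute real node names from your system.
cat > host.list <<'EOF'
node01.example.com
node02.example.com
node03.example.com
node04.example.com
EOF

# Tell POE which host list file to use and how many tasks to start.
export MP_HOSTFILE=./host.list
export MP_PROCS=4
```

With four tasks and four listed nodes, each task would be expected to land on its own node, in the order the nodes appear in the file.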



 
Using POE: Overview

POE can be used both interactively and within a batch (scheduler) system to compile, load, and run parallel jobs. The typical progression of steps is outlined below and discussed in more detail in the following sections.

  1. Understand your system's configuration

  2. Establish authorization on all nodes which you will use

  3. Compile and link the program using one of the POE parallel compiler scripts

  4. Set up your execution environment by setting the necessary POE environment variables. Create a host list file if using specific node allocation.

  5. Start any run-time analysis tools (optional)

  6. Invoke the executable
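
The steps above can be sketched as a short shell session. This is a minimal sketch under stated assumptions, not a site procedure: hello.c is a hypothetical MPI source file, the variable values are illustrative, and the compile and run commands are skipped when the POE tools are not installed. mpcc is one of the POE compiler scripts. Authorization (step 2) and run-time analysis tools (step 5) are site dependent and are not shown.

```shell
# Check whether the POE compiler script and the poe command are available.
have_poe=no
if command -v mpcc >/dev/null 2>&1 && command -v poe >/dev/null 2>&1; then
    have_poe=yes
fi

# Step 3: compile and link (hello.c is a hypothetical source file).
if [ "$have_poe" = yes ]; then
    mpcc -o hello hello.c
fi

# Step 4: set up the execution environment (values are illustrative).
export MP_PROCS=4      # number of parallel tasks to start
export MP_EUILIB=ip    # communication protocol: "ip" or "us"

# Step 6: invoke the executable under POE.
if [ "$have_poe" = yes ]; then
    poe ./hello
else
    echo "POE tools not found; skipping compile and run"
fi
```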


 
Understanding Your System Configuration


 
Establishing Authorization


 
Compiling and Linking a Parallel Program


 
Setting Up The Execution Environment
Setting POE Environment Variables


 
Setting Up The Execution Environment
Basic POE Environment Variables


 
Setting Up The Execution Environment
Example Basic Environment Variable Settings


 
Setting Up The Execution Environment
Miscellaneous POE Environment Variables

A list of some commonly used, or potentially useful, POE environment variables appears below. A complete list of the POE environment variables can be found in the POE man page. A fuller discussion is available in the "IBM Parallel Environment for AIX: Operation and Use, Volume 1" manual.
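
As a hedged illustration, the sketch below sets three POE environment variables that often prove useful when debugging output from multiple tasks. The selection and the values are assumptions made for this example, not settings taken from this workshop; consult the poe man page for the values your release accepts. The syntax assumes a Bourne- or Korn-style shell.

```shell
# Illustrative settings only; check the poe man page for your release.
export MP_LABELIO=yes          # prefix each line of output with the id of the task that wrote it
export MP_INFOLEVEL=2          # report informational messages in addition to warnings and errors
export MP_STDOUTMODE=ordered   # display output from the tasks in task order
```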



 
Invoking the Executable

  • For serial programs or commands, use the poe command followed by your program name or the command you wish to run across your partition. For example:

    poe cp ~/input.file /tmp/input.file
    poe my_serial_job
    poe rm /tmp/input.file
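
The three commands above can be combined into a small stage, run, and clean up script. The sketch below is illustrative only: my_serial_job and input.file are the placeholder names from the example, the task count of 4 passed through the -procs flag (the command-line counterpart of MP_PROCS) is arbitrary, and the hypothetical run helper merely echoes each command when poe is not installed.

```shell
NPROCS=4   # arbitrary task count chosen for illustration

# Hypothetical helper: run the command if poe is installed, otherwise print it.
run() {
    if command -v poe >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

run poe cp ~/input.file /tmp/input.file -procs "$NPROCS"   # stage input on each node
run poe my_serial_job -procs "$NPROCS"                     # run the serial program on each node
run poe rm /tmp/input.file -procs "$NPROCS"                # remove the staged copy
```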


     
CPU and Communications Adapter Usage

Note: The remainder of this section is intended primarily for interactive POE usage. SP batch systems, in general, do not permit users to share nodes, which makes the rest of the information in this section largely irrelevant to batch jobs.



     
Terminating a POE Job


     
Run-time Analysis Tools


     
Parallel File Copy Utilities


     
Site Specific Information and Recommendations

This section covers site-specific details for POE usage at the MHPCC.



     
References and More Information


     
Appendix A: Programming Considerations