Parallel Operating Environment (POE) Exercise

  1. Log in to the SP machine

    Make sure that you are logged into your assigned SP node with your assigned userid for this exercise. Ask the instructor if you have any questions.

  2. Verify the environment variable $WORKSHOP

    The $WORKSHOP variable defines the root location for the workshop files, and may vary from workshop to workshop. Find out whether it has already been set up for your workshop:

    echo $WORKSHOP

    If this environment variable is not set, check with the instructor for the correct location. Then, depending upon your shell, set $WORKSHOP:

    csh/tcsh
    setenv WORKSHOP instructor/specified/path
    
    bsh/ksh
    export WORKSHOP=instructor/specified/path
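
    The check-then-set logic above can be folded into a small ksh-style function. This is a minimal sketch, not part of the workshop files: the function name check_workshop is made up for illustration, and instructor/specified/path remains a placeholder for the location your instructor gives you.

```shell
# Hypothetical helper: report whether WORKSHOP is set and points at a directory.
check_workshop() {
    if [ -z "${WORKSHOP:-}" ]; then
        echo "WORKSHOP is not set"
        return 1
    fi
    if [ ! -d "$WORKSHOP" ]; then
        echo "WORKSHOP=$WORKSHOP is not a directory"
        return 2
    fi
    echo "WORKSHOP=$WORKSHOP"
    return 0
}
```

    One possible use is "check_workshop || export WORKSHOP=instructor/specified/path", which sets the variable only when the check fails.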
    

  3. Understand Your System Configuration

    Your instructor should have already explained the overall configuration of the system you are using, noting which nodes/pools are available for workshop use.

  4. Authentication

    Find out which type of AIX authentication is being used on the nodes your job will run on.

    1. Check the /etc/hosts.equiv file. Does it contain the names of the nodes being used for this exercise? If so, you're done here - skip to the next section.

    2. Check your home directory for a .rhosts file (you'll need to use the ls -a command because it's a "hidden" file). If you find one, and it contains the names of the nodes being used for this exercise, you're done here - skip to the next section.

    3. If you did not find a .rhosts file, or it did not contain the names of the workshop nodes, use your favorite Unix editor to create the file or append to it. Enter the names of all the workshop nodes, one node name per line. Ask the instructor if you are not certain about the node names.

      When done, make sure that the file resides in your home directory, is called .rhosts, and has write permission for your userid only.
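
    Item 3 can also be done from the command line. The sketch below is an illustration only: the helper name add_rhosts_entries is invented here, and the node names in the usage line are borrowed from the sample output later in this exercise, so substitute the names your instructor gives you. It sets mode 600, which is one way to make sure only your userid can write (or read) the file.

```shell
# Hypothetical helper: add each node name to ~/.rhosts once, then restrict permissions.
add_rhosts_entries() {
    rhosts="$HOME/.rhosts"
    touch "$rhosts"
    for node in "$@"; do
        # -x matches whole lines only, -F treats the name as a fixed string
        grep -qxF "$node" "$rhosts" || echo "$node" >> "$rhosts"
    done
    # Read/write for the owner, no access for anyone else
    chmod 600 "$rhosts"
}
```

    For example: add_rhosts_entries node1.abc.edu node8.abc.edu node23.abc.edu node3.abc.edu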

  5. Copy the example files

    1. In your SP home directory, create a subdirectory for the POE test codes and cd to it.

      mkdir ~/poe
      cd ~/poe

    2. Copy either the Fortran or the C version of the exercise files to your poe subdirectory:

      C:
      cp  $WORKSHOP/poe/samples/C/*   ~/poe
      
      Fortran:
      cp  $WORKSHOP/poe/samples/Fortran/*   ~/poe
      

  6. List the contents of your poe subdirectory

    You should notice two files:

    Language   File Name         Description
    C          poe_hello.c       Simple MPI program which prints a task's rank and hostname. C version.
               poe_bandwidth.c   An MPI communications bandwidth test between two nodes. C version.
    Fortran    poe_hello.f       Simple MPI program which prints a task's rank and hostname. Fortran version.
               poe_bandwidth.f   An MPI communications bandwidth test between two nodes. Fortran version.

  7. Compile the poe_hello program

    Depending upon your language preference, use one of the IBM parallel compilers to compile the poe_hello program.

    C:
    mpcc -o poe_hello poe_hello.c
    
    Fortran:
    mpxlf -o poe_hello poe_hello.f 
    

  8. Set up your POE environment

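    In this step you'll set a few POE environment variables. Specifically, those which answer three questions: how many tasks and nodes to use, how the nodes are allocated, and which communications protocol and network interface to use.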

    Depending upon your shell, set the following environment variables as shown:

    Request 4 nodes for 4 tasks
      csh/tcsh:  setenv MP_PROCS 4
      ksh/bsh:   export MP_PROCS=4

    Non-specific allocation (let the Resource Manager do it)
      csh/tcsh:  setenv MP_RESD yes
      ksh/bsh:   export MP_RESD=yes

    Set poolid to the workshop nodes pool number. Use the jm_status -P command or ask the instructor if you are not sure.
      csh/tcsh:  setenv MP_RMPOOL poolid
      ksh/bsh:   export MP_RMPOOL=poolid

    Use IP communications since you're running interactively with other users
      csh/tcsh:  setenv MP_EUILIB ip
      ksh/bsh:   export MP_EUILIB=ip

    Use the high performance switch network interface
      csh/tcsh:  setenv MP_EUIDEVICE css0
      ksh/bsh:   export MP_EUIDEVICE=css0
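
    For ksh users, the settings in this step could be collected into one file and sourced at the start of a session. This is only a convenience sketch: poolid is still a placeholder for your workshop pool number, and the last line simply prints the values back for a visual check.

```shell
# POE settings from this step (ksh syntax).
export MP_PROCS=4          # 4 tasks on 4 nodes
export MP_RESD=yes         # let the Resource Manager allocate the nodes
export MP_RMPOOL=poolid    # placeholder: replace poolid with your pool number
export MP_EUILIB=ip        # IP communications
export MP_EUIDEVICE=css0   # high performance switch network interface

# List every MP_ variable now present in the environment.
env | grep '^MP_' | sort
```

    If the lines were saved in a file, say poe_env (a hypothetical name), they could be loaded with ". ./poe_env".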
    
    

  9. Run your poe_hello executable

    1. This is the simple part. Just issue the command:

      poe_hello

    2. Provided that everything is working and set up correctly, you should receive output that looks something like the following (your node names will vary, of course).
      Total number of tasks = 4
      Hello! From task 1 on host node1.abc.edu
      Hello! From task 2 on host node8.abc.edu
      Hello! From task 3 on host node23.abc.edu
      Hello! From task 0 on host node3.abc.edu
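
    As the sample shows, the lines from the different tasks do not necessarily arrive in rank order. If you want them ordered by task, one option is to pipe the output through sort. The sketch below assumes the output format shown above, where the task number is the fourth whitespace-separated field; the function name sort_by_task is made up for illustration.

```shell
# Hypothetical helper: order "Hello! From task N on host ..." lines by N.
# Any other lines (such as the task count) are printed first, unchanged.
sort_by_task() {
    input=$(cat)
    printf '%s\n' "$input" | grep -v '^Hello! From task ' || true
    printf '%s\n' "$input" | grep '^Hello! From task ' | sort -n -k 4,4
}
```

    It would be used as "poe_hello | sort_by_task".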
      

  10. Try the poe_bandwidth exercise code

    1. Depending upon your language preference, compile the poe_bandwidth source file as shown:

      C:
      mpcc -o poe_bandwidth poe_bandwidth.c
      
      Fortran:
      mpxlf -o poe_bandwidth poe_bandwidth.f
      

    2. Change the value of your MP_PROCS variable to 2. For example:

      csh/tcsh
      setenv MP_PROCS 2
      
      ksh/bsh
      export MP_PROCS=2
      

    3. Run the executable:

      poe_bandwidth

      As the program runs, it will display the effective communications bandwidth between nodes for a given message size. Since the MP_EUILIB variable has not been changed since the previous run, what you will be seeing is the bandwidth for Internet protocol (ip) over the high performance switch.

    4. Now, change the communications protocol to User Space (us) protocol:

      csh/tcsh
      setenv MP_EUILIB us
      
      ksh/bsh
      export MP_EUILIB=us
      

    5. Run the executable again, and notice the output. You should see a significant increase in bandwidth. Note: It is very possible that when you try this step, you will get an error message that looks something like:

      ERROR: 0031-124 Less than XX nodes available from pool N

      This is because others in the workshop may be using nodes in User Space mode at the same time as you. Remember that there can be only one User Space job per node if your POE version is earlier than 2.4. If you get this error message, just try running again in a few seconds/minutes.
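
      If the nodes are busy, the retry can be automated. The following is a rough sketch rather than a POE feature: retry_run is a hypothetical helper that reruns a command until it succeeds or a retry limit is reached, and it assumes that a failed run exits with a nonzero status. The limit and delay values are arbitrary.

```shell
# Hypothetical helper: retry_run MAX_TRIES DELAY_SECONDS command [args...]
retry_run() {
    max_tries=$1
    delay=$2
    shift 2
    try=1
    while [ "$try" -le "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $try of $max_tries failed" >&2
        try=$((try + 1))
        # Wait before the next attempt, but not after the last one.
        if [ "$try" -le "$max_tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

      For example, "retry_run 5 30 poe_bandwidth" would try up to five times, waiting 30 seconds between attempts.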

  11. Try setting some of the other POE environment variables

    POE has a number of other environment variables which may be useful. Try running the poe_hello code again after setting the following:

    Shell      Command                          Description
    csh/tcsh   setenv MP_PROCS 4                Use 4 tasks/nodes again
               setenv MP_EUILIB ip              Go back to Internet protocol
               setenv MP_LABELIO yes            Prepend I/O with the task rank
               setenv MP_SAVEHOSTFILE myhosts   Save the names of the nodes used in a file called "myhosts"
    ksh/bsh    export MP_PROCS=4                Use 4 tasks/nodes again
               export MP_EUILIB=ip              Go back to Internet protocol
               export MP_LABELIO=yes            Prepend I/O with the task rank
               export MP_SAVEHOSTFILE=myhosts   Save the names of the nodes used in a file called "myhosts"

    What happens? Look closely at the screen output and compare it to what you saw the first time you ran the code. Also, check the contents of the file "myhosts". It should confirm what you see on the screen as output.

  12. Try specific node allocation using a host list file

    Generally speaking, there aren't many cases where you'll need to "manually" select which nodes should be used to run your POE job. This step will demonstrate how to do it though, should you ever have the need.

    1. First, use your favorite UNIX editor and create a file. Call it hostfile. As its contents, enter 4 different node names from the workshop node pool - one node name per line.

    2. Set the appropriate POE environment variables which specify specific node allocation:

      Shell      Command                       Description
      csh/tcsh   setenv MP_RESD no             Turn off selection by the Resource Manager - just to be sure
                 setenv MP_HOSTFILE hostfile   Specify the host file you created
      ksh/bsh    export MP_RESD=no             Turn off selection by the Resource Manager - just to be sure
                 export MP_HOSTFILE=hostfile   Specify the host file you created

    3. Run the poe_hello executable again and observe the output. Does it match what you specified in your host list file?
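
    The host list file from item 1 can also be created without an editor. A minimal sketch, assuming the four node names from the earlier sample output stand in for real workshop nodes (replace them with names from your pool); the final check only counts distinct lines and is not something POE requires you to run.

```shell
# Write four node names, one per line, to a file called hostfile.
# These names are placeholders taken from the sample output; use your own.
cat > hostfile <<'EOF'
node1.abc.edu
node8.abc.edu
node23.abc.edu
node3.abc.edu
EOF

# Count the distinct node names; the result should be 4.
sort -u hostfile | wc -l
```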

This concludes the POE exercise.