Parallel computation using the pgcc compiler on a dual-core machine

I am using a dual-core workstation. It has 8 processors. Can I control each processor myself to do parallel computation?

Hi Sitha,

Which OS are you using? What type of parallel computation paradigm are you using: MPI, OpenMP, auto-parallelization, or threads?

I’m guessing you’re asking how to bind an OpenMP thread to a processor on Linux. If this is the case, then you have a variety of options. First you need to set the environment variable “NCPUS” (or “OMP_NUM_THREADS”) to the number of processors your application should use; this must be set no matter how you bind your threads.
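For example, in csh you would set the thread count before launching your program (“NCPUS” works the same way; “a.out” is just a placeholder executable):

 % setenv OMP_NUM_THREADS 8
 % ./a.out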

The first option, which most systems support, is “taskset” (see “man taskset” for more info), where you give it a hexadecimal bitmask (or a numerical list if using “-c”) corresponding to the CPUs your processes are allowed to use. However, it’s not needed if you’re using all the processors, unless you need them bound in a particular order.
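For example, to restrict a run to the first two CPUs, either form works (executable name is a placeholder):

 % taskset 0x3 ./a.out
 % taskset -c 0,1 ./a.out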

On NUMA-enabled systems, you should also investigate the use of “numactl”. Like taskset, it lets you bind a set of threads to a set of processors. However, numactl generally does a better job of memory management. It doesn’t have as fine-grained control as taskset, so you can only assign threads by socket, not by CPU. This only matters for multi-core chips, since with single-core chips the socket and CPU are analogous. For multi-core chips, you can use both “numactl” and “taskset” together to get finer control.
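As a sketch (the node number is illustrative and depends on your topology; older numactl versions spell the CPU option “--cpubind”), you can bind both the CPUs and the memory of a run to one NUMA node:

 % numactl --cpunodebind=0 --membind=0 ./a.out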

With the recent releases of the PGI compilers (6.0 and newer) on both Linux and Windows, when linking with the “-mp” option, the NUMA libraries are linked in with your application on systems which support NUMA. (With 6.0 you need to use “-mp=numa”.) So instead of using “numactl” or “taskset”, when linked with “-mp” you can do the same thing by simply setting the environment variables “MP_BIND” and “MP_BLIST”. Setting “MP_BIND” to “yes” tells the runtime to bind your threads to a set of processors. “MP_BLIST” is the list of processors to bind to and the order in which they are bound. For example, “setenv MP_BLIST 7,5,3,1,6,4,2,0” will bind your threads starting at CPU #7 (the 8th processor) and interleave them across the rest of the CPUs. Unlike “numactl”, the granularity is by CPU, not socket.
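Putting it together in csh, a typical 8-thread run might look like this (the binding order is just the example from above; the executable name is a placeholder):

 % setenv OMP_NUM_THREADS 8
 % setenv MP_BIND yes
 % setenv MP_BLIST 7,5,3,1,6,4,2,0
 % ./a.out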

Binding MPI threads to particular CPUs is a bit more complex, but “doable”. Let me know if you need help with this.

  • Mat

Thank you very much for your useful information, but I need more help from you. I am using CentOS on my workstation, and the PGI compiler is release 6.1; I hope this is the latest version. My machine is called a dual-core machine, but when I checked the CPU info it shows 8 CPUs, which I don’t understand well. When I run an MPI program it shows only one node.
I am quite confused. Please help me find where the problem is in my machine.
Thank you.

Sitha


Hi Sitha,

Multi-core chips now being produced by AMD and Intel contain 2 or more CPUs on the same chip and connect to the motherboard via a single socket. Although the CPUs on these chips do share some components, they can logically be thought of as distinct. So while you may have a 4-socket, dual-core system, you can logically think of it as having 8 distinct and separate CPUs.
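That is also why your CPU info shows 8 CPUs: Linux reports each core as its own logical processor. A quick way to count them (this should print 8 on your system):

 % grep -c "^processor" /proc/cpuinfo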

“When I run an MPI program it shows only one node.”

Do you have your system listed only once in the machines.LINUX file? It should be listed 8 times in this file, or have a “:8” after its name, i.e. “systemname:8”.
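For example (“systemname” is a placeholder for your actual hostname), the file could contain either the hostname repeated on 8 separate lines, or the single line:

 systemname:8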

  • Mat

Hi Mat,

I am interested in knowing how to bind MPI threads to CPUs. Would you please show us how to do it? Thanks.

-Winston

Hi Winston,

I was afraid someone would ask ;-) First, the caveat is that the process is highly dependent upon the flavor of MPI you’re working with. You will need to modify the following script, and it may not work with all MPI implementations. Also, I’m assuming you’re using a 4-CPU SMP system.

The basic idea is that you create a wrapper script to launch your application and use “taskset” to bind each process to an individual processor. The processor used is dependent upon an MPI environment variable. Using LAM/MPI as an example, we would start up lamboot and use mpirun to launch our script.

 % lamboot -v hostfile
 % mpirun -np 4 run_script
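The hostfile is whatever you normally pass to lamboot; as a hypothetical example, a single 4-CPU node could be listed as:

 node01 cpu=4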

“run_script” uses the LAM environment variable “LAMRANK” to determine which processor each rank is bound to.

#!/bin/csh -f
# Wrapper script: bind each LAM/MPI rank to its own CPU via taskset.
# Adjust the bitmasks (and the executable name) to match your own system.

if ("$LAMRANK" == "0") then
 echo "RANK 0...."
 taskset 0x1 a.out    # CPU 0
else if ("$LAMRANK" == "1") then
 echo "RANK 1...."
 taskset 0x2 a.out    # CPU 1
else if ("$LAMRANK" == "2") then
 echo "RANK 2...."
 taskset 0x4 a.out    # CPU 2
else
 echo "OTHER RANKS...."
 taskset 0x8 a.out    # CPU 3
endif
exit 0

Again, you’ll need to modify this for your individual needs and it may not work for all MPI versions.

Have Fun!
Mat

Hi Mat,

Thanks for the information.

Our application uses MPICH (1.2.6) on a dual dual-core Opteron cluster. I am wondering if you can provide some pointers on how to do this with an MPICH set-up.

I was searching on the web and found Per Ekman’s work (Linux NUMA stuff). He posted a code patch for MPICH/MPICH2 for the ch_shmem device, but I think what I need is a patch for the ch_p4 device. I did not pursue that thread further.

Thanks again.

-Winston