MPICH1 has been very slow and problematic with the -comm-shared switch, and perhaps buggy too. MPICH2 is promising and works pretty well using the following “script”:
PGI 64-bit compile:
source /usr/people/profile/TOOLS_PROFILES/pgi_64.sh # PATH, licence daemon etc. #if/when required
export CFLAGS="-fast" CXXFLAGS="-fast" FFLAGS="-fast" F90FLAGS="-fast" LDFLAGS="-fast -pgf90libs -lpgf90 -lpgf90_rpm1 -lpgf90rtl -lpgftnrtl" OPTFLAGS="-fast"
export CC="pgcc" CXX="pgCC" F90="pgf90" FC="pgf90" CPP="pgCC -E"
./configure --enable-f90 --enable-mpe --prefix=/usr/mpich2_64 --enable-dependencies
However, I have a few SMP machines, so I want TCP between machines and shared-memory communication between processes on the same machine. (LAM does this nicely with sysv, but it has some serious issues with certain programs.)
Logically, I tried --with-device=ch3:ssm with configure, and for fun I also tried --with-device=ch3:nemesis. No luck: it does not compile.
Here’s some info from the MPICH2 FAQ:
"Q: When building the ssm or sshm channel, I get the error ``mpidu_process_locks.h:234:2: error: #error *** No atomic memory operation specified to implement busy locks ***''
The ssm and sshm channels do not work on all platforms because they use special interprocess locks (often assembly) that may not work with some compilers or machine architectures. They work on Linux with gcc, Intel, and Pathscale compilers on various Intel architectures. They also work in Windows and Solaris environments."
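To make the FAQ answer concrete, here is a minimal sketch of the kind of busy lock such a header needs: a spin lock built on an atomic exchange. It is not the code from mpidu_process_locks.h. All the sketch_* names are hypothetical, and it assumes the compiler accepts GCC-style extended inline assembly on x86/x86-64 (whether a given PGI release does would need to be checked). On other targets it falls back to the GCC __sync builtins.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the MPICH2 sources. */
typedef volatile int32_t sketch_lock_t;

/* Atomically store new_val into *ptr and return the previous value. */
static inline int32_t sketch_atomic_swap(volatile int32_t *ptr, int32_t new_val)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Assumption: GCC-style extended asm is accepted by the compiler.
     * xchg with a memory operand is implicitly locked on x86. */
    __asm__ __volatile__("xchgl %0, %1"
                         : "+r"(new_val), "+m"(*ptr)
                         :
                         : "memory");
    return new_val;
#else
    /* Fallback: GCC-style builtin (assumes the compiler provides it). */
    return __sync_lock_test_and_set(ptr, new_val);
#endif
}

static inline void sketch_lock_init(sketch_lock_t *lock)
{
    *lock = 0;                       /* 0 = free, 1 = held */
}

/* Returns 1 if the lock was acquired, 0 if it was already held. */
static inline int sketch_trylock(sketch_lock_t *lock)
{
    return sketch_atomic_swap(lock, 1) == 0;
}

/* Busy-wait until the lock is acquired. */
static inline void sketch_lock(sketch_lock_t *lock)
{
    while (sketch_atomic_swap(lock, 1) != 0) {
        /* spin */
    }
}

static inline void sketch_unlock(sketch_lock_t *lock)
{
#if defined(__x86_64__) || defined(__i386__)
    (void)sketch_atomic_swap(lock, 0);   /* locked xchg also acts as a full barrier */
#else
    __sync_lock_release(lock);
#endif
}
```

For locks shared between processes, the sketch_lock_t variable would have to live in a shared-memory segment. The sketch only shows the atomic operation that the #error complains is missing.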
I sniffed around, and the check is often similar to what’s shown below (tests for swap functions, compare-and-swap, etc.):
asm volatile ("lock ; incl %0"
asm volatile ("lock ; incq %0"
asm volatile ("fetchadd4.rel %0=[%1],%2"
: "=r"(val) : "r"(ptr), "i" (1)
#error No fetch-and-add function defined for this architecture
------------- (From $MPICH2SRC/src/mpid/ch3/channels/nemesis/nemesis/include/mpid_nem_atomics.h, an interesting file when using nemesis. I found no good examples for ssm, but the checks are similar. From what I understand, ssm “supports” shared-memory communication and TCP/IP, while nemesis is superior in that it supports Myrinet and is faster(?).)
I would appreciate some hints here so that I can modify this code to use PGI compilers on either Opterons or newish (EM64T) Xeons.
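As a point of comparison with the fragments above, a complete fetch-and-add for x86/x86-64 might look like the sketch below. It uses lock xaddl, which returns the previous value, rather than the lock incl form, which does not. The function names are hypothetical and are not the ones defined in mpid_nem_atomics.h. The asm branch assumes the compiler understands GCC-style extended asm, which would need to be confirmed for the PGI version in use. Other architectures fall back to a GCC builtin.

```c
#include <stdint.h>

/* Hypothetical helper for illustration; not the MPICH2 implementation.
 * Atomically adds val to *ptr and returns the value *ptr held before. */
static inline int32_t sketch_fetch_and_add(volatile int32_t *ptr, int32_t val)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Assumption: GCC-style extended asm is accepted by the compiler.
     * xaddl leaves the old memory value in the register operand. */
    __asm__ __volatile__("lock ; xaddl %0, %1"
                         : "+r"(val), "+m"(*ptr)
                         :
                         : "memory", "cc");
    return val;
#else
    /* Fallback: GCC-style builtin (assumes the compiler provides it). */
    return __sync_fetch_and_add(ptr, val);
#endif
}

/* Convenience wrapper: atomic increment returning the previous value. */
static inline int32_t sketch_fetch_and_inc(volatile int32_t *ptr)
{
    return sketch_fetch_and_add(ptr, 1);
}
```

If the compiler rejects the asm syntax, another option is to build just these primitives as a separate object with gcc and link it in, at the cost of a function call per operation.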