MPICH2 - shared memory version catastrophe

Hi

MPICH1 has been very slow and problematic with the -comm-shared switch, and perhaps buggy too. MPICH2 looks promising and works pretty well when built with the following “script”:

#!/bin/bash

# pgi 64 bit compile

source /usr/people/profile/TOOLS_PROFILES/pgi_64.sh # PATH, licence daemon etc. #if/when required

export CFLAGS="-fast" CXXFLAGS="-fast" FFLAGS="-fast" F90FLAGS="-fast" LDFLAGS="-fast -pgf90libs -lpgf90 -lpgf90_rpm1 -lpgf90rtl -lpgftnrtl" OPTFLAGS="-fast"
export CC="pgcc" CXX="pgCC" F90="pgf90" FC="pgf90" CPP="pgCC -E"
./configure --enable-f90 --enable-mpe --prefix=/usr/mpich2_64 --enable-dependencies
make all

However, I have a few SMP machines, so I want TCP between machines and shared memory between processes on the same machine. (LAM does this nicely with sysv, but it has some serious issues with certain programs.)

Logically, I tried --with-device=ch3:ssm with configure, and for fun I also tried --with-device=ch3:nemesis. Result: not compiling…

Here’s some info from the MPICH2 FAQ:
"Q: When building the ssm or sshm channel, I get the error 'mpidu_process_locks.h:234:2: error: #error *** No atomic memory operation specified to implement busy locks ***'
The ssm and sshm channels do not work on all platforms because they use special interprocess locks (often assembly) that may not work with some compilers or machine architectures. They work on Linux with gcc, Intel, and Pathscale compilers on various Intel architectures. They also work in Windows and Solaris environments."

I sniffed around, and the criterion is often similar to what’s shown below (checks for swap functions, compare-and-swap, etc.):

#ifdef HAVE_GCC_AND_PENTIUM_ASM
    /* 32-bit atomic increment */
    asm volatile ("lock ; incl %0"
                  : "=m" (*ptr)
                  : "m" (*ptr));
    return;
#elif defined(HAVE_GCC_AND_X86_64_ASM)
    /* 64-bit atomic increment */
    asm volatile ("lock ; incq %0"
                  : "=m" (*ptr)
                  : "m" (*ptr));
    return;
#elif defined(HAVE_GCC_AND_IA64_ASM)
    /* 4-byte fetch-and-add with release semantics */
    int val;
    asm volatile ("fetchadd4.rel %0=[%1],%2"
                  : "=r" (val) : "r" (ptr), "i" (1)
                  : "memory");
    return;
#else
#error No fetch-and-add function defined for this architecture
#endif
------------- (From $MPICH2SRC/src/mpid/ch3/channels/nemesis/nemesis/include/mpid_nem_atomics.h, an interesting file when using nemesis. I found no good examples for ssm, but the idea is similar. From what I understand, ssm “supports” shared memory communication and TCP/IP, while nemesis is superior in that it also supports Myrinet and is faster(?).)
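One way to see whether a given compiler accepts this kind of GCC-style inline asm at all is to compile a tiny probe by hand. The following is only a sketch: the function name probe_gcc_asm_cmpxchgq and the probe source are made up for illustration and are not copied from MPICH2's configure, which may test something slightly different. It assumes an x86_64 machine.

```shell
#!/bin/bash
# Hypothetical helper (not part of MPICH2): try to compile a small C file
# that uses GCC-style extended asm with an x86_64 "lock; cmpxchgq".
# Prints "yes" if the compiler accepts it and "no" otherwise.
probe_gcc_asm_cmpxchgq() {
    local cc="${1:-cc}"
    local tmpdir
    tmpdir=$(mktemp -d) || { echo "no"; return 1; }

    cat > "$tmpdir/conftest.c" <<'EOF'
int main(void)
{
    long value = 1, expected = 1, desired = 2;
    /* Atomically: if (value == expected) value = desired; */
    __asm__ __volatile__ ("lock; cmpxchgq %2, %0"
                          : "+m" (value), "+a" (expected)
                          : "r" (desired)
                          : "memory", "cc");
    return value == 2 ? 0 : 1;
}
EOF

    # $cc is left unquoted on purpose so that "pgcc -someflag" also works.
    if $cc -c "$tmpdir/conftest.c" -o "$tmpdir/conftest.o" >/dev/null 2>&1; then
        echo "yes"
    else
        echo "no"
    fi
    rm -rf "$tmpdir"
}

# Example: compare the two compilers mentioned in this thread.
# probe_gcc_asm_cmpxchgq gcc
# probe_gcc_asm_cmpxchgq pgcc
```

A "no" here only says that this particular probe did not compile with that compiler; it does not by itself say which part of the asm syntax was rejected.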

I would appreciate some hints here so that I can modify this code to work with the PGI compilers on either Opterons or Xeons (newish, EM64T).

Thanks,
CB

Hi CB,

Please try modifying the base directory’s “configure” script at lines 24395 and 24464 to the following:

line 24395:

if test "$ac_cv_c_compiler_gnu" = "yes" -o "$ac_cv_prog_CC" = "icc" -o "$ac_ct_CC" = "pgcc" ; then

line 24464:

# check for x86_64
if test "$ac_cv_c_compiler_gnu" = "yes" -o "$ac_ct_CC" = "pgcc"; then
echo "$as_me:$LINENO: checking for gcc __asm__ and AMD x86_64 cmpxchgq instruction" >&5
echo $ECHO_N "checking for gcc __asm__ and AMD x86_64 cmpxchgq instruction... $ECHO_C" >&6
- Mat
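To see how the modified condition at line 24395 behaves, here is a small stand-alone sketch. The function runs_asm_check and its three arguments are hypothetical stand-ins for autoconf's $ac_cv_c_compiler_gnu, $ac_cv_prog_CC and $ac_ct_CC; in a real run those values are set by configure itself, and which values they take for a given setup is not something this sketch can tell you.

```shell
#!/bin/bash
# Hypothetical stand-alone copy of the modified condition from line 24395.
# The arguments stand in for $ac_cv_c_compiler_gnu, $ac_cv_prog_CC and
# $ac_ct_CC, which configure normally sets on its own.
runs_asm_check() {
    local is_gnu="$1" prog_cc="$2" ct_cc="$3"
    if test "$is_gnu" = "yes" -o "$prog_cc" = "icc" -o "$ct_cc" = "pgcc"; then
        echo "asm check runs"
    else
        echo "asm check skipped"
    fi
}

runs_asm_check yes gcc  gcc    # GNU compiler: check runs
runs_asm_check no  icc  icc    # Intel compiler: check runs
runs_asm_check no  pgcc pgcc   # let in only by the pgcc clause
runs_asm_check no  pgcc ""     # $ac_ct_CC empty: check skipped
```

Getting past this condition only means the asm probe is attempted. Whether it then reports "yes" still depends on the compiler actually accepting the asm.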

Hi Mat

Thanks, I tried this out, but I still get

PGC-F-0249-#error -- *** No atomic memory operation specified to implement busy locks *** (./mpidu_process_locks.h: 264)

The configure output shows:

checking for gcc __asm__ and pentium cmpxchgl instruction... no
checking for gcc __asm__ and AMD x86_64 cmpxchgq instruction... no
checking for x86 mfence instruction using __asm__... no
checking for x86 sfence instruction using __asm__... no
checking for x86 lfence instruction using __asm__... no
checking for x86 mfence instruction using __asm... no
checking for x86 sfence instruction using __asm... no
checking for x86 lfence instruction using __asm... no
checking for x86 mfence instruction using asm()... no
checking for x86 sfence instruction using asm()... no
checking for x86 lfence instruction using asm()... no

With gcc and pgf90, on the other hand, I get:

checking for gcc __asm__ and pentium cmpxchgl instruction... no
checking for gcc __asm__ and AMD x86_64 cmpxchgq instruction... yes
checking for gcc __asm__ and IA64 xchg4 instruction... no
checking for x86 mfence instruction using __asm__... yes
checking for x86 sfence instruction using __asm__... yes
checking for x86 lfence instruction using __asm__... yes
checking for x86 mfence instruction using __asm... no
checking for x86 sfence instruction using __asm... no
checking for x86 lfence instruction using __asm... no
checking for x86 mfence instruction using asm()... no
checking for x86 sfence instruction using asm()... yes
checking for x86 lfence instruction using asm()... no

So with pgcc the test still fails.
By the way, the gcc/pgf90 build compiles and installs but fails its tests, most likely because of missing library specifications. (I will play around with this.)

Any other ideas? Willing to try anything, well almost… ;)

Cheers,
CB

Hi CB,

My fault. I forgot to mention that we just added support for this asm extension in 6.2-4. You’ll need to upgrade the compiler.

- Mat
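For anyone scripting this build, a release check along the following lines might help. It is a minimal sketch: pgi_release_at_least is a hypothetical helper, and it assumes you already have the installed release as a string of the form MAJOR.MINOR-PATCH such as "6.2-4" (for example, read off the banner that pgcc -V prints).

```shell
#!/bin/bash
# Hypothetical helper: succeed if a PGI release string of the form
# MAJOR.MINOR-PATCH (such as "6.2-4") is at least the required release.
pgi_release_at_least() {
    local have="$1" need="$2"
    local h_major h_minor h_patch n_major n_minor n_patch
    # Split on "." and "-" into three numeric fields.
    IFS='.-' read -r h_major h_minor h_patch <<< "$have"
    IFS='.-' read -r n_major n_minor n_patch <<< "$need"

    if (( h_major != n_major )); then (( h_major > n_major )); return; fi
    if (( h_minor != n_minor )); then (( h_minor > n_minor )); return; fi
    (( h_patch >= n_patch ))
}

# Example: 6.2-3 is older than the 6.2-4 release mentioned above.
if pgi_release_at_least "6.2-3" "6.2-4"; then
    echo "new enough"
else
    echo "upgrade needed"
fi
```

The example prints "upgrade needed", since 6.2-3 is older than 6.2-4.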