MPICH linking failing

Hello,
I’m trying to build an application with PGI. It makes use of mvapich2 and hdf5. I’ve compiled both libraries as suggested here: http://www.pgroup.com/resources/hdf5/hdf5_2012.htm and http://www.pgroup.com/resources/mvapich/mvapich_2011.htm.
When I try to build I get this error:

pgcpp  -Mcudax86 -c  CudaMasterFile.cu
"MultipleDeviceHandling.cu", line 409: warning: variable "Errorout" was set
          but never used
  	cudaError_t Errorout; // Errorstring
  	            ^

NOTE: your trial license will expire in 13 days, 7.58 hours.
/usr/local/mpich2/bin/mpic++   -O3   -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_BSD_SOURCE   -I/usr/local/mpich2/include  -I/opt/pgi/linux86-64/2013/cuda/5.0/include -I/usr/local/include -I/usr/include/   -c *.c           
ArrayHandling.c:
"/opt/pgi/linux86-64/2013/cuda/5.0/include/host_defines.h", line 128: catastrophic error: 
          #error directive: --- !!! UNKNOWN COMPILER: please provide a CUDA
          compatible definition for '__align__' !!! ---
  #error --- !!! UNKNOWN COMPILER: please provide a CUDA compatible definition for '__align__' !!! ---
   ^

1 catastrophic error detected in the compilation of "ArrayHandling.c".
Compilation terminated.
DecompUtils.c:
"/opt/pgi/linux86-64/2013/cuda/5.0/include/host_defines.h", line 128: catastrophic error: 
          #error directive: --- !!! UNKNOWN COMPILER: please provide a CUDA
          compatible definition for '__align__' !!! ---
  #error --- !!! UNKNOWN COMPILER: please provide a CUDA compatible definition for '__align__' !!! ---
   ^

1 catastrophic error detected in the compilation of "DecompUtils.c".
Compilation terminated.
PrintLog.c:
NOTE: your trial license will expire in 13 days, 7.58 hours.
bitoperations.c:
NOTE: your trial license will expire in 13 days, 7.58 hours.
cpml_help.c:
"/opt/pgi/linux86-64/2013/cuda/5.0/include/host_defines.h", line 128: catastrophic error: 
          #error directive: --- !!! UNKNOWN COMPILER: please provide a CUDA
          compatible definition for '__align__' !!! ---
  #error --- !!! UNKNOWN COMPILER: please provide a CUDA compatible definition for '__align__' !!! ---
   ^

1 catastrophic error detected in the compilation of "cpml_help.c".
Compilation terminated.
fdtd.c:
"/opt/pgi/linux86-64/2013/cuda/5.0/include/host_defines.h", line 128: catastrophic error: 
          #error directive: --- !!! UNKNOWN COMPILER: please provide a CUDA
          compatible definition for '__align__' !!! ---
  #error --- !!! UNKNOWN COMPILER: please provide a CUDA compatible definition for '__align__' !!! ---
   ^

1 catastrophic error detected in the compilation of "fdtd.c".
Compilation terminated.
grid.c:
"/opt/pgi/linux86-64/2013/cuda/5.0/include/host_defines.h", line 128: catastrophic error: 
          #error directive: --- !!! UNKNOWN COMPILER: please provide a CUDA
          compatible definition for '__align__' !!! ---
  #error --- !!! UNKNOWN COMPILER: please provide a CUDA compatible definition for '__align__' !!! ---
   ^

1 catastrophic error detected in the compilation of "grid.c".
Compilation terminated.
lorentz_help.c:
"/opt/pgi/linux86-64/2013/cuda/5.0/include/host_defines.h", line 128: catastrophic error: 
          #error directive: --- !!! UNKNOWN COMPILER: please provide a CUDA
          compatible definition for '__align__' !!! ---
  #error --- !!! UNKNOWN COMPILER: please provide a CUDA compatible definition for '__align__' !!! ---
   ^

1 catastrophic error detected in the compilation of "lorentz_help.c".
Compilation terminated.
outs.c:
"/opt/pgi/linux86-64/2013/cuda/5.0/include/host_defines.h", line 128: catastrophic error: 
          #error directive: --- !!! UNKNOWN COMPILER: please provide a CUDA
          compatible definition for '__align__' !!! ---
  #error --- !!! UNKNOWN COMPILER: please provide a CUDA compatible definition for '__align__' !!! ---
   ^

1 catastrophic error detected in the compilation of "outs.c".
Compilation terminated.
populate.c:
"/opt/pgi/linux86-64/2013/cuda/5.0/include/host_defines.h", line 128: catastrophic error: 
          #error directive: --- !!! UNKNOWN COMPILER: please provide a CUDA
          compatible definition for '__align__' !!! ---
  #error --- !!! UNKNOWN COMPILER: please provide a CUDA compatible definition for '__align__' !!! ---
   ^

1 catastrophic error detected in the compilation of "populate.c".
Compilation terminated.
region_distance.c:
"/opt/pgi/linux86-64/2013/cuda/5.0/include/host_defines.h", line 128: catastrophic error: 
          #error directive: --- !!! UNKNOWN COMPILER: please provide a CUDA
          compatible definition for '__align__' !!! ---
  #error --- !!! UNKNOWN COMPILER: please provide a CUDA compatible definition for '__align__' !!! ---
   ^

1 catastrophic error detected in the compilation of "region_distance.c".
Compilation terminated.
rotate_help.c:
"/opt/pgi/linux86-64/2013/cuda/5.0/include/host_defines.h", line 128: catastrophic error: 
          #error directive: --- !!! UNKNOWN COMPILER: please provide a CUDA
          compatible definition for '__align__' !!! ---
  #error --- !!! UNKNOWN COMPILER: please provide a CUDA compatible definition for '__align__' !!! ---
   ^

1 catastrophic error detected in the compilation of "rotate_help.c".
Compilation terminated.
tools.c:
NOTE: your trial license will expire in 13 days, 7.58 hours.
make: *** [../obj/fdtd] Error 2

Could anybody please give me a hint?
Thanks

Looks like you’re missing the CUDA x86 flag “-Mcudax86” from your build.

  • Mat

Thanks for your answer.
In my original post the first line is:

pgcpp  -Mcudax86 -c  CudaMasterFile.cu

I assumed you meant that mpic++ also needs the -Mcudax86 flag, so I’ve added it:

/usr/local/mpich2/bin/mpic++ -Mcudax86 -O3 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_BSD_SOURCE -I/usr/local/mpich2/include  -I/opt/pgi/linux86-64/2013/cuda/5.0/include -I/usr/local/include -I/usr/include -c *.c

Unfortunately the results are still the same.
Am I confused about where to put the flag?

Hi rvasquez,

Which version of the compiler are you using? We didn’t add CUDA 5.0 to CUDA x86 until the 13.9 release. Prior versions need to use CUDA 4.2.

  • Mat

I’ve upgraded, recompiled everything, and tried the build again. The errors are different now, but it still looks as if CUDA isn’t available.
Failing command:

/usr/local/bin/mpic++  -O3   -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_BSD_SOURCE   -I/usr/local/include  -I/opt/pgi/linux86-64/2013/cuda/5.0/include -I/usr/local/include -I/usr/include/    DecompUtils.o PrintLog.o bitoperations.o  tools.o ArrayHandling.o  region_distance.o cpml_help.o CudaMasterFile.o   grid.o lorentz_help.o populate.o    outs.o  rotate_help.o  fdtd.o  -o ../obj/fdtd -L/usr/local/lib -L/opt/pgi/linux86-64/2013/cuda/5.0/lib64 -L/opt/pgi/linux86-64/13.8/lib -L/usr/local/lib /usr/local/lib/libhdf5_hl.a /usr/local/lib/libhdf5.a -lz -lm -Wl,-rpath -Wl,/usr/local/lib -L/usr/lib/ -lfftw3 -lz -lm

Output:

DecompUtils.o: In function `initCUDA_MPI(int, char **, int *, my_grid *)':
/home/cuda/Code/bcalmrepo/CUDA_code/./DecompUtils.c:38: undefined reference to `cuInit'
/home/cuda/Code/bcalmrepo/CUDA_code/./DecompUtils.c:38: undefined reference to `cudaGetDeviceCount'
/home/cuda/Code/bcalmrepo/CUDA_code/./DecompUtils.c:79: undefined reference to `cuDeviceGet'
/home/cuda/Code/bcalmrepo/CUDA_code/./DecompUtils.c:79: undefined reference to `cuCtxCreate_v2'
DecompUtils.o: In function `createGPUDecompColumnComm2(int *, int)':
/home/cuda/Code/bcalmrepo/CUDA_code/./DecompUtils.c:236: undefined reference to `cudaGetDeviceCount'
CudaMasterFile.o: In function `__sti___17_CudaMasterFile_cu_f25416f6':
CudaMasterFile.cu:(.text+0x46): undefined reference to `__cudaRegisterVar'
CudaMasterFile.cu:(.text+0x79): undefined reference to `__cudaRegisterVar'
CudaMasterFile.cu:(.text+0xac): undefined reference to `__cudaRegisterVar'
CudaMasterFile.cu:(.text+0xdf): undefined reference to `__cudaRegisterVar'
CudaMasterFile.cu:(.text+0x112): undefined reference to `__cudaRegisterVar'
CudaMasterFile.o:CudaMasterFile.cu:(.text+0x145): more undefined references to `__cudaRegisterVar' follow
CudaMasterFile.o: In function `__sti___17_CudaMasterFile_cu_f25416f6':
CudaMasterFile.cu:(.text+0x981): undefined reference to `__cudaRegisterFunction'
CudaMasterFile.cu:(.text+0x9c5): undefined reference to `__cudaRegisterFunction'
CudaMasterFile.cu:(.text+0xa09): undefined reference to `__cudaRegisterFunction'
CudaMasterFile.cu:(.text+0xa4d): undefined reference to `__cudaRegisterFunction'
CudaMasterFile.cu:(.text+0xa91): undefined reference to `__cudaRegisterFunction'
CudaMasterFile.o:CudaMasterFile.cu:(.text+0xad5): more undefined references to `__cudaRegisterFunction' follow
CudaMasterFile.o: In function `start_load_E__FPfiN324dim3Pv':
/home/cuda/Code/bcalmrepo/CUDA_code/./kernel_help.cu:56: undefined reference to `blockIdx'
/home/cuda/Code/bcalmrepo/CUDA_code/./kernel_help.cu:60: undefined reference to `blockIdx'
CudaMasterFile.o: In function `load_E__FPA129_fiN324dim3Pv':
/home/cuda/Code/bcalmrepo/CUDA_code/./kernel_help.cu:74: undefined reference to `blockIdx'
/home/cuda/Code/bcalmrepo/CUDA_code/./kernel_help.cu:78: undefined reference to `blockIdx'
CudaMasterFile.o: In function `load_EBuffer__FPA129_fiN224dim3Pv':
/home/cuda/Code/bcalmrepo/CUDA_code/./kernel_help.cu:91: undefined reference to `blockIdx'
CudaMasterFile.o:/home/cuda/Code/bcalmrepo/CUDA_code/./kernel_help.cu:99: more undefined references to `blockIdx' follow
CudaMasterFile.o: In function `__GBL_kernelBuffer_E(int, dim3, void *)':
...

It’s very strange though.
Can anyone help me again? Thanks

Hi rvasquez,

Can you try adding the “-Mcudax86” flag to your link? Without it, the compiler doesn’t know it needs to add the CUDA x86 libraries to the link.

Also, you should probably remove the “-L/opt/pgi/linux86-64/13.8/lib” option. Since you’re using 13.9 now, it may cause you to pick up the wrong libraries.

  • Mat

Hello,
Thanks a lot for your answer. Unfortunately, I still see undefined references to some of the standard CUDA calls:

/usr/local/bin/mpic++  -Mcudax86 -O3   -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_BSD_SOURCE   -I/usr/local/include  -I/opt/pgi/linux86-64/2013/cuda/5.0/include -I/usr/local/include -I/usr/include/    DecompUtils.o PrintLog.o bitoperations.o  tools.o ArrayHandling.o  region_distance.o cpml_help.o CudaMasterFile.o   grid.o lorentz_help.o populate.o    outs.o  rotate_help.o  fdtd.o  -o ../obj/fdtd -L/usr/local/lib -L/opt/pgi/linux86-64/2013/cuda/5.0/lib64 -L/usr/local/lib /usr/local/lib/libhdf5_hl.a /usr/local/lib/libhdf5.a -lz -lm -Wl,-rpath -Wl,/usr/local/lib -L/usr/lib/ -lfftw3 -lz -lm

Output:

DecompUtils.o: In function `initCUDA_MPI(int, char **, int *, my_grid *)':
/home/cuda/Code/bcalmrepo/CUDA_code/./DecompUtils.c:38: undefined reference to `cuInit'
/home/cuda/Code/bcalmrepo/CUDA_code/./DecompUtils.c:79: undefined reference to `cuDeviceGet'
/home/cuda/Code/bcalmrepo/CUDA_code/./DecompUtils.c:79: undefined reference to `cuCtxCreate_v2'
CudaMasterFile.o: In function `EnableP2P':
/home/cuda/Code/bcalmrepo/CUDA_code/./MultipleDeviceHandling.cu:158: undefined reference to `cudaDeviceEnablePeerAccess'
/home/cuda/Code/bcalmrepo/CUDA_code/./MultipleDeviceHandling.cu:172: undefined reference to `cudaDeviceEnablePeerAccess'
outs.o: In function `intialize_outzones(char *, my_grid *, out_zone *)':
/home/cuda/Code/bcalmrepo/CUDA_code/./outs.c:770: undefined reference to `cudaStreamCreateWithFlags'
fdtd.o: In function `main':
/home/cuda/Code/bcalmrepo/CUDA_code/./fdtd.c:322: undefined reference to `cudaDeviceSynchronize'
/home/cuda/Code/bcalmrepo/CUDA_code/./fdtd.c:360: undefined reference to `cudaDeviceSynchronize'
/home/cuda/Code/bcalmrepo/CUDA_code/./fdtd.c:360: undefined reference to `cudaDeviceReset'
/usr/local/lib/libhdf5.a(H5PL.o): In function `H5PL__open':
/home/cuda/Downloads/hdf5-1.8.11/src/./H5PL.c:523: undefined reference to `dlopen'
/home/cuda/Downloads/hdf5-1.8.11/src/./H5PL.c:535: undefined reference to `dlsym'
/home/cuda/Downloads/hdf5-1.8.11/src/./H5PL.c:552: undefined reference to `dlerror'
/usr/local/lib/libhdf5.a(H5PL.o): In function `H5PL__search_table':
/home/cuda/Downloads/hdf5-1.8.11/src/./H5PL.c:636: undefined reference to `dlsym'
/usr/local/lib/libhdf5.a(H5PL.o): In function `H5PL__close':
/home/cuda/Downloads/hdf5-1.8.11/src/./H5PL.c:657: undefined reference to `dlclose'
make: *** [../obj/fdtd] Error 2

Hmm. Did part of your code get compiled with CUDA C instead of CUDA x86?

  • Mat

I don’t think so. We don’t even have nvcc installed on this machine; this build uses only pgcc and pgcpp.

It seems that those functions are not implemented for C++, although they appear in the CUDA Fortran documentation (http://www.pgroup.com/doc/pgicudaforug.pdf).
Or is something else going on?
–R

Hi R,

I should have looked at the symbol names more closely. CUDA x86 supports only the CUDA Runtime API, not the CUDA Driver API, so the “cu…” routines (cuInit, cuDeviceGet, cuCtxCreate) are not supported.

For the “dl…” symbols (dlopen, dlsym, dlclose and dlerror, referenced by HDF5’s plugin code in H5PL.c), try adding “-ldl” to the link line.

The remaining Runtime API routines, i.e. the “cuda…” ones, were all added after CUDA 3.2 (in CUDA 4.0 or 5.0), and CUDA x86 fully supports only the CUDA 3.2 routines. Currently there are no plans to update CUDA x86 to newer versions of CUDA, though that may change now that we are part of NVIDIA.

  • Mat

Hello Mat

Thanks for your help. After adding several conditional-compilation guards we’re able to run the project. The only remaining issue is that we cannot use va_start because of an undefined __builtin_va_start error. I’ve added the -nobuild flag (or something like this) to the compile, but it still throws the error.

Another question: the performance of our code, which is a simple finite-difference algorithm, is really bad. Which thread block sizes and stream usage do you recommend?

Thanks in advance.

I’m not sure why the va_start error occurs, but I’d start by removing all the “-I” paths from your compile. My best guess is that you’re picking up the wrong stdarg.h file, one that defines va_start in terms of a builtin (__builtin_va_start) that this compiler doesn’t recognize.

“performance of our code, which is a simple finite difference algorithm, is really bad.”

Compared to what? If you’re comparing to a GPU, then this is expected. CUDA x86 runs in emulation on the host using OpenMP-style threading. It’s made for portability, not performance.

  • Mat