nvcc and gcc: It compiles and links, but fails at runtime

Hi All,

I’m trying to incorporate CUDA into an existing C project. I got it to compile and link, but CUDA doesn’t initialize properly at runtime. To isolate the problem, I’ve written a simple example program that has the same issue. Much of the CUDA-related code was cribbed from the NVIDIA sample program deviceQuery.

My code is in two files. One, cudamain.c, is compiled by gcc as C code. The other is cudainit.c, compiled by nvcc as CUDA code (-x cu). They are linked by gcc, yielding the executable, named ci. Here’s the Makefile:

#############################################################################
ci : cudamain.o cudainit.o
	gcc -bind_at_load -L /Developer/NVIDIA/CUDA-6.5/lib -o ci cudamain.o cudainit.o -lcudart_static
cudamain.o	: cudamain.c
	gcc -c cudamain.c -o cudamain.o
cudainit.o    : cudainit.c 
	nvcc -x cu -I /Developer/NVIDIA/CUDA-6.5/include -c cudainit.c -o cudainit.o
clean:
	rm *.o
#############################################################################

(You may ask why I’m not using nvcc for all compiles and linking. It may come to that, I suppose, but when I first tried it I got a slew of unrecognized options. For example, nvcc didn’t recognize -bind_at_load, -Wall, -Wshadow, or -Wno-deprecated. So I gave up on that and am using nvcc for only the files that contain CUDA. If you can recommend a way to make nvcc recognize all my pre-existing options, I’d be open to that approach.)

The source code for cudamain.c is:

/***************************************************************************/
#ifdef __cplusplus
   extern "C" {
#endif

#include <stdio.h>
extern void CudaInit(void);

int main(int argc, char **argv)
{
      CudaInit();
      return(0);
}
#ifdef __cplusplus
   } /* extern "C" */
#endif
/***************************************************************************/

(Credit: I learned the extern "C" thing from the DevZone thread “Linking C and CUDA files with NVCC and GCC”.)

The source code for cudainit.c is:

/***************************************************************************/
#ifdef __cplusplus
   extern "C" {
#endif

//#define STANDALONE

#include <stdio.h>
#include <stdlib.h>   /* for exit() */
#include <cuda.h>
#include <cuda_runtime.h>

#ifdef STANDALONE
   int main(int argc, char **argv)
#else
   void CudaInit(void)
#endif
{
      cudaError_t err = cudaSuccess;
      float *d_A = NULL;
      int deviceCount = 0;
      cudaError_t error_id = cudaGetDeviceCount(&deviceCount);

      if (error_id != cudaSuccess) {
        printf("cudaGetDeviceCount returned %d\n-> %s\n", (int)error_id, cudaGetErrorString(error_id));
        printf("Result = FAIL\n");
        exit(1);
      }

      if (deviceCount == 0) {
        printf("There are no available device(s) that support CUDA\n");
      }
      else {
        printf("Detected %d CUDA Capable device(s)\n", deviceCount);
      }
      int dev, driverVersion = 0, runtimeVersion = 0;
      dev = 0;
      cudaSetDevice(dev);
      cudaDeviceProp deviceProp;
      cudaGetDeviceProperties(&deviceProp, dev);

      printf("Device %d: \"%s\"\n", dev, deviceProp.name);
      cudaDriverGetVersion(&driverVersion);
      cudaRuntimeGetVersion(&runtimeVersion);
      printf("  CUDA Driver Version / Runtime Version          %d.%d / %d.%d\n", driverVersion/1000, (driverVersion%100)/10, runtimeVersion/1000, (runtimeVersion%100)/10);
      printf("  CUDA Capability Major/Minor version number:    %d.%d\n", deviceProp.major, deviceProp.minor);

      err = cudaMalloc((void **)&d_A,1000*sizeof(float));
      if (err == cudaSuccess) {
         printf("d_A successfully allocated.\n");
      }
      else {
         printf("Error allocating d_A: \n   %s\n",cudaGetErrorString(err));
      }
      #ifdef STANDALONE
         return(0);
      #endif
}

#ifdef __cplusplus
   } /* extern "C" */
#endif
/***************************************************************************/

If I compile cudainit.c with STANDALONE defined (using: nvcc -x cu -o ci cudainit.c) and run it, the output is:

Detected 1 CUDA Capable device(s)
Device 0: "GeForce GT 650M"
  CUDA Driver Version / Runtime Version          6.5 / 6.5
  CUDA Capability Major/Minor version number:    3.0
d_A successfully allocated.
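
For anyone reading the "6.5 / 6.5" line: the integers returned by cudaDriverGetVersion and cudaRuntimeGetVersion are encoded as 1000*major + 10*minor, so 6050 means 6.5. Here is a minimal plain-C sketch of the same decoding the printf above performs. The helper name decode_cuda_version is hypothetical (it is not part of the CUDA API), and the minor-version expression simply mirrors the one in the post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not a CUDA API call.
   Splits a version integer encoded as 1000*major + 10*minor
   (for example 6050 for CUDA 6.5) into its two parts, using the
   same arithmetic as the printf in cudainit.c. */
void decode_cuda_version(int version, int *major, int *minor)
{
    assert(major != NULL && minor != NULL);
    *major = version / 1000;
    *minor = (version % 100) / 10;
}
```

Calling decode_cuda_version(6050, &major, &minor) sets major to 6 and minor to 5, which matches the standalone output above.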

So far, so good. But if I compile and link the program from two sources as described above, the output is:

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

Obviously not the desired result, and I’m skeptical about the error message. What is really going on?
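
One way to test whether the message should be taken literally is to print the driver and runtime version integers in the failing build before calling cudaGetDeviceCount. The CUDA documentation states that cudaDriverGetVersion reports 0 when no driver is installed, so a 0 there may mean the runtime could not find or load the driver at all, rather than the driver being too old. That reading is an assumption to check, not a diagnosis. The sketch below is plain C, and the enum and function names are hypothetical, not CUDA types.

```c
#include <assert.h>

/* Hypothetical classification, not part of the CUDA API. */
enum driver_status {
    DRIVER_NOT_FOUND,  /* driver version reported as 0 */
    DRIVER_TOO_OLD,    /* driver version lower than the runtime version */
    DRIVER_OK          /* driver version at least the runtime version */
};

/* Compares the two integers as reported by cudaDriverGetVersion and
   cudaRuntimeGetVersion (encoded as 1000*major + 10*minor). */
enum driver_status classify_driver(int driver_version, int runtime_version)
{
    assert(runtime_version > 0);
    if (driver_version == 0) {
        return DRIVER_NOT_FOUND;
    }
    if (driver_version < runtime_version) {
        return DRIVER_TOO_OLD;
    }
    return DRIVER_OK;
}
```

With the values from the standalone run (6050 and 6050) this returns DRIVER_OK. If the two-file build reports 0 for the driver, the link step becomes the likelier suspect than the installed driver.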

Thanks,
-DwP

I have gcc 4.8.2 (Fedora 20) and it doesn’t recognize -bind_at_load either. That appears to be unique to Apple/Mac/Darwin.

When you compile with options like those, which are intended for the host compiler/linker, are you passing them through nvcc to the host compiler with the -Xcompiler switch? For example:

nvcc -Xcompiler="-bind_at_load -Wall -Wshadow -Wno-deprecated" …

Anyway, I can get your files to build and run properly on Fedora 20 with a few tweaks to the makefile, so I’m guessing the problem is Apple-specific, which I wouldn’t be able to diagnose.

At least on Linux, when I link against -lcudart_static with gcc/g++ instead of nvcc, I also have to link against -lcuda, or the linker gets angry. Maybe that’s not the case on the mac.

Yeah, I neglected to mention my platform. I’m on OS X 10.9.5. When I enter gcc -v, it says:

Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix

FWIW, I tried adding -lcuda, and it didn’t like it.

Thanks for the -Xcompiler option. I’ll try that.