Hi All,
I’m trying to incorporate CUDA into an existing C project. It compiles and links, but CUDA doesn’t initialize properly at runtime. To isolate the problem, I’ve put together a small example that shows the same issue. Much of the CUDA-related code was cribbed from the NVIDIA sample program deviceQuery.
My code is in two files. One, cudamain.c, is compiled by gcc as C code. The other is cudainit.c, compiled by nvcc as CUDA code (-x cu). They are linked by gcc, yielding the executable, named ci. Here’s the Makefile:
#############################################################################
ci : cudamain.o cudainit.o
	gcc -bind_at_load -L /Developer/NVIDIA/CUDA-6.5/lib -o ci cudamain.o cudainit.o -lcudart_static

cudamain.o : cudamain.c
	gcc -c cudamain.c -o cudamain.o

cudainit.o : cudainit.c
	nvcc -x cu -I /Developer/NVIDIA/CUDA-6.5/include -c cudainit.c -o cudainit.o

clean:
	rm *.o
#############################################################################
(You may ask why I’m not using nvcc for all compiles and linking. It may come to that, I suppose, but when I first tried it I got a slew of unrecognized options. For example, nvcc didn’t recognize -bind_at_load, -Wall, -Wshadow, or -Wno-deprecated. So I gave up on that and am using nvcc for only the files that contain CUDA. If you can recommend a way to make nvcc recognize all my pre-existing options, I’d be open to that approach.)
The source code for cudamain.c is:
/***************************************************************************/
#ifdef __cplusplus
extern "C" {
#endif
#include <stdio.h>
extern void CudaInit(void);
int main(int argc, char **argv)
{
    CudaInit();
    return(0);
}
#ifdef __cplusplus
} /* extern "C" */
#endif
/***************************************************************************/
(Credit: I learned the extern "C" thing from the DevZone thread “Linking C and CUDA files with NVCC and GCC”.)
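A common variant of the same idea wraps only the shared declaration in the guard, rather than whole source files. The sketch below is a single-file illustration under assumptions: the header name cudainit.h, the counter cuda_init_calls, and the stand-in body of CudaInit are all hypothetical, present only so the example compiles without CUDA. In the real project the definition would stay in cudainit.c.

```c
#include <stdio.h>

/* ---- contents of a hypothetical shared header, e.g. cudainit.h ---- */
#ifdef __cplusplus
extern "C" {            /* a C++ compile (nvcc -x cu) gives the name C linkage */
#endif

void CudaInit(void);

#ifdef __cplusplus
}                       /* extern "C" */
#endif
/* ---- end of hypothetical header ---- */

/* Hypothetical counter, used only to make the stand-in observable. */
static int cuda_init_calls = 0;

/* Stand-in for the real CudaInit() defined in cudainit.c. */
void CudaInit(void)
{
    cuda_init_calls++;
    printf("CudaInit called %d time(s)\n", cuda_init_calls);
}
```

Both cudamain.c and cudainit.c would include such a header. When gcc compiles cudamain.c as C, the guard vanishes entirely; when nvcc compiles cudainit.c as C++, the earlier declaration gives the definition C linkage.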
The source code for cudainit.c is:
/***************************************************************************/
#ifdef __cplusplus
extern "C" {
#endif
//#define STANDALONE
#include <stdio.h>
#include <stdlib.h>   /* for exit() */
#include <cuda.h>
#include <cuda_runtime.h>
#ifdef STANDALONE
int main(int argc, char **argv)
#else
void CudaInit(void)
#endif
{
    cudaError_t err = cudaSuccess;
    float *d_A = NULL;
    int deviceCount = 0;
    cudaError_t error_id = cudaGetDeviceCount(&deviceCount);
    if (error_id != cudaSuccess) {
        printf("cudaGetDeviceCount returned %d\n-> %s\n", (int)error_id, cudaGetErrorString(error_id));
        printf("Result = FAIL\n");
        exit(1);
    }
    if (deviceCount == 0) {
        printf("There are no available device(s) that support CUDA\n");
    }
    else {
        printf("Detected %d CUDA Capable device(s)\n", deviceCount);
    }
    int dev, driverVersion = 0, runtimeVersion = 0;
    dev = 0;
    cudaSetDevice(dev);
    cudaDeviceProp deviceProp;
    cudaGetDeviceProperties(&deviceProp, dev);
    printf("Device %d: \"%s\"\n", dev, deviceProp.name);
    cudaDriverGetVersion(&driverVersion);
    cudaRuntimeGetVersion(&runtimeVersion);
    printf(" CUDA Driver Version / Runtime Version %d.%d / %d.%d\n",
           driverVersion/1000, (driverVersion%100)/10,
           runtimeVersion/1000, (runtimeVersion%100)/10);
    printf(" CUDA Capability Major/Minor version number: %d.%d\n", deviceProp.major, deviceProp.minor);
    err = cudaMalloc((void **)&d_A,1000*sizeof(float));
    if (err == cudaSuccess) {
        printf("d_A successfully allocated.\n");
    }
    else {
        printf("Error allocating d_A: \n %s\n",cudaGetErrorString(err));
    }
#ifdef STANDALONE
    return(0);
#endif
}
#ifdef __cplusplus
} /* extern "C" */
#endif
/***************************************************************************/
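For reference, the version printf in the listing above relies on how CUDA packs a version into a single integer: 1000 × major + 10 × minor, so 6.5 is reported as 6050. Here is a minimal sketch of that decoding. The helper names are hypothetical and not part of the CUDA API.

```c
#include <stdio.h>

/* Hypothetical helpers: decode a CUDA version integer (1000*major + 10*minor). */
static int cuda_version_major(int version)
{
    return version / 1000;
}

static int cuda_version_minor(int version)
{
    return (version % 100) / 10;
}

/* Print a labelled version in major.minor form. */
static void cuda_version_print(const char *label, int version)
{
    printf("%s %d.%d\n", label,
           cuda_version_major(version), cuda_version_minor(version));
}
```

Calling cuda_version_print("Driver", 6050) would print the same 6.5 that the standalone build reports.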
If I compile cudainit.c with STANDALONE defined (using: nvcc -x cu -o ci cudainit.c) and run it, the output is:
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GT 650M"
CUDA Driver Version / Runtime Version 6.5 / 6.5
CUDA Capability Major/Minor version number: 3.0
d_A successfully allocated.
So far, so good. But if I compile and link the program from two sources as described above, the output is:
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
Obviously that’s not the desired result, and I’m skeptical about the error message, since the standalone build runs fine against the same driver and runtime. What is really going on?
Thanks,
-DwP