Some trouble with the CUDA driver API in 64-bit mode

Hi everyone!

I’m a beginner in CUDA development. I wrote a simple CUDA program using the CUDA driver API, and it works perfectly, but only when I compile it as a 32-bit application.

The main program looks like this:



int main()
{
  CUdevice dev;
  CUcontext context;
  CUmodule module;
  CUfunction func;
  CUdeviceptr d_data;
  float h_data[10];
  int N=10;

  for(int i=0;i<N;i++)h_data[i]=i+1;

  printf("Get function code: %d\n", cuModuleGetFunction(&func,module,"square"));

  for(int i=0;i<N;i++)printf("%d: %f\n", i, h_data[i]);

  return 0;
}

Here is the GPU program:

extern "C" {

__global__ void square(volatile float* const a)
{
	a[threadIdx.x] = a[threadIdx.x]*a[threadIdx.x];
}

}

And here is my Makefile:

CUDAINC = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include"

CUDALIB = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\lib\x64"


		cl /c -I$(CUDAINC)  test.cpp

		link /LIBPATH:$(CUDALIB) test.obj cuda.lib

I compile the GPU file with the following command:

nvcc --cubin -m 32 -arch sm_21

If I compile this as a 32-bit application, the output is exactly as expected. But if I compile it as a 64-bit application, the output is the same as the input; in other words, the data comes back unchanged. Of course, I changed the Makefile accordingly, and I compile the GPU program with a different command (-m 64 instead of -m 32). I also made a change in the source file:

cuParamSetSize(func,8) instead of cuParamSetSize(func, 4)…

So… does anybody have any idea what I might be doing wrong?
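One thing that may be worth checking here, though the parameter-setting calls are not shown in the post, so this is only a guess: in a 64-bit build the device pointer is 8 bytes wide, and enlarging the parameter size to 8 does not help if the pointer itself is still written through a 4-byte integer setter such as cuParamSeti. The sketch below does not call the driver at all. `ParamBuffer`, `pack_as_u32` and `pack_as_bytes` are hypothetical names that only model the byte layout, with a 64-bit integer standing in for CUdeviceptr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for CUdeviceptr (assumption: 8 bytes wide in a 64-bit build).
using DevicePtr = std::uint64_t;

// Hypothetical model of a legacy kernel parameter buffer: raw bytes, sized up front.
struct ParamBuffer {
    std::vector<unsigned char> bytes;

    explicit ParamBuffer(std::size_t size) : bytes(size, 0) {}

    // Models a cuParamSeti-style write: always exactly 4 bytes.
    void set_u32(std::size_t offset, std::uint32_t value) {
        assert(offset + sizeof(value) <= bytes.size());
        std::memcpy(bytes.data() + offset, &value, sizeof(value));
    }

    // Models a cuParamSetv-style write: copies numbytes from host memory.
    void set_bytes(std::size_t offset, const void* src, std::size_t numbytes) {
        assert(offset + numbytes <= bytes.size());
        std::memcpy(bytes.data() + offset, src, numbytes);
    }

    // The 8 bytes at offset 0, i.e. what would arrive as the pointer argument.
    DevicePtr read_pointer() const {
        assert(bytes.size() >= sizeof(DevicePtr));
        DevicePtr p = 0;
        std::memcpy(&p, bytes.data(), sizeof(p));
        return p;
    }
};

// Pack the pointer through a 32-bit slot: half of its bytes never reach the buffer.
inline DevicePtr pack_as_u32(DevicePtr p) {
    ParamBuffer buf(sizeof(DevicePtr));
    buf.set_u32(0, static_cast<std::uint32_t>(p));
    return buf.read_pointer();
}

// Pack all 8 bytes of the pointer.
inline DevicePtr pack_as_bytes(DevicePtr p) {
    ParamBuffer buf(sizeof(DevicePtr));
    buf.set_bytes(0, &p, sizeof(p));
    return buf.read_pointer();
}
```

If this is what is happening, the legacy-API fix would be to pass the pointer by address with its full size, along the lines of cuParamSetv(func, 0, &d_data, sizeof(d_data)). A kernel that receives a truncated address can fail without any of the setup calls returning a non-zero code, which would be consistent with the unchanged output.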

Step one: check your errors. You can’t load a 32-bit cubin in a 64-bit application.
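A minimal sketch of what "check your errors" could look like in practice, instead of printing each return code by hand. The helper name `check` and its message format are hypothetical; the only fact it relies on is that a successful driver API call returns 0 (CUDA_SUCCESS). It is written as a template so it needs no CUDA header to compile.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Throws if a call did not return 0 (CUDA_SUCCESS for the driver API).
// `what` is a caller-supplied label naming the call that was made.
template <typename Result>
void check(Result result, const char* what) {
    assert(what != nullptr);
    const long code = static_cast<long>(result);
    if (code != 0) {
        std::ostringstream msg;
        msg << what << " failed with code " << code;
        throw std::runtime_error(msg.str());
    }
}
```

With cuda.h included, a call from the program above would be wrapped as check(cuModuleGetFunction(&func, module, "square"), "cuModuleGetFunction"). Kernel launches are asynchronous, so a failure inside the kernel tends to show up in the return code of a later call such as cuCtxSynchronize or the copy back to the host; those calls need checking too.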

Step two: use cuLaunchKernel in 4.0. It’s ~1300x better than the old driver API means of launching a kernel.
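To illustrate why cuLaunchKernel avoids the parameter-size bookkeeping: its kernelParams argument is an array with one pointer per kernel parameter, so there are no offsets or sizes to set by hand. The sketch below only models that convention in plain C++. `read_first_pointer_arg` and `roundtrip_through_kernel_params` are hypothetical stand-ins, and the 64-bit integer type is an assumed stand-in for CUdeviceptr in a 64-bit build.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for CUdeviceptr (assumption: 8 bytes wide in a 64-bit build).
using DevicePtr = std::uint64_t;

// Hypothetical stand-in for the receiving side of a kernelParams-style launch:
// each array entry points at one argument, which is read at its full width.
inline DevicePtr read_first_pointer_arg(void** kernel_params) {
    assert(kernel_params != nullptr && kernel_params[0] != nullptr);
    DevicePtr p = 0;
    std::memcpy(&p, kernel_params[0], sizeof(p));
    return p;
}

// Builds the argument array the way a cuLaunchKernel caller would.
inline DevicePtr roundtrip_through_kernel_params(DevicePtr d_data) {
    void* args[] = { &d_data };  // one entry per kernel parameter
    // With the real API this array is passed as kernelParams, for example
    //   cuLaunchKernel(func, 1, 1, 1, N, 1, 1, 0, 0, args, 0);
    // which launches one block of N threads, matching the threadIdx.x
    // indexing in the square kernel.
    return read_first_pointer_arg(args);
}
```

The same source then works for 32-bit and 64-bit builds, because the argument is passed by address rather than by a hand-written size.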

Thank you very much for the quick reply!
To step one:
Of course, I loaded a 64-bit cubin in the 64-bit application, and all function calls returned 0…
To step two:
cuLaunchKernel works fine! So, thank you again :)