Some trouble with the CUDA driver API in 64-bit mode

Hi everyone!

I’m a beginner in CUDA development. I wrote a simple CUDA program using the CUDA driver API, and it works perfectly, but only when I compile it as a 32-bit application.

The main program looks like this:

#include <stdio.h>
#include <cuda.h>

int main()
{
  CUdevice dev;
  CUcontext context;
  CUmodule module;
  CUfunction func;
  CUdeviceptr d_data;
  float h_data[10];
  int N=10;

  for(int i=0;i<N;i++)h_data[i]=i+1;

  cuInit(0);
  cuDeviceGet(&dev,0);
  cuCtxCreate(&context,0,dev);
  cuModuleLoad(&module,"test.cubin");
  printf("Get function code: %d\n", cuModuleGetFunction(&func,module,"square"));

  cuMemAlloc(&d_data,sizeof(float)*N);
  cuMemcpyHtoD(d_data,h_data,sizeof(float)*N);

  cuParamSeti(func,0,d_data);
  cuParamSetSize(func,4);
  cuFuncSetBlockShape(func,N,1,1);
  cuLaunchGrid(func,1,1);

  cuMemcpyDtoH(h_data,d_data,sizeof(float)*N);

  for(int i=0;i<N;i++)printf("%d: %f\n", i, h_data[i]);

  return 0;
}

Here is the GPU program (test.cu):

extern "C" {
__global__ void square(volatile float* const a)
{
	a[threadIdx.x] = a[threadIdx.x]*a[threadIdx.x];
}
}

And here is my Makefile:

CUDAINC = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\include"
CUDALIB = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.0\lib\x64"

all:
	cl /c -I$(CUDAINC)  test.cpp
	link /LIBPATH:$(CUDALIB) test.obj cuda.lib

I compile the test.cu file with the following command:

nvcc --cubin -m 32 -arch sm_21 test.cu

If I compile this as a 32-bit application, the output is exactly as expected. But if I compile it as a 64-bit application, the output is the same as the input; in other words, the data comes back unchanged. Of course, I adjusted the Makefile for the 64-bit build and compiled the GPU program with -m 64 instead of -m 32. I also made one change in the source file:

cuParamSetSize(func,8) instead of cuParamSetSize(func, 4)…

So… does anybody have an idea what I might be doing wrong?

Step one: check your errors. You can’t load a 32-bit cubin in a 64-bit application.

Step two: use cuLaunchKernel in 4.0. It’s ~1300x better than the old driver API means of launching a kernel.

Thank you very much for the quick reply!
To step one:
Of course, I loaded a 64-bit cubin in the 64-bit application, and all function calls returned 0…
To step two:
cuLaunchKernel works fine! So, thank you again :)