Passing CUdeviceptr to cuLaunchKernel in 32-bit app on Windows 7 64-bit

Driver version: 332.88
CUDA SDK version: 6.0
Compiler: MSVC 2010
OS: Windows 7 64-bit

I’ve been getting illegal memory accesses in my kernel and took a while to whittle the problem down to the following sequence of code

extern "C" __global__ void Kernel(uint* output)
{
   output[0] = 1
}
struct DeviceMem
{
   CUdeviceptr ptr;
};

void Run()
{
   // Initialise CUDA
   cuInit(0);
   CUdevice device;
   cuDeviceGet(&device, 0);
   CUcontext context;
   cuCtxCreate(&context, CU_CTX_SCHED_AUTO, device);

   // Load PTX compiled with nvcc and get its entry-point
   CUmodule module;
   cuModuleLoad(&module, "path.ptx");
   CUfunction function;
   cuModuleGetFunction(&function, module, "Kernel");

   // Allocate an arbitrarily sized chunk of device memory
   DeviceMem mem;
   cuMemAlloc(&ptr, 100);

   // Execute kernel with 
   void* args[MAX_ARGS];
   args[0] = &mem.ptr;
   cuLaunchKernel(function, 1, 1, 1, 1, 1, 1, 0, NULL, args, NULL);
}

I’ve omitted all error checking and such but the above code above gives me a CUDA_ERROR_ILLEGAL_ADDRESS every time.

This is a 32-bit program and I can see from the definition of CUdeviceptr that its size varies based on platform

#if defined(__x86_64) || defined(AMD64) || defined(_M_AMD64)
typedef unsigned long long CUdeviceptr;
#else
typedef unsigned int CUdeviceptr;
#endif

But it occurred to me; what if the driver API expected a 64-bit pointer (even from a 32-bit program)? Or even, what if the GPU expects a 64-bit pointer but the 32-bit CUDA driver was failing to zero the high 32-bits?

I changed DeviceMem to this

struct DeviceMem
{
   DeviceMem() : pad(0) { }
   CUdeviceptr ptr;
   u32 pad;
};

and the error went away!

So, is there a bug here or have I missed some documentation on this? Could I be linking with incorrect versions of the libraries? (I’ve double-checked and I’m definitely referencing the 32-bit import libs).

Cheers,

  • Don

Some things you could try:

  1. inspect your .ptx file and make sure it says ".address_size 32" near the top
  2. get rid of the DeviceMem struct and just use a CUdeviceptr
  3. check "sizeof(CUdeviceptr)" at run time to make sure you're really compiling for 32-bit
  4. double-check your Windows build to make sure it's 32-bit

If (1) winds up being the issue then make sure you compile your kernel with a “-m 32” switch.

Yes, DeviceMem is just there to highlight 32-bit vs 64-bit issue (sizeof does indeed match).

Was just coming back to post that I was not compiling with -m32 - that was the big issue here!

Thanks for taking time to read through.

Cheers,

  • Don