Driver version: 332.88
CUDA SDK version: 6.0
Compiler: MSVC 2010
OS: Windows 7 64-bit
I’ve been getting illegal memory accesses in my kernel and took a while to whittle the problem down to the following sequence of code
extern "C" __global__ void Kernel(uint* output)
{
output[0] = 1
}
struct DeviceMem
{
CUdeviceptr ptr;
};
void Run()
{
// Initialise CUDA
cuInit(0);
CUdevice device;
cuDeviceGet(&device, 0);
CUcontext context;
cuCtxCreate(&context, CU_CTX_SCHED_AUTO, device);
// Load PTX compiled with nvcc and get its entry-point
CUmodule module;
cuModuleLoad(&module, "path.ptx");
CUfunction function;
cuModuleGetFunction(&function, module, "Kernel");
// Allocate an arbitrarily sized chunk of device memory
DeviceMem mem;
cuMemAlloc(&ptr, 100);
// Execute kernel with
void* args[MAX_ARGS];
args[0] = &mem.ptr;
cuLaunchKernel(function, 1, 1, 1, 1, 1, 1, 0, NULL, args, NULL);
}
I’ve omitted all error checking and such but the above code above gives me a CUDA_ERROR_ILLEGAL_ADDRESS every time.
This is a 32-bit program and I can see from the definition of CUdeviceptr that its size varies based on platform
#if defined(__x86_64) || defined(AMD64) || defined(_M_AMD64)
typedef unsigned long long CUdeviceptr;
#else
typedef unsigned int CUdeviceptr;
#endif
But it occurred to me; what if the driver API expected a 64-bit pointer (even from a 32-bit program)? Or even, what if the GPU expects a 64-bit pointer but the 32-bit CUDA driver was failing to zero the high 32-bits?
I changed DeviceMem to this
struct DeviceMem
{
DeviceMem() : pad(0) { }
CUdeviceptr ptr;
u32 pad;
};
and the error went away!
So, is there a bug here or have I missed some documentation on this? Could I be linking with incorrect versions of the libraries? (I’ve double-checked and I’m definitely referencing the 32-bit import libs).
Cheers,
- Don