Surface reference and OutOfRangeLoad exception with CUDA 6.5


I have been using surface reference reads and writes for a while and have now switched to CUDA 6.5. I get strange OutOfRangeLoad exceptions when the .cu files that use surf3Dwrite are compiled with debug symbols, and I can tell that the surf3Dwrite calls are not doing anything either. If I don’t generate any debug information, I don’t get any exceptions or problems; everything works fine.

I did not have this problem with version 5.0.

What has changed?


Have you determined whether your code actually performs out-of-range read accesses? Does the code check the return status of every CUDA API call? I am unfamiliar with your scenario but would guess that an out-of-range access could also be the consequence of a failed surface setup. When you run your application under cuda-memcheck, are any errors reported?
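For reference, checking the return status of every call is often done with a small wrapper macro. The following is only a minimal sketch of that pattern: CUDA_CHECK, touch, and runChecked are hypothetical names, not anything from the poster's code, and the kernel is a stand-in for the real one.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: report file/line and abort when a runtime call fails.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,   \
                         __LINE__, #call, cudaGetErrorString(err_));   \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Stand-in kernel: writes a known value so the host can verify it ran.
__global__ void touch(int *p) { *p = 42; }

// Runs the kernel with every API call checked and returns the value it wrote.
int runChecked() {
    int *d = nullptr;
    CUDA_CHECK(cudaMalloc(&d, sizeof(int)));

    touch<<<1, 1>>>(d);
    CUDA_CHECK(cudaGetLastError());       // catches launch-time errors
    CUDA_CHECK(cudaDeviceSynchronize());  // catches errors raised during execution

    int h = 0;
    CUDA_CHECK(cudaMemcpy(&h, d, sizeof(int), cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d));
    return h;
}

int main() {
    std::printf("kernel wrote %d\n", runChecked());
    return 0;
}
```

Kernel launches return no status themselves, which is why the sketch checks cudaGetLastError() and cudaDeviceSynchronize() right after the launch.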

I checked all host-side calls. All return codes for cudaGraphicsGLRegisterImage, cudaGraphicsMapResources, cudaGraphicsSubResourceGetMappedArray, cudaBindSurfaceToArray are cudaSuccess.

I checked the indices that it tries to write, and they are legit too. I double-checked the size of the array that is bound to the surface through the Warp Info window in VS, and that is legit as well.

cuda-memcheck.exe does not reveal anything, but the CUDA debugger complains about “Memory Checker detected xxx access violations on load (global memory).”

Debugger stops at file device_functions.h:

static __forceinline__ int __float_as_int(float x)
{
    return __nv_float_as_int(x);
}

I’m using the surf3Dwrite(float, surf_ref, int, int, int) functions. The file is compiled for compute capability 2.0. If I target compute capability 3.0 instead, even the build without debug information stops working.

I also tried templating the surface call myself with surf3Dwrite(…); that did not work either.

Strange. I would have thought that the out-of-bounds detection mechanism built into cuda-memcheck is identical to the detection mechanism built into the debugger.

I am not aware of any false positives being reported by either tool, so if the debugger complains about an out-of-bounds access, it is exceedingly likely that there is an out-of-bounds access which may go undetected in release builds. While some issue with the CUDA software stack cannot be excluded at this time, a latent bug in your code now exposed by CUDA 6.5 seems more likely.

If you could post a minimal complete example code that reproduces the problem, it would enable others to look into the issue. I perceive no way to debug such issues from cursory descriptions of the code.

I’d think the same, yet everything works fine when no debug information is added during compilation.

I narrowed down the problem. I have two similar surface references, one to a single-channel 32-bit float texture and one to a single-channel 32-bit unsigned int texture. The code seems to fail at the surf3Dwrite calls on the unsigned int texture but not on the float texture.

The texture is defined as:

glTexImage3D(GL_TEXTURE_3D, 0, GL_R32UI, w, h, d, 0, GL_RED_INTEGER, GL_UNSIGNED_INT, data);

I defined the channel descriptor with:

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc();

and I use the following to write:

surf3Dwrite(value, surface<void,…>, ix*sizeof(unsigned int), iy, iz);
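For anyone who wants to compare against a standalone case, here is a rough sketch of the same kind of write, under several assumptions that differ from my setup: it uses a surface object instead of a surface reference (references were removed from the toolkit in CUDA 12), it allocates the array with cudaMalloc3DArray rather than mapping a GL texture, and it assumes an unsigned int channel descriptor. CUDA_CHECK, fillSurface, and countMismatches are hypothetical names.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                         cudaGetErrorString(err_));                    \
            std::exit(EXIT_FAILURE);                                   \
        }                                                              \
    } while (0)

// Each thread writes its linear index into one unsigned int element.
__global__ void fillSurface(cudaSurfaceObject_t surf, int w, int h, int d) {
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    int iz = blockIdx.z * blockDim.z + threadIdx.z;
    if (ix >= w || iy >= h || iz >= d) return;  // guard partial blocks
    unsigned int value = (unsigned int)((iz * h + iy) * w + ix);
    // The x coordinate is a byte offset; y and z are element indices.
    surf3Dwrite(value, surf, ix * (int)sizeof(unsigned int), iy, iz);
}

// Fills a w*h*d surface, copies it back, and returns the number of wrong elements.
int countMismatches(int w, int h, int d) {
    // Assumption: the element type of the channel descriptor is unsigned int.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<unsigned int>();
    cudaArray_t arr = nullptr;
    CUDA_CHECK(cudaMalloc3DArray(&arr, &desc, make_cudaExtent(w, h, d),
                                 cudaArraySurfaceLoadStore));

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaSurfaceObject_t surf = 0;
    CUDA_CHECK(cudaCreateSurfaceObject(&surf, &res));

    dim3 block(8, 8, 4);
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y,
              (d + block.z - 1) / block.z);
    fillSurface<<<grid, block>>>(surf, w, h, d);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());

    // Copy the array back; for array copies the extent is given in elements.
    std::vector<unsigned int> host((size_t)w * h * d);
    cudaMemcpy3DParms p = {};
    p.srcArray = arr;
    p.dstPtr = make_cudaPitchedPtr(host.data(), w * sizeof(unsigned int), w, h);
    p.extent = make_cudaExtent(w, h, d);
    p.kind = cudaMemcpyDeviceToHost;
    CUDA_CHECK(cudaMemcpy3D(&p));

    int bad = 0;
    for (size_t i = 0; i < host.size(); ++i)
        if (host[i] != (unsigned int)i) ++bad;

    CUDA_CHECK(cudaDestroySurfaceObject(surf));
    CUDA_CHECK(cudaFreeArray(arr));
    return bad;
}

int main() {
    std::printf("mismatches: %d\n", countMismatches(8, 4, 2));
    return 0;
}
```

The sketch only exercises the byte-offset convention for x and the bounds guard; it does not reproduce the GL interop path or the debug-build behaviour described above.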

It seems to be a problem with the SDK, and NVIDIA is working on it now.