CUDA Emulator corrupting 3D array

Hi all!

So far my experience of CUDA has been wonderful and I have realised vast speedups in my applications. I started using 3D memory allocations (via cudaMemcpy3D) the other day, and ever since my experience has declined rapidly.

At first I found little (and conflicting) documentation about cudaMalloc3D and the related 3D elements (cudaMemcpy3D, cudaMemset3D, cudaPitchedPtr, cudaExtent). I am writing in C++ under Visual Studio and so cannot use constructors such as make_cudaExtent and make_cudaPitchedPtr (these appear to be available only in .cu files, but nowhere is this documented). Fortunately I have managed to allocate my 3D array via:

// create the three dimensional pose network on the GPU
cudaExtent extent;
extent.width = 64 * sizeof(float); // width is given in bytes for linear memory
extent.height = 64;
extent.depth = 32;
cudaPitchedPtr pose_network, pose_network_swap;
cudaMalloc3D(&pose_network, extent);
cudaMalloc3D(&pose_network_swap, extent);

// Initialise the network to some arbitrary energy
cudaMemset3D(pose_network, 8, extent);
cudaMemset3D(pose_network_swap, 8, extent);
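One likely explanation for the 0.000000 values, offered as a hedged sketch: the value argument of cudaMemset3D (like cudaMemset and the C standard memset) is applied to every byte, not to every float. After cudaMemset3D(pose_network, 8, extent) each 4-byte float therefore holds the bit pattern 0x08080808, a valid but tiny float of roughly 4e-34, which prints as 0.000000 with %f. The host-only C++ below reproduces that with std::memset; it does not call CUDA, and the helper names byte_filled_float and float_bits are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Returns the float you get when each of its bytes is set to byte_value,
// which is what a byte-wise memset (std::memset, cudaMemset, cudaMemset3D)
// leaves in float storage.
float byte_filled_float(unsigned char byte_value) {
    float f;
    std::memset(&f, byte_value, sizeof f);  // fills bytes, not the float value
    return f;
}

// Raw 32-bit pattern of a float, for inspection.
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    static_assert(sizeof bits == sizeof f, "expects a 32-bit float");
    std::memcpy(&bits, &f, sizeof f);
    return bits;
}

// Usage: %f shows 0.000000, while %e reveals the tiny nonzero value.
void demo_memset_value_8() {
    float f = byte_filled_float(8);
    std::printf("%f (bits 0x%08X, %e)\n", f, (unsigned)float_bits(f), f);
}
```

Zero is the one value for which a byte-wise fill gives the float you expect (all-zero bytes are 0.0f); any other starting energy has to be written per element.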

However, when I debug my kernel in the emulator (I can’t debug a kernel on the actual GPU without Nexus, which isn’t supported), dereferencing any element of the pose_network array gives 0.000000 when cast to a float. Code below:

__global__ void translate(cudaPitchedPtr pitched_pose_network, cudaPitchedPtr pitched_pose_network_swap, cudaExtent extent, float v) {
  float reference;

  // calculate the target cell reference
  unsigned int x = threadIdx.x;
  unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
  unsigned int z = blockIdx.x;

  float* pose_network = (float*)pitched_pose_network.ptr;
  float* pose_network_swap = (float*)pitched_pose_network_swap.ptr;

  // pitch is in bytes; convert it to a row length in floats (not sizeof(float*))
  size_t pitch = pitched_pose_network.pitch / sizeof(float);
  size_t layer_pitch = pitch * extent.height;

  // dereference
  reference = pose_network[z * layer_pitch + y * pitch + x];

  // ... (rest of the kernel not shown in this post)
}
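For reference, here is a hedged host-side model of the pitched addressing the kernel above is attempting. The pitch returned by cudaMalloc3D is a byte count, so it must be divided by sizeof(float) to get a row length in floats; sizeof(float*) only happens to give the same answer on a 32-bit build, where pointers are also 4 bytes. The CUDA Programming Guide's cudaMalloc3D example does the arithmetic on a char* instead, which avoids the division. The function names pitched_element and pitched_index, and the 512-byte pitch used in the tests, are assumptions for illustration; the real pitch is chosen by cudaMalloc3D.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Address of element (x, y, z) in a pitched 3D block laid out like the one
// cudaMalloc3D returns: depth slices, each height rows of pitch bytes.
// To call the same logic from a kernel it would need to be a __device__ function.
float* pitched_element(void* base, std::size_t pitch, std::size_t height,
                       std::size_t x, std::size_t y, std::size_t z) {
    char* slice = static_cast<char*>(base) + z * pitch * height;  // slice pitch in bytes
    float* row = reinterpret_cast<float*>(slice + y * pitch);      // row pitch in bytes
    return row + x;                                                // x counts floats
}

// Equivalent flat index into a float*, valid when pitch is a multiple of sizeof(float).
std::size_t pitched_index(std::size_t pitch, std::size_t height,
                          std::size_t x, std::size_t y, std::size_t z) {
    const std::size_t row_floats = pitch / sizeof(float);  // divide by sizeof(float)
    return z * row_floats * height + y * row_floats + x;
}
```

Both forms address the same cell; the char* version simply keeps every offset in bytes until the final row pointer is formed.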

The other peculiar thing about my situation is that when debugging, pose_network never seems to be allocated. It is always 0x00000000 even after

float* pose_network = (float*)pitched_pose_network.ptr;

Also, when I add x, y and z to the Watch 1 window, the debugger says they do not exist in the scope of the current stack frame for the entire duration of the kernel.

Has anyone had random issues with the emulator before, such as corruption of variables or inability to access arrays? I need to get this fixed before writing any further code. I cannot write a hundred lines of code on the GPU and just assume it will work correctly, but a debugger that produces crap isn’t much better.

Any ideas would be greatly appreciated.

Don’t use cudaMemset3D; it is not reliable. Use cudaMemset instead. See: …hl=cudamemset3d …hl=cudamemset3d
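A hedged sketch of why that workaround covers the whole array, and what it cannot do: a cudaMalloc3D allocation is one contiguous block of pitch * height * depth bytes (row padding included), so a single cudaMemset(ptr, 0, pitch * height * depth) clears all of it. The fill is still byte-wise, though, so 0 is the useful value; to start every cell at an energy such as 8.0f, each float has to be written individually, for example by a small kernel or by cudaMemcpy3D from a host buffer. The host-only C++ below models both steps; PitchedModel, clear_pitched and fill_pitched are hypothetical names, and the 512-byte pitch in the tests is an assumed value, not necessarily what the driver picks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Host model of a cudaMalloc3D block: depth slices of height rows,
// each row pitch bytes long (at least width * sizeof(float)).
struct PitchedModel {
    std::vector<unsigned char> bytes;
    std::size_t pitch, width, height, depth;  // width counted in floats
    PitchedModel(std::size_t pitch_, std::size_t width_, std::size_t height_, std::size_t depth_)
        : bytes(pitch_ * height_ * depth_), pitch(pitch_), width(width_), height(height_), depth(depth_) {}
    float* at(std::size_t x, std::size_t y, std::size_t z) {
        return reinterpret_cast<float*>(bytes.data() + (z * height + y) * pitch) + x;
    }
};

// Byte-wise clear of the whole block, padding included,
// like cudaMemset(ptr, 0, pitch * height * depth) on the device.
void clear_pitched(PitchedModel& m) {
    std::memset(m.bytes.data(), 0, m.pitch * m.height * m.depth);
}

// Per-element fill, which a memset cannot do for a value such as 8.0f.
// On the GPU, each (x, y, z) iteration would be one thread.
void fill_pitched(PitchedModel& m, float value) {
    for (std::size_t z = 0; z < m.depth; ++z)
        for (std::size_t y = 0; y < m.height; ++y)
            for (std::size_t x = 0; x < m.width; ++x)
                *m.at(x, y, z) = value;
}
```

Clearing the padding bytes along with the data is harmless; the fill loop, by contrast, touches only the width floats of each row.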

Hi CapJo!

Thanks for your reply. I saw the other thread you posted in, and I’ll give your suggestion a shot. I’ll report back if/when I get it working.

Have you ever had random issues with the emulator? When I step through code in Visual Studio, variables sometimes appear not to get set when assigned to. Other times, variables that I pass into a kernel “appear” to get corrupted, but act as they should. In particular, I have found that C-style wrappers around kernel invocations should contain ONLY the invocation, with no assignments to variables and no construction of dim_grid and dim_block values, because these DO get corrupted.

Do people have universal problems now and again with the emulator, or should I be checking my installation and setup carefully?



It seems that your problem is compiler optimization. Optimization reorders your instructions and removes ones whose results the compiler considers unnecessary, so when you step through the code in the debugger some statements appear not to execute and some variables appear to be missing or never assigned.

Try disabling compiler optimization both in “C/C++ -> Optimization -> Optimization -> Disabled (/Od)” and in your CUDA build rule under “Optimization -> Disabled (/Od)”.

So far I have only had problems with cudaMemset3D on the device. Emulation mode worked correctly for me, but there may also be bugs in emulation mode, and they won’t be fixed anymore, since emulation mode will be discontinued in the next release of CUDA (version 3.0).

The same happened to me (corrupted variables and crashes) when running in emulation mode, and the solution was, as CapJo says, to disable all kinds of optimization.

About discontinuing emulation mode in future versions of CUDA: I think it will be very sad for some of us. I don’t have a CUDA-capable video card at home, so I must develop and debug in emulation mode and later test my work on one of the few computers around that do have a CUDA-capable GPU. That leaves me with one of the following: keep working with previous releases that support emulation mode, or switch to something other than CUDA :(

The latest information is that CUDA release 3.0 will still have emulation mode, but the release after that won’t support it.

I have two additional questions:

First, did turning off optimization fix the problem?
Second, I can’t figure out 3D arrays at all. Is there any other documentation, or could you possibly explain the code that you have so helpfully provided?
I thought pitched pointers (cudaPitchedPtr) were for 2D arrays?

I guess that is three questions.