cudaMemcpy(..., cudaMemcpyDeviceToHost) not working?

The statement

cutilSafeCall(cudaMemcpy(h_output, d_output, 2 & MEM_SIZE,
                         cudaMemcpyDeviceToHost));

doesn’t work as expected on my system. I’m a newbie at CUDA programming, so my problem might be a conceptual difficulty rather than a bug in my setup, especially as my test code is so small and simple.

My environment: x86-64 RedHat EL5.2, fresh install of CUDA 2.3, Tesla C1060 GPU. All the SDK examples built correctly and a representative selection of them ran perfectly, AFAICT.

The test program is attached. It was developed by copying and editing the SDK template application. Two earlier attempts to include it in a codebox aborted the post and dumped me at the forum’s main page. The kernel simply writes constant bytes to the global memory pointed to by its only argument. The calling routine then prints the contents of the buffer, which always contains zeros despite what the kernel is supposed to do.

The attached code is a stripped-down version of a program that undoubtedly executes the kernel. The full version performs significant computation, and timing shows it scaling appropriately with the number of threads it is expected to execute and with the size of its internal loop variables.

Can anyone help me?

Thanks, Paul

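Are you sure the bitwise AND (&) you are using is what you want? 2 & MEM_SIZE can only evaluate to 0 or 2: it is 2 when bit 1 of MEM_SIZE is set and 0 otherwise, which covers every multiple of 4 (including the usual power-of-two buffer sizes). So cudaMemcpy copies at most two bytes back to the host, and often none at all. I suspect you meant 2 * MEM_SIZE.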