I’m just getting started with CUDA and wrote a simple kernel that sets a variable, which I then read back on the host.
The variable's value came back unchanged, so something was obviously wrong. I added the following code after my kernel call:
CUT_CHECK_ERROR("calculate_desolvation_gpu_kernel failed");
but it did not report any error. Furthermore, when debugging in deviceemu, the debugger simply stepped over the kernel call, and breakpoints inside the kernel were never hit.
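For comparison, here is a minimal sketch of checking a launch directly with the runtime API rather than through the SDK's CUT_CHECK_ERROR macro. The kernel set_value, the CHECK_CUDA macro and launch_and_check are hypothetical names, not from the original code. It uses cudaDeviceSynchronize, which CUDA 1.x toolkits call cudaThreadSynchronize. The point it illustrates: a <<<...>>> launch returns nothing, so a rejected launch configuration is only visible through cudaGetLastError(), and errors that occur while the kernel runs only surface after synchronizing.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of cutil): abort with file/line and
// the runtime's error string if a CUDA API call fails.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical stand-in for the kernel in the post: every thread writes the
// same value to a single int in device memory (a benign race).
__global__ void set_value(int *out, int value) {
    *out = value;
}

// Launches set_value with one block of the given size and returns the first
// error encountered instead of aborting, so the caller can inspect it.
// *result is only written when the whole sequence succeeds.
cudaError_t launch_and_check(int threads_per_block, int value, int *result) {
    int *d_out = nullptr;
    CHECK_CUDA(cudaMalloc(&d_out, sizeof(int)));
    CHECK_CUDA(cudaMemset(d_out, 0, sizeof(int)));

    set_value<<<1, threads_per_block>>>(d_out, value);

    // Configuration errors (e.g. too many threads per block) are reported
    // here, not by the launch statement itself.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // Errors raised while the kernel executes surface on synchronization.
        err = cudaDeviceSynchronize();
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    }
    CHECK_CUDA(cudaFree(d_out));
    return err;
}

int main() {
    int result = 0;
    cudaError_t err = launch_and_check(256, 42, &result);
    std::printf("256 threads:  %s (result = %d)\n", cudaGetErrorString(err), result);

    // 4096 threads per block exceeds the per-block limit (512 on the
    // 8800 GTX), so the launch is rejected and cudaGetLastError() reports it.
    err = launch_and_check(4096, 42, &result);
    std::printf("4096 threads: %s\n", cudaGetErrorString(err));
    return 0;
}
```

On a working setup, the 256-thread launch should return cudaSuccess with result 42, while the 4096-thread launch should fail with cudaErrorInvalidConfiguration ("invalid configuration argument") instead of silently doing nothing.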
Once I reduced the number of threads per block below 512, everything worked fine: I could step into the kernel and read the memory back correctly, both in deviceemu and on the hardware.
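One possible explanation worth checking, offered as an assumption rather than a diagnosis: the block size a particular kernel can use may be lower than the device's 512-thread limit. On the 8800 GTX, all threads of a block must fit in one multiprocessor's 8192 registers, so a 512-thread block only launches if the kernel uses roughly 16 registers per thread or fewer; otherwise the launch fails with cudaErrorLaunchOutOfResources. Later toolkits (not CUDA 1.0) expose this per-kernel limit through cudaFuncGetAttributes. A sketch, again with a hypothetical set_value kernel and hypothetical helper names:

```cuda
#include <algorithm>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the one in the post.
__global__ void set_value(int *out, int value) {
    *out = value;
}

// Returns the largest block size set_value can be launched with on the
// current device: the device-wide limit, further reduced by the kernel's
// own resource usage (registers, shared memory). Returns 0 on error.
int max_block_size_for_kernel() {
    int dev = 0;
    if (cudaGetDevice(&dev) != cudaSuccess) return 0;

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) return 0;

    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, set_value) != cudaSuccess) return 0;

    return std::min(prop.maxThreadsPerBlock, attr.maxThreadsPerBlock);
}

// Clamps a requested block size to what set_value can actually use.
int choose_block_size(int requested) {
    return std::min(requested, max_block_size_for_kernel());
}

int main() {
    int limit = max_block_size_for_kernel();
    std::printf("max threads per block for set_value: %d\n", limit);
    std::printf("requested 4096 -> using %d\n", choose_block_size(4096));
    return 0;
}
```

Clamping the requested block size this way, and deriving the grid size from the clamped value, keeps launches within what the compiled kernel can actually run on the device at hand.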
Is this a problem with CUDA? Should it be reporting an error here? Or is my method of tracking errors incorrect?
My system is RHEL with two 8800 GTX cards, running CUDA 1.0 from June 2007.