I assume you mean CUDA_SAFE_CALL from cutil.h? tmurray will be along shortly with a baseball bat…
Anyway, that means that something failed. Can you get any of the SDK examples to run (both in release and debug modes)? Do you get any more error messages?
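Your kernel probably has the GPU equivalent of a segfault in it, most likely an out-of-bounds memory access. Kernel launches are asynchronous, so the error only surfaces at the next call that waits for the kernel to finish: cudaThreadSynchronize would return "unspecified launch failure" if you were calling it. Since you’re not, the cudaMemcpy that follows the launch is reporting it instead.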
Now, get rid of cutil and check your errors yourself with cudaGetLastError() and cudaGetErrorString().
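Something along these lines is what I mean, as a rough sketch rather than a drop-in fix. The checkCuda helper and the fill kernel are made up for illustration (neither is part of CUDA or cutil), and I've used cudaDeviceSynchronize, which is the name newer toolkits use for cudaThreadSynchronize; swap it back if your toolkit predates it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA or cutil):
// print the runtime's error string and exit on any failure.
static void checkCuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Placeholder kernel. The bounds check keeps threads past the end
// of the array from writing out of range.
__global__ void fill(int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = i;
}

int main()
{
    const int n = 1024;
    int h_out[n];
    int *d_out = NULL;

    checkCuda(cudaMalloc((void **)&d_out, n * sizeof(int)), "cudaMalloc");

    fill<<<(n + 255) / 256, 256>>>(d_out, n);

    // Catches launch-time problems such as a bad grid or block configuration.
    checkCuda(cudaGetLastError(), "kernel launch");

    // Waits for the kernel, so errors raised while it runs are reported here
    // instead of by whichever call happens to come next.
    checkCuda(cudaDeviceSynchronize(), "kernel execution");

    checkCuda(cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost),
              "cudaMemcpy");
    checkCuda(cudaFree(d_out), "cudaFree");

    printf("h_out[%d] = %d\n", n - 1, h_out[n - 1]);
    return 0;
}
```

With the check right after the synchronize, a bad memory access in the kernel gets blamed on the kernel rather than on the cudaMemcpy.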