Can't we have cudaPrintf()?


Suppose we have some data (in array a_d) in global memory, produced by a computation on the GPU. To print this data on the screen, we have to call cudaMemcpy(…DeviceToHost) and then loop over the host copy with printf(). Would it be possible to have a single function, cudaPrintf(…DeviceToHost), that combines cudaMemcpy with the usual C printf()?
How can we do this?
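For reference, the copy-then-print pattern described above can be wrapped in a small host-side helper. This is only a minimal sketch: it assumes the data is an array of float, and printDeviceArray is a made-up name, not a CUDA API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): copies n floats from device
// memory into a temporary host buffer and prints them one per line.
// Returns the cudaMemcpy status so the caller can check for errors.
static cudaError_t printDeviceArray(const float *a_d, size_t n)
{
    float *a_h = (float *)malloc(n * sizeof(float));
    if (a_h == NULL)
        return cudaErrorMemoryAllocation;

    cudaError_t err = cudaMemcpy(a_h, a_d, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) {
        for (size_t i = 0; i < n; ++i)
            printf("a_d[%lu] = %f\n", (unsigned long)i, a_h[i]);
    }

    free(a_h);
    return err;
}

int main(void)
{
    const int n = 8;
    float a_h[n];
    for (int i = 0; i < n; ++i)
        a_h[i] = 0.5f * i;

    // Put some data in global memory so there is something to print.
    float *a_d = NULL;
    if (cudaMalloc((void **)&a_d, n * sizeof(float)) != cudaSuccess)
        return 1;
    cudaMemcpy(a_d, a_h, n * sizeof(float), cudaMemcpyHostToDevice);

    cudaError_t err = printDeviceArray(a_d, n);
    if (err != cudaSuccess)
        fprintf(stderr, "printDeviceArray failed: %s\n",
                cudaGetErrorString(err));

    cudaFree(a_d);
    return err == cudaSuccess ? 0 : 1;
}
```

This runs entirely from the host, so it only shows the state of a_d between kernel launches, not what happens inside a kernel.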



So far there is no such support…
However, if you really want to debug your code, use device emulation mode (the -deviceemu option for nvcc). In this mode the kernels run on the host CPU, so you can use ordinary fprintf statements inside device functions and kernels. Once you are sure that the code is OK, compile it without -deviceemu.
Now, here's a small preprocessor trick to do this:
In every device function, wrap your debug statements in a guard like this:
#ifdef DEVICE_EMULATION
fprintf(stdout, "This is my debug statement inside this function!\n");
#endif

Then, whenever you pass the -deviceemu option to nvcc, also pass -DDEVICE_EMULATION. You are done! :)
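One way to package this trick is a macro that compiles to nothing unless DEVICE_EMULATION is defined, so the debug lines can stay in the source. A rough sketch follows; DEBUG_PRINT and scaleKernel are names invented for the example, and the fprintf path only makes sense when the toolkit still supports -deviceemu.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// DEBUG_PRINT is a hypothetical macro name. It expands to fprintf only
// when DEVICE_EMULATION is defined, i.e. when you build with
//   nvcc -deviceemu -DDEVICE_EMULATION debug.cu
// In a normal build it expands to a no-op, so no host call ends up in
// device code.
#ifdef DEVICE_EMULATION
#define DEBUG_PRINT(...) fprintf(stdout, __VA_ARGS__)
#else
#define DEBUG_PRINT(...) ((void)0)
#endif

// Example kernel: multiplies each element by factor and, in emulation
// builds only, reports the new value.
__global__ void scaleKernel(float *a_d, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        a_d[i] *= factor;
        DEBUG_PRINT("thread %d: a_d[%d] = %f\n", i, i, a_d[i]);
    }
}

int main(void)
{
    const int n = 8;
    float a_h[n];
    for (int i = 0; i < n; ++i)
        a_h[i] = (float)i;

    float *a_d = NULL;
    if (cudaMalloc((void **)&a_d, n * sizeof(float)) != cudaSuccess)
        return 1;
    cudaMemcpy(a_d, a_h, n * sizeof(float), cudaMemcpyHostToDevice);

    scaleKernel<<<1, n>>>(a_d, n, 2.0f);

    // The blocking copy back also waits for the kernel to finish.
    cudaMemcpy(a_h, a_d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(a_d);
    return 0;
}
```

Keeping the guard in one macro means a release build needs no source changes: just drop both flags.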

On Windows you'll soon get Nexus (google for it), and on Linux you should do OK with cuda-gdb.


A ‘cuprintf’ is supposed to be in the works. It will be very useful in some cases, but it is likely to deluge you with output, since every thread that reaches the call will print.

Even better… use my GPU Trace Library and get much more than just cudaPrintf()!