I’m trying to debug a molecular modeling application that computes a large number of data points (~300k). To do so, I’d like to get output from the GPU via device-side printf calls, which I redirect to a file.
Unfortunately, only somewhere between 20k and 50k printf calls produce output (the exact number depends on the length of each line). If there are more, no output is generated at all. Is there a way around this limitation? I’m using CUDA 4.0 with the 270.41.19 driver on CentOS 5.5 (GTX 480).
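For reference, device-side printf writes into a fixed-size FIFO buffer that the host drains only at synchronization points. The runtime exposes its size as `cudaLimitPrintfFifoSize` (1 MB by default, as far as I know), which would be consistent with the cutoff depending on line length. A minimal sketch of enlarging it, assuming the helper name `enlarge_printf_fifo`, the kernel `dump_points`, and the 64 MB figure are illustrative choices of mine, not anything from the original setup:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: request a larger device-side printf FIFO.
// Per the runtime docs, this must run before the first launch of any kernel
// that calls printf; otherwise cudaDeviceSetLimit reports an error.
size_t enlarge_printf_fifo(size_t bytes)
{
    cudaError_t err = cudaDeviceSetLimit(cudaLimitPrintfFifoSize, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceSetLimit: %s\n", cudaGetErrorString(err));
    }
    size_t actual = 0;
    cudaDeviceGetLimit(&actual, cudaLimitPrintfFifoSize);
    return actual;  // the FIFO size the runtime actually reports
}

// Hypothetical kernel: one line of output per data point.
__global__ void dump_points(const float *values, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        printf("%d %f\n", i, values[i]);
    }
}
```

Usage would be to call `enlarge_printf_fifo(64u << 20)` once at startup, then launch `dump_points` and call `cudaDeviceSynchronize()` after each launch, since the buffer is only flushed to stdout at such synchronization points. Device printf needs compute capability 2.0, so compile with `-arch=sm_20` or later.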
The other approach I know of (writing my numbers into an array on the GPU, copying it to the host with cudaMemcpy, and printing it there) is a major headache. Besides, if the kernel hits an error, I get no output at all and no clue what happened.
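The array-based approach can be kept fairly small, and checking `cudaGetLastError()` and the return value of `cudaDeviceSynchronize()` after the launch at least yields an error string when the kernel fails, rather than silence. A minimal sketch, assuming a hypothetical kernel `compute_points` that stores one float per data point (a placeholder for the real molecular-modeling quantity) and hypothetical helpers `collect_debug` and `write_debug`:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real per-point computation:
// each thread stores its debug value in a device array instead of printing.
__global__ void compute_points(float *debug_out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        debug_out[i] = 0.5f * i;  // placeholder for the real quantity
    }
}

// Launch the kernel, check for errors, and copy the debug array back.
// Returns the first CUDA error encountered, so a failing kernel is reported.
cudaError_t collect_debug(int n, std::vector<float> &host_out)
{
    float *d_out = NULL;
    cudaError_t err = cudaMalloc((void **)&d_out, n * sizeof(float));
    if (err != cudaSuccess) return err;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    compute_points<<<blocks, threads>>>(d_out, n);

    err = cudaGetLastError();              // launch/configuration errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();     // errors raised while the kernel ran
    if (err == cudaSuccess) {
        host_out.resize(n);
        err = cudaMemcpy(&host_out[0], d_out, n * sizeof(float),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_out);
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    return err;
}

// Print one line per point on the host, where the printf FIFO limit does not apply.
void write_debug(const std::vector<float> &v, FILE *f)
{
    for (size_t i = 0; i < v.size(); ++i)
        fprintf(f, "%d %f\n", (int)i, v[i]);
}
```

For ~300k points, `collect_debug(300000, v)` followed by `write_debug(v, stdout)` keeps all formatting on the host, so the output size is bounded only by the file, not by the device buffer.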
Any advice would be much appreciated.