I encountered a weird problem using printf() in a kernel function. Whenever I use printf() in the kernel, my code fails to allocate device memory: cudaGetErrorString() reports "unspecified driver error". I really have no idea why memory allocation and the use of printf() are related. I am using a Tesla C2050 with sm_20 and the CUDA 3.2 toolkit. Any clue? Thanks!
There should be no issues using device-side malloc() and printf() in the same kernel (provided you are using an sm_2x device, which you are). Would it be possible to post a small, self-contained, repro app that demonstrates the problem you are encountering?
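In case it helps you build one, a minimal self-contained app that exercises device-side malloc() and printf() in the same kernel might look like the sketch below (assumptions: compiled with -arch=sm_20; uses the CUDA 3.2-era cudaThreadSynchronize() rather than the later cudaDeviceSynchronize()):

```cuda
#include <cstdio>

// Kernel that both allocates from the device heap and prints.
// Requires a compute capability 2.x device (e.g. Tesla C2050).
__global__ void allocAndPrint(void)
{
    int *p = (int *)malloc(4 * sizeof(int));
    if (p == NULL) {
        printf("thread %d: device malloc failed\n", threadIdx.x);
        return;
    }
    p[0] = threadIdx.x;
    printf("thread %d: p[0] = %d\n", threadIdx.x, p[0]);
    free(p);
}

int main(void)
{
    allocAndPrint<<<1, 4>>>();
    // cudaThreadSynchronize() also flushes the device printf buffer
    cudaError_t err = cudaThreadSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));
    return 0;
}
```

If this compiles and runs cleanly on your setup but your full application still fails, that points back at something in the surrounding code (heap size limits, an earlier unreported error, etc.) rather than at printf() itself.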
It is hard to post self-contained code because it is part of a large codebase. I did a simple test earlier and printf() worked fine, but somehow not in my current code. I will have to look into it further.
Once you get it whittled down, and it still looks like a genuine problem in CUDA at that point, I would recommend filing a bug.
Will do, if this is confirmed to be a bug.
But I do suspect there is something wrong in my code or compiler options, because now I cannot call double-precision math functions such as pow() and exp() in device functions. The compiler says "calling a host function from a __device__/__global__ function is not allowed" — it looks like the compiler is treating pow() and exp() as host functions. I can use powf() and expf(), though, and log(), which is also a double-precision function, compiles fine as well.
Solved the problem of calling double-precision math functions in device functions. I had to cast both arguments of pow(x,y) to double precision; otherwise it looks like the nvcc compiler resolves the call to the host pow(), which leads to the compiler error.
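For anyone hitting the same error, a sketch of the fix (the function name here is hypothetical; the point is the casts, which on the CUDA 3.2 toolchain steer overload resolution to the device-side double-precision pow()):

```cuda
// Mixed-type calls like pow(x, n) with an int exponent can resolve to
// the host pow() overload on older toolkits, producing the
// "calling a host function from a __device__/__global__ function
// is not allowed" error. Casting both arguments to double selects
// the device double-precision pow().
__device__ double scaledPower(double x, int n)
{
    // return pow(x, n);            // fails to compile on CUDA 3.2
    return pow(x, (double)n);       // both args double -> device pow()
}
```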