After upgrading from CUDA 3.0 to CUDA 3.1, I found that my kernel, when compiled for the sm_20 architecture, uses 54 registers instead of the 45 it used under CUDA 3.0. The register count for sm_13 is unchanged.
It also now compiles with the following warning:
\3.1_64\toolkit\include\common_functions.h(73): warning: dllexport/dllimport conflict with “printf”
1>C:\Program Files (x86)\Microsoft Visual Studio 8\VC\INCLUDE\stdio.h(278): here; dllimport/dllexport dropped
What can be done about this? It looks like device-side printf() requires a lot of resources (as cuPrintf() does). However, cuPrintf was easy to exclude from the build and enable only when it was actually needed, whereas printf() is now declared by the toolkit's own common_functions.h, as the warning above shows…
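One way to get back the "only when required" behaviour is to route every device print through a macro that compiles to nothing unless a debug switch is set. The sketch below assumes a hypothetical `ENABLE_DEVICE_PRINTF` define and an illustrative `scale_kernel`/`scale_on_device` pair (neither comes from the post). It also guards on `__CUDA_ARCH__ >= 200`, since device printf() is only available on compute capability 2.x, so the sm_13 build never references it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical switch: build with -DENABLE_DEVICE_PRINTF to turn device-side
// prints on. printf() exists on the device only for compute capability 2.x
// (sm_20), so the sm_13 device pass and the host pass get the empty version.
#if defined(ENABLE_DEVICE_PRINTF) && defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 200)
#define DEV_PRINTF(...) printf(__VA_ARGS__)
#else
#define DEV_PRINTF(...) ((void)0)
#endif

// Illustrative kernel (not the poster's): scales an array in place.
__global__ void scale_kernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
        // Expands to nothing unless the debug switch and sm_20 are both present.
        DEV_PRINTF("thread %d -> %f\n", i, data[i]);
    }
}

// Copies host data to the device, runs the kernel, copies the result back.
// Returns 0 on success, -1 on any CUDA runtime error.
int scale_on_device(float *host, int n, float factor)
{
    float *dev = NULL;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) return -1;

    cudaError_t err = cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        scale_kernel<<<blocks, threads>>>(dev, n, factor);
        err = cudaGetLastError();   // catches launch-configuration errors
    }
    if (err == cudaSuccess)         // blocking copy also surfaces kernel errors
        err = cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    return err == cudaSuccess ? 0 : -1;
}
```

A release build (no define) emits no printf() call at all; a debug build would use something like `nvcc -arch=sm_20 -DENABLE_DEVICE_PRINTF`. Whether removing the call actually recovers the registers lost in 3.1 is not certain, so it is worth comparing the per-kernel register counts that `-Xptxas -v` reports for both builds.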
Thanks in advance.