Register use growth when switching from CUDA 1.0 to CUDA 1.1

I wonder if anyone has experienced the same…

I switched from CUDA 1.0 to CUDA 1.1, compiled EXACTLY the same kernel with the new nvcc, and it now uses 18 registers instead of the 16 it used under CUDA 1.0.

This leads to a huge performance downgrade in my app…
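If it helps anyone hitting the same thing: you can confirm the per-kernel register count at compile time by passing verbose output through to ptxas, and you can cap the register allocation with `-maxrregcount` (both flags exist in nvcc of this era; the file name and output line below are illustrative, not from my actual build):

```shell
# Print per-kernel resource usage (registers, smem, lmem) at compile time.
nvcc --ptxas-options=-v -cubin mykernel.cu
# ptxas prints a line per kernel, something like:
#   ptxas info : Used 18 registers, 32+28 bytes smem

# Force the old register budget back; the compiler spills to local
# memory if it cannot fit, so re-check performance afterwards.
nvcc -maxrregcount=16 -cubin mykernel.cu
```

Note that capping registers trades register pressure for local-memory spills, so it is worth benchmarking rather than assuming 16 registers is automatically faster.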


I had the reverse. My kernels used 1 or 2 fewer registers when compiled with nvcc 1.1.

I compiled in a CUDA 1.0 environment and ran in a CUDA 1.1 environment (this one does NOT have the SDK installed), and I got a run-time error saying "cuda.dll" is missing. My app was linked against "cuda.lib" in the compilation environment, since my code uses the driver API. Any thoughts?

cuda.dll is included in the display driver as of 1.1

And it has been renamed to nvcuda.dll, so you need to link against the cuda.lib from 1.1.
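A quick way to verify the relink worked is a minimal driver-API program: if it links against the 1.1 cuda.lib and runs without the "missing DLL" error, the loader is finding nvcuda.dll from the display driver. This is just a sketch, assuming a standard driver-API setup (requires cuda.h and an installed NVIDIA driver to actually run):

```c
/* Minimal driver-API smoke test: if this links and runs, the import
   library (cuda.lib) and the runtime DLL (nvcuda.dll as of 1.1,
   cuda.dll before) are resolving correctly. */
#include <stdio.h>
#include <cuda.h>   /* CUDA driver API header */

int main(void)
{
    CUresult rc = cuInit(0);          /* must be the first driver-API call */
    if (rc != CUDA_SUCCESS) {
        fprintf(stderr, "cuInit failed with code %d\n", (int)rc);
        return 1;
    }

    int count = 0;
    cuDeviceGetCount(&count);         /* enumerate CUDA-capable devices */
    printf("CUDA devices found: %d\n", count);
    return 0;
}
```

Build it against the 1.1 toolkit's cuda.lib; a failure at process start (missing DLL dialog) rather than at cuInit means the link step is still pointing at the old import library.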

Well, on Linux it doesn't work because it complains about a driver version mismatch… I had to downgrade to CUDA 1.0 to get correct results.