'compute_35,sm_35' Code Generation in Release Build Causing "CUDA Runtime API error 30: unknown

I am using Visual Studio 2010 with Nsight 3.0 on a Windows 7 64-bit computer. The GPUs are a K20c and a low-end Quadro card with no CUDA capability. In the “Code Generation” setting under project Properties -> CUDA C/C++ -> Device, if I put “compute_10,sm_10;compute_20,sm_20;compute_30,sm_30”, the release build runs fine. But if I put “compute_35,sm_35”, I get a “CUDA Runtime API error 30”. Strangely, the debug build runs fine with every one of these settings.

Since the K20 is compute capability 3.5, I really should be able to build with “compute_35,sm_35”. I ran memcheck on the debug version and it did not report any errors.

If I comment out a small portion of the code, the release version runs, but there is no reason the code should fail with that portion included. I have simplified the code, and what remains is very simple.

When I comment out a smaller portion of the code, I get a crash dialog box: “cicc.exe stopped working”. This is very likely a compiler bug.

Attached is the code. Please search for “NOTE:” in the .cuh file to see which portion was commented out to make it run and which portion causes the crash.

I have the same problem.

I am trying to get an answer . . . .

The bug is fixed in CUDA 5.5 RC. You can get it from CUDA RDP.