Have you had a chance to try the toolchain from the CUDA 4.1 release candidate to see whether the issue persists?
Given that the code compiles with the host compiler but causes problems when nvcc processes it before passing it on to the host compiler, a problem with the CUDA compiler seems likely. It would be helpful if you could file a bug for this. Please attach a self-contained repro case and state the relevant system information (in case the problem is isolated to specific platforms). Thank you for your help.
Bugs can be submitted through the registered developer website, partners.nvidia.com. Once you log in, there is a column with a menu of items on the left-hand side. The third item from the top is “Bug Report”. Clicking on that opens a browser form for entering a bug into the bug database. Each registered developer can check on the status of the bugs they have filed.