Compilation error with templated kernel


I am using Visual Studio 2010 and NVCC under 32-bit Windows XP.

I am trying to use an expression template library inside a CUDA kernel. (It’s actually a bit more complicated, but that’s the idea.)

I added “__host__ __device__” qualifiers where the operators of the expression templates are evaluated in the library.
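To make the setup concrete, here is a minimal sketch of how such qualifiers might be attached to expression template operators. The names ET_HOST_DEVICE, Vec, Add and make_add are hypothetical and do not come from the library in question; the macro is only one possible way to keep the same header usable by a plain host compiler.

```cpp
#include <cstddef>

// Hypothetical macro: expands to the CUDA qualifiers only when NVCC
// compiles the file, and to nothing for a plain host compiler.
#ifdef __CUDACC__
#define ET_HOST_DEVICE __host__ __device__
#else
#define ET_HOST_DEVICE
#endif

// Hypothetical leaf node: wraps a raw pointer to the data.
struct Vec {
    const float* data;
    ET_HOST_DEVICE float operator[](std::size_t i) const { return data[i]; }
};

// Hypothetical binary node: element-wise sum of two sub-expressions.
// Nothing is computed until operator[] is called.
template <class L, class R>
struct Add {
    L lhs;
    R rhs;
    ET_HOST_DEVICE float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Helper that deduces the node type from its operands.
template <class L, class R>
ET_HOST_DEVICE Add<L, R> make_add(const L& l, const R& r) {
    Add<L, R> node = { l, r };
    return node;
}
```

Nesting calls such as make_add(make_add(a, b), c) produces the type Add<Add<Vec, Vec>, Vec>, which is the kind of nested template argument that ends up being passed to the kernel.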

And I use a simple templated kernel, something like:

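template<class Expr>
__global__ void expr_launch(int n, float *dest, Expr expr)
{
    // grid-stride loop: each thread handles elements i, i+di, i+2*di, ...
    for (int i=blockDim.x*blockIdx.x+threadIdx.x, di=blockDim.x*gridDim.x; i<n; i+=di)
    {
        dest[i] = expr[i];  // assumed body: evaluate the expression for element i
    }
}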

It works, but only for very trivial and small expression templates.

But if the expression gets slightly bigger (and the templates therefore nest deeper), I get a compilation error.

It seems that the error occurs if more than 3 templates are nested together.

I tried many things to fix the compilation error, but without any success.

Do you have any idea how to fix it?

Is it a bug in NVCC?

Thanks in advance for your answers. Pat

I am a newbie with CUDA, but I have made some progress in understanding the problem.
It really seems to be a bug coming from NVCC.

I added the “-keep” option to the NVCC command line to look at the generated CPP file.
The problem is that the template argument deduced by NVCC is not the same as the one deduced by Visual.
The template argument deduced by NVCC has no “const” qualifier, whereas Visual (and the Intel C++ compiler, and GCC) keeps the “const”.
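To illustrate why a dropped “const” matters, here is a small host-only sketch. It does not reproduce the NVCC behavior itself; it only shows that two template arguments differing by “const” are distinct types, so a kernel instantiated for one does not match code generated for the other. The names Add, Vec, HostSideExpr and NvccSideExpr are hypothetical.

```cpp
#include <type_traits>

// Hypothetical expression node and leaf, reduced to empty types.
template <class L, class R>
struct Add {};

struct Vec {};

// The argument as the host compilers reportedly deduce it (const kept)...
typedef Add<const Vec, const Vec> HostSideExpr;
// ...and as it reportedly appears in the generated file (const dropped).
typedef Add<Vec, Vec> NvccSideExpr;

// The two are different types, hence different template instantiations.
const bool kSameType = std::is_same<HostSideExpr, NvccSideExpr>::value;
```

Here kSameType is false, so expr_launch<HostSideExpr> and expr_launch<NvccSideExpr> would be two unrelated functions as far as the compiler is concerned.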

I’ll try adding some “#ifdef” blocks to the library so that the “const” qualifier is not propagated through the expression templates when compiling with NVCC.
But it clearly seems to be a bug in NVCC.

I confirm: NVCC does not correctly deduce template arguments with ‘const’ qualifiers.

I added “#ifdef __CUDACC__” at some points in my library so that the ‘const’ is not propagated through the expression templates, and added “const_cast” statements.
It works for me, but it’s definitely a workaround for something that seems to be a bug in NVCC.
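For anyone hitting the same problem, the workaround might look roughly like the sketch below. This is not the actual library code: ET_CONST, Vec, Scale and scale are hypothetical names, and the only real identifier is the __CUDACC__ macro that NVCC defines when compiling CUDA source files.

```cpp
// Hypothetical macro: "const" for ordinary compilers, nothing under NVCC,
// so the qualifier never reaches a template argument there.
#ifdef __CUDACC__
#define ET_CONST
#else
#define ET_CONST const
#endif

// Hypothetical leaf type.
struct Vec {
    float* data;
};

// Hypothetical node that refers to its operand through a pointer.
template <class Operand>
struct Scale {
    Operand* operand;
    float factor;
    float operator[](int i) const { return operand->data[i] * factor; }
};

// Builds Scale<const Vec> normally and Scale<Vec> under NVCC.
inline Scale<ET_CONST Vec> scale(const Vec& v, float factor) {
    // With ET_CONST == const this cast changes nothing; under NVCC it
    // strips the qualifier so the pointer matches the non-const operand.
    Scale<ET_CONST Vec> node = { const_cast<ET_CONST Vec*>(&v), factor };
    return node;
}
```

The cost of this approach is that const-correctness is silently given up in NVCC builds, so the nodes must still be treated as read-only by convention.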

Could someone confirm this?
Could an NVCC developer fix this issue?