Kernel and template problem with CUDA 2.2

Problem with compilation when using C++ templates in kernels

After upgrading to CUDA 2.2, I can no longer compile the following code. It compiled without any problems using CUDA 2.1:



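    template <typename T>
    __global__ void
    some_kernel(T *dst, T *src) {
        unsigned int idx = INDEX(pitch);

        /* Dynamically sized shared memory; the size is given at kernel launch. */
        extern __shared__ T neighbours[];

        neighbours[XCOORD] = src[idx];

        /* Do sync and some calculations. */
    }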



Compilation produces the following errors, where cData is just a typedef for float4:

error: declaration is incompatible with "cData neighbours" for line 6 (the shared-memory declaration of neighbours)

error: a value of type "cData" cannot be assigned to an entity of type "float" for line 7 (the assignment to neighbours)

It seems that the compiler cannot handle the kernel being called with both float and float4 as the template argument at different places in the host code. As mentioned, this compiles and works without problems under CUDA 2.1. I've read elsewhere on the board that CUDA 2.2 has some known template problems, but those produced different errors. Does anyone know whether I'm doing something wrong, or whether this is also a CUDA 2.2 problem?

Thank you in advance.
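
For reference, a commonly used way to sidestep this kind of conflict is to declare the dynamic shared array with one fixed type and cast it to T inside the kernel, so that the float and float4 instantiations never declare the same symbol with different types. The sketch below is only an illustration under assumptions: copy_kernel, round_trip and demo_ok are hypothetical names that are not in the code above, the indexing is a plain 1D layout rather than the INDEX(pitch) and XCOORD macros, and it is written against a current toolkit. Whether it avoids the CUDA 2.2 behaviour specifically is not confirmed here.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for some_kernel: stages src through shared memory.
template <typename T>
__global__ void copy_kernel(T *dst, const T *src, unsigned int n) {
    // One untyped buffer for every instantiation; its type does not depend on T.
    // 16-byte alignment is enough for float4 (assumption: no wider type is used).
    extern __shared__ __align__(16) unsigned char shared_raw[];
    T *neighbours = reinterpret_cast<T *>(shared_raw);

    unsigned int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) neighbours[threadIdx.x] = src[idx];
    __syncthreads();  // kept outside the branch so every thread reaches it
    if (idx < n) dst[idx] = neighbours[threadIdx.x];
}

// Hypothetical host helper: copies n elements to the device, runs the kernel,
// and copies the result back. Returns false on any allocation or copy failure.
template <typename T>
bool round_trip(const T *host_in, T *host_out, unsigned int n) {
    T *d_src = nullptr, *d_dst = nullptr;
    const size_t bytes = n * sizeof(T);
    if (cudaMalloc(&d_src, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_dst, bytes) != cudaSuccess) { cudaFree(d_src); return false; }
    bool ok = cudaMemcpy(d_src, host_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    const unsigned int threads = 64;
    const unsigned int blocks = (n + threads - 1) / threads;
    // Third launch parameter: bytes of dynamic shared memory per block.
    copy_kernel<T><<<blocks, threads, threads * sizeof(T)>>>(d_dst, d_src, n);

    ok = ok && cudaMemcpy(host_out, d_dst, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_src);
    cudaFree(d_dst);
    return ok;
}

// Instantiates the kernel with both float and float4 in one translation unit.
bool demo_ok() {
    const unsigned int n = 100;
    float f_in[n], f_out[n];
    float4 v_in[n], v_out[n];
    for (unsigned int i = 0; i < n; ++i) {
        f_in[i] = float(i);
        v_in[i] = make_float4(float(i), 2.0f * i, 3.0f * i, 4.0f * i);
    }
    if (!round_trip(f_in, f_out, n)) return false;
    if (!round_trip(v_in, v_out, n)) return false;
    for (unsigned int i = 0; i < n; ++i) {
        if (f_out[i] != f_in[i]) return false;
        if (v_out[i].x != v_in[i].x || v_out[i].w != v_in[i].w) return false;
    }
    return true;
}
```

Calling demo_ok() from main on a machine with a CUDA device exercises both instantiations. The cast trades a little type safety for a single, type-independent declaration, which is the point of the sketch.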