Compilation error when using __syncthreads() in a __device__ template function

Hi all,
I wrote a device function to sort some input data, and I had to use __syncthreads() to make it work properly. It did work. Later I tried to turn the implementation into a template function, because I need to make it independent of the type being sorted. Unfortunately, after turning it into a template I get the following compilation error:

error: there are no arguments to ‘__syncthreads’ that depend on a template parameter, so a declaration of ‘__syncthreads’ must be available

What could be wrong? Is it a CUDA limitation?

Thank you for answers and suggestions.

Please show the corresponding code. __syncthreads can be used in template functions.

Can you provide an example that we could compile to see the error?

I didn’t seem to have any trouble with a trivial test based on your description:

# cat t140.cu
template <typename T>
__device__ T mysort(T *d){
        __syncthreads();
        return d[0];}  // return a value so the non-void function is well-formed (compile test only)

__global__ void k(int *x){
                int *d = NULL;
                int val = mysort(d);
                *x = val;}

int main(){

  int *x = NULL;
  k<<<1,1>>>(x);
}

# nvcc -o t140 t140.cu
#

Thank you all for your fast responses. I did some tests with your simplified use case and I was able to reproduce the error, which led me to suspect that it is caused by my placing the template in a header file, as is usual in C++. That doesn’t seem to be a problem by itself, but the compilation error may be related to the mixture of *.cu and *.cpp files in our project and a wrong include of “CUDA-related headers” in a *.cpp file. I’ll keep investigating and let you know my findings…

Ondrej
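
For readers wondering why the error showed up only after the function became a template: the wording of the message is GCC's diagnostic for a non-dependent name that has no visible declaration at the point where a template is defined. This fits the suspicion above that a host compiler saw the header without the CUDA declarations. A minimal sketch in plain C++, where barrier_stub and first_after_barrier are hypothetical names standing in for __syncthreads() and the sort routine:

```cpp
#include <cassert>

// Hypothetical stand-in for a CUDA built-in such as __syncthreads().
// When nvcc compiles a .cu file the real declaration is supplied
// automatically; a .cpp file compiled by g++ has no such declaration.
inline int barrier_stub() { return 1; }

template <typename T>
T first_after_barrier(const T *d) {
    assert(d != nullptr);
    // barrier_stub() takes no argument that depends on T, so it is a
    // non-dependent name: the compiler looks it up here, at the template
    // definition, and not later at instantiation time.
    barrier_stub();
    return d[0];
}
```

If the declaration of barrier_stub is removed, g++ should reject the template definition itself with the same "there are no arguments to ... that depend on a template parameter" wording, even if the template is never instantiated.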

I can confirm the suspicion from my previous post: the trouble was caused by wrongly including headers that contain CUDA extensions, such as the __device__ function specifier, in *.cpp files. The problem is solved.
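
One common way to keep such a shared header safe for both *.cu and *.cpp files is to guard the CUDA-only parts with the __CUDACC__ macro, which nvcc defines while compiling CUDA sources. The sketch below is only an illustration, not the poster's actual code: the file name sort_utils.h, the HOST_DEVICE macro, and all function names are hypothetical, and the odd-even transposition sort is a stand-in for whatever sort the original function used.

```cpp
// sort_utils.h (hypothetical): intended to be included from .cu and .cpp files.
#include <cassert>
#include <cstddef>

#ifdef __CUDACC__
// Compiled by nvcc: CUDA qualifiers and built-ins are available.
#define HOST_DEVICE __host__ __device__
#else
// Compiled by a plain host compiler: the qualifiers expand to nothing.
#define HOST_DEVICE
#endif

// Shared helper with no CUDA-only calls, usable on both sides.
template <typename T>
HOST_DEVICE inline void swap_if_greater(T &a, T &b) {
    if (a > b) { T tmp = a; a = b; b = tmp; }
}

#ifdef __CUDACC__
// Device-only template, hidden from host-only translation units so that
// g++ never sees __device__ or __syncthreads().
// Assumes one thread block with at least (n + 1) / 2 threads and data
// located in shared or global memory.
template <typename T>
__device__ void odd_even_sort_block(T *data, int n) {
    for (int phase = 0; phase < n; ++phase) {
        int i = 2 * threadIdx.x + (phase & 1);
        if (i + 1 < n) swap_if_greater(data[i], data[i + 1]);
        __syncthreads();  // reached by every thread in every phase
    }
}
#endif

// Host-side reference version of the same algorithm.
template <typename T>
inline void odd_even_sort_host(T *data, std::size_t n) {
    assert(data != nullptr || n == 0);
    for (std::size_t phase = 0; phase < n; ++phase)
        for (std::size_t i = phase & 1; i + 1 < n; i += 2)
            swap_if_greater(data[i], data[i + 1]);
}
```

With this layout a *.cpp file only ever sees swap_if_greater and odd_even_sort_host, while a *.cu file additionally gets odd_even_sort_block. Moving the device templates into a separate header that only *.cu files include would be an equally reasonable alternative.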
