Template function pointer

Hi all,

Passing a function pointer to a CUDA kernel is described in detail in the SDK project FunctionPointers.

In that example, the function accepts a few unsigned char and float parameters and returns an unsigned char.

My kernel is a TEMPLATE kernel and I would like to pass a TEMPLATE function to it.

To this end, I have tried the following:

  1. Define a template typedef for the type of function I would like to pass as a parameter (a typedef cannot be templated as-is; it must be wrapped in a struct/class).

template<class T>
struct Operators
{
    typedef T (*Operator)(T);
};

  2. Define a templated device function pointer ‘d_Inverse’ of type __device__ Operators<T>::Operator

__device__ Operators<T>::Operator d_Inverse = Inverse;

where Inverse is a host function with a number of overloaded implementations, e.g.,

int Inverse(int i){return 1/i;}
float Inverse(float f){return 1/f;}

  3. Before the kernel launch, which happens inside a templated “extern” function, I copy the address held in ‘d_Inverse’ to a host variable ‘op’.

Operators<T>::Operator op = NULL;
cudaError_t error = cudaMemcpyFromSymbol((void*)&op, d_Inverse, (size_t)sizeof(Operators<T>::Operator));

  4. I call my kernel with op as a parameter of type Operators<T>::Operator.

This code mimics the strategy adopted for non-templated functions. Unfortunately, it fails to compile. In particular, the declaration

Operators<T>::Operator op = NULL;

produces a compilation error, whereas the explicitly instantiated version of it

Operators<float>::Operator op = NULL;

does not.
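
One possible cause, assuming the declaration sits inside a function that is itself templated on T: Operators<T>::Operator is a dependent name there, and C++ requires the `typename` keyword before it, while Operators<float>::Operator is not dependent and needs nothing extra. The sketch below is host-only C++ (the CUDA qualifiers are left out so it builds with an ordinary compiler), and `apply_inverse` is a hypothetical helper, not something from the original post.

```cpp
#include <cstddef>

// Wrapper from the post: a "templated typedef" for a unary function pointer.
template <class T>
struct Operators {
    typedef T (*Operator)(T);
};

// Overloads from the post, as plain host functions.
int Inverse(int i) { return 1 / i; }
float Inverse(float f) { return 1 / f; }

// Hypothetical helper, templated on T like the "extern" function in the post.
template <class T>
T apply_inverse(T x) {
    // `typename` tells the compiler that the dependent name is a type.
    typename Operators<T>::Operator op = NULL;
    // The overload matching T (*)(T) is selected when the template is instantiated.
    op = Inverse;
    return op(x);
}
```

With this in place, `apply_inverse(4.0f)` goes through the float overload. Whether the same change is enough for the device-side pointer depends on code not shown in the post.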

Help is greatly appreciated!

Thanks,

Olivier

A bit more context and the exact wording of the compiler errors would be helpful.
Nevertheless, I suspect it comes down to a simple problem: there are no template function pointers, since there are no template functions per se, only templates for functions. This means you can only take the address of an instantiated function template, and a pointer must have a concrete type such as the mentioned Operators<float>::Operator.
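
To illustrate the point, here is a small host-only C++ sketch. `InverseT` is a hypothetical function template standing in for the overloaded Inverse; each pointer has a concrete type and refers to exactly one instantiation.

```cpp
// Hypothetical function template, used here instead of the overloads.
template <class T>
T InverseT(T x) { return T(1) / x; }

// A pointer variable has one concrete type, so it names one instantiation.
float (*inverse_float)(float) = &InverseT<float>;

// The template argument can also be deduced from the pointer's type.
int (*inverse_int)(int) = InverseT;

// There is no way to declare a single pointer that covers every T:
// T (*inverse_any)(T) = InverseT;   // error: T is not known here
```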

Thanks for your answer. Indeed, the code I posted works fine if I compile it for a particular instantiation of the template parameter T. In this case, however, the “extern” function that calls the GPU kernel must be instantiated for every type I would like to support. I wanted to avoid this code redundancy…
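
One way to keep that redundancy small, sketched here in host-only C++ under the assumption that the wrapper body can itself stay a template: write the body once and add a one-line explicit instantiation per supported type. `run_inverse` is a hypothetical stand-in for the “extern” function, and the loop stands in for the kernel launch.

```cpp
// Overloads from the post, as plain host functions.
int Inverse(int i) { return 1 / i; }
float Inverse(float f) { return 1 / f; }

// Hypothetical wrapper: the body is written only once.
template <class T>
void run_inverse(const T* in, T* out, int n) {
    T (*op)(T) = Inverse;  // overload chosen per instantiation
    for (int i = 0; i < n; ++i) {
        out[i] = op(in[i]);  // a kernel launch would go here in CUDA
    }
}

// One line per supported type; callers in other files only need a declaration.
template void run_inverse<int>(const int*, int*, int);
template void run_inverse<float>(const float*, float*, int);
```

The per-type cost is then a single line rather than a copy of the function. The list of supported types still has to be spelled out somewhere.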