template <typename T>
__global__ void MyFunc(const T& t)
{
  T t2(t); // Construct a new object of type T, as given by the prototype passed in.
}
struct Type1
{
  Type1(int mm) : m(mm) {}
  Type1(const Type1& rhs) : m(rhs.m) {}
  int m;
};
__host__ void CallMyFunc(dim3 dimGrid, dim3 dimBlock)
{
  Type1 t(50);
  MyFunc<Type1><<<dimGrid, dimBlock>>>(t);
}
Currently I have a few problems. First, I don’t know how to declare the variable ‘t’ in CallMyFunc so that it gets passed correctly to all the instances of the MyFunc function. I guess I need to somehow mark it as being shared memory but this doesn’t seem possible within a host function. So my first question is, how can I make sure that a user-defined type gets passed correctly to a kernel?
Second, is it even possible for CUDA to cope with templated parameters like this? If so, it’s going to make life a lot easier.
BTW, obviously in this trivial example I could just pass an int as a parameter. But that is not the point.
I’d really appreciate some help on this. I have simplified the first problem as follows.
struct Type1
{
  Type1(int mm) : m(mm) {}
  Type1(const Type1& rhs) : m(rhs.m) {}
  int m;
};
__global__ void MyFunc(const Type1& t)
{
  Type1 t2(t);
}
__host__ void CallMyFunc(dim3 dimGrid, dim3 dimBlock)
{
  Type1 t(50);
  MyFunc<<<dimGrid, dimBlock>>>(t); // Error: No conversion between type 'Type1' to 'const Type1 *'
}
So the first problem is that my const reference in the global function has been changed to a const pointer. Is this a documented feature that arguments can’t be passed by reference?
Then I changed my reference to a pointer which was straightforward:
Please tell me, how can I successfully pass initialization data to my kernel? I know it has to be in shared memory, but I don’t know how to share memory between the host thread and the units executing the kernel.
Your problem is indeed pointer related, but not C++ related. You’re passing HOST pointers to a global function which is expecting DEVICE pointers.
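To make the host/device distinction concrete, here is a minimal sketch of the pointer version, assuming the object is first copied into device memory with cudaMalloc and cudaMemcpy. The names MyFuncPtr, RunWithDevicePointer, and the out buffer are made up for illustration, and the __host__ __device__ markers on the constructors are my addition so that device code is allowed to call them.

```cuda
#include <cuda_runtime.h>

struct Type1
{
    // Assumption: constructors are marked __host__ __device__ so the kernel may call them.
    __host__ __device__ Type1(int mm) : m(mm) {}
    __host__ __device__ Type1(const Type1& rhs) : m(rhs.m) {}
    int m;
};

// The kernel receives a DEVICE pointer; each thread makes its own local copy.
__global__ void MyFuncPtr(const Type1* t, int* out)
{
    Type1 t2(*t);
    out[threadIdx.x] = t2.m;
}

// Hypothetical helper: returns the value seen by thread 0, or -1 on a CUDA error.
int RunWithDevicePointer(int value)
{
    const int threads = 32;
    Type1 hostObj(value);
    Type1* devObj = 0;
    int* devOut = 0;
    int hostOut[threads];

    if (cudaMalloc((void**)&devObj, sizeof(Type1)) != cudaSuccess) return -1;
    if (cudaMalloc((void**)&devOut, sizeof(hostOut)) != cudaSuccess)
    {
        cudaFree(devObj);
        return -1;
    }

    // Copy the object's bytes from host memory into device memory.
    cudaError_t err = cudaMemcpy(devObj, &hostObj, sizeof(Type1), cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
    {
        MyFuncPtr<<<1, threads>>>(devObj, devOut);
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(hostOut, devOut, sizeof(hostOut), cudaMemcpyDeviceToHost);
    }

    cudaFree(devObj);
    cudaFree(devOut);
    return (err == cudaSuccess) ? hostOut[0] : -1;
}
```

Passing &hostObj straight to the kernel would compile, but the kernel would then dereference a host address, which is the problem described above.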
You can probably get it to work for classes like the ones in your examples by using cudaMemcpyToSymbol to initialize them.
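A rough sketch of the symbol approach follows. It assumes a plain struct without constructors (the hypothetical InitData), because __constant__ variables cannot be dynamically initialized, so a type like Type1 with a non-empty constructor would not be accepted as a constant symbol as written. g_init, ReadInit, and RunWithConstantSymbol are likewise invented names.

```cuda
#include <cuda_runtime.h>

// Hypothetical plain aggregate: no constructors, so it can live in constant memory.
struct InitData
{
    int m;
};

// Device-side symbol in constant memory, readable by every thread.
__constant__ InitData g_init;

__global__ void ReadInit(int* out)
{
    out[threadIdx.x] = g_init.m;
}

// Hypothetical helper: returns the value seen by thread 0, or -1 on a CUDA error.
int RunWithConstantSymbol(int value)
{
    const int threads = 32;
    InitData host = { value };
    int* devOut = 0;
    int hostOut[threads];

    // Copy the host object into the device symbol.
    if (cudaMemcpyToSymbol(g_init, &host, sizeof(InitData)) != cudaSuccess) return -1;
    if (cudaMalloc((void**)&devOut, sizeof(hostOut)) != cudaSuccess) return -1;

    ReadInit<<<1, threads>>>(devOut);
    cudaError_t err = cudaMemcpy(hostOut, devOut, sizeof(hostOut), cudaMemcpyDeviceToHost);

    cudaFree(devOut);
    return (err == cudaSuccess) ? hostOut[0] : -1;
}
```

This suits small, read-only initialization data that every thread needs, since the kernel no longer takes the object as a parameter at all.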
But be careful of creating classes with pointers inside of them on the host, because THOSE are host pointers too, which are just garbage when moved over to the device.
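One way around the embedded-pointer trap, sketched under the assumption that the member data is allocated on the device first and the struct is then filled in with that device address. Buffer, SumBuffer, and RunWithEmbeddedPointer are hypothetical names used only for this illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical struct with a pointer member. The pointer must hold a DEVICE address
// by the time the kernel dereferences it.
struct Buffer
{
    const int* data;
    int n;
};

__global__ void SumBuffer(Buffer b, int* out)
{
    int s = 0;
    for (int i = 0; i < b.n; ++i) s += b.data[i];
    *out = s;
}

// Hypothetical helper: returns the sum computed on the device, or -1 on a CUDA error.
int RunWithEmbeddedPointer()
{
    const int n = 4;
    const int hostData[n] = { 1, 2, 3, 4 };
    int* devData = 0;
    int* devOut = 0;
    int result = -1;

    if (cudaMalloc((void**)&devData, sizeof(hostData)) != cudaSuccess) return -1;
    if (cudaMalloc((void**)&devOut, sizeof(int)) != cudaSuccess)
    {
        cudaFree(devData);
        return -1;
    }

    cudaError_t err = cudaMemcpy(devData, hostData, sizeof(hostData), cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
    {
        // The struct is built on the host, but its pointer member refers to device memory.
        Buffer b = { devData, n };
        SumBuffer<<<1, 1>>>(b, devOut);
        err = cudaMemcpy(&result, devOut, sizeof(int), cudaMemcpyDeviceToHost);
    }

    cudaFree(devData);
    cudaFree(devOut);
    return (err == cudaSuccess) ? result : -1;
}
```

Had b.data been set to hostData instead of devData, the struct would still copy over without complaint, and the failure would only show up when the kernel reads through the pointer.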
By the way, in device code, passing arguments by value is much more efficient than using const references (references are implemented as pointers, which often forces the compiler to use local memory instead of registers).
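For reference, a sketch of the by-value version, which also shows a templated kernel being launched with an explicit template argument. MyFuncByValue, RunByValue, and the out buffer are hypothetical, and the __host__ __device__ markers are again my addition. Note that kernel arguments reach the device as a bytewise copy, so this assumes the type is safe to copy that way (true here, since Type1 only holds an int).

```cuda
#include <cuda_runtime.h>

struct Type1
{
    // Assumption: constructors are marked __host__ __device__ so the kernel may call them.
    __host__ __device__ Type1(int mm) : m(mm) {}
    __host__ __device__ Type1(const Type1& rhs) : m(rhs.m) {}
    int m;
};

// Templated kernel taking its argument by value instead of by reference.
template <typename T>
__global__ void MyFuncByValue(T t, int* out)
{
    T t2(t); // Construct a per-thread copy from the argument.
    out[blockIdx.x * blockDim.x + threadIdx.x] = t2.m;
}

// Hypothetical helper: returns the value if every thread saw it, otherwise -1.
int RunByValue(int value)
{
    const int blocks = 2;
    const int threadsPerBlock = 16;
    const int total = blocks * threadsPerBlock;
    int hostOut[total];
    int* devOut = 0;

    if (cudaMalloc((void**)&devOut, sizeof(hostOut)) != cudaSuccess) return -1;

    Type1 t(value);
    // The object lives in host memory; the launch copies it into the kernel's parameter space.
    MyFuncByValue<Type1><<<blocks, threadsPerBlock>>>(t, devOut);
    cudaError_t err = cudaMemcpy(hostOut, devOut, sizeof(hostOut), cudaMemcpyDeviceToHost);
    cudaFree(devOut);

    if (err != cudaSuccess) return -1;
    for (int i = 0; i < total; ++i)
    {
        if (hostOut[i] != value) return -1;
    }
    return value;
}
```

No cudaMalloc or cudaMemcpy is needed for t itself in this form, which is why it tends to be the simplest option for small types.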