Templated arguments / shared memory

How do I do the equivalent of the following:

template <typename T>
__global__ void MyFunc(const T& t)
{
    T t2(t); // Construct a new object of type T, as given by the prototype passed in.
}

struct Type1
{
    Type1(int mm) : m(mm) {}
    Type1(const Type1& rhs) : m(rhs.m) {}
    int m;
};

__host__ void CallMyFunc(dim3 dimGrid, dim3 dimBlock)
{
    Type1 t(50);
    MyFunc<Type1><<<dimGrid, dimBlock>>>(t);
}

Currently I have a few problems. First, I don’t know how to declare the variable ‘t’ in CallMyFunc so that it gets passed correctly to all instances of MyFunc. I guess I need to somehow mark it as shared memory, but that doesn’t seem possible within a host function. So my first question is: how can I make sure that a user-defined type gets passed correctly to a kernel?

Second, is it even possible for CUDA to cope with templated parameters like this? If so, it’s going to make life a lot easier.

BTW, obviously in this trivial example I could just pass an int as a parameter. But that is not the point.

Many thanks.

I’d really appreciate some help on this. I have simplified the first problem as follows.

struct Type1
{
    Type1(int mm) : m(mm) {}
    Type1(const Type1& rhs) : m(rhs.m) {}
    int m;
};

__global__ void MyFunc(const Type1& t)
{
    Type1 t2(t);
}

__host__ void CallMyFunc(dim3 dimGrid, dim3 dimBlock)
{
    Type1 t(50);
    MyFunc<<<dimGrid, dimBlock>>>(t); // Error:
                                      // No conversion between type 'Type1' and 'const Type1 *'
}

So the first problem is that my const reference in the global function has been turned into a const pointer. Is it a documented restriction that kernel arguments can’t be passed by reference?

Then I changed my reference to a pointer which was straightforward:

struct Type1
{
    Type1(int mm) : m(mm) {}
    Type1(const Type1& rhs) : m(rhs.m) {}
    int m;
};

__global__ void MyFunc(const Type1* t)
{
    Type1 t2(*t); // Bug:
                  // t2.m != 50
}

__host__ void CallMyFunc(dim3 dimGrid, dim3 dimBlock)
{
    Type1 t(50);
    MyFunc<<<dimGrid, dimBlock>>>(&t);
}

Now I get no compilation errors, but the value of 50 is not propagated to the kernel executions. Simplifying further:

__global__ void MyFunc(const float* t)
{
    // Bug:
    // *t != 50.0f
}

__host__ void CallMyFunc(dim3 dimGrid, dim3 dimBlock)
{
    float t(50.0f);
    MyFunc<<<dimGrid, dimBlock>>>(&t);
}

Please tell me, how can I successfully pass initialization data to my kernel? I know it has to be in shared memory, but I don’t know how to share memory between the host thread and the units executing the kernel.

Many thanks.

C++ is not fully supported in kernels yet (it is coming soon, according to the roadmap shown at NVISION).

Your problem is indeed pointer related, but not C++ related. You’re passing HOST pointers to a global function which is expecting DEVICE pointers.

You can probably get it to work for classes like your examples by using cudaMemcpyToSymbol to initialize them.
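A minimal sketch of that approach (hedged: Type1 is reduced to a plain aggregate here, since __constant__ variables can’t have constructors run on the device, and the symbol name g_t is mine):

```cuda
// Sketch: initialize a device-side object via cudaMemcpyToSymbol.
struct Type1
{
    int m;      // plain aggregate: no constructors for a __constant__ object
};

__constant__ Type1 g_t;                 // lives in device constant memory

__global__ void MyFunc()
{
    Type1 t2 = g_t;                     // every thread reads the same initialized object
}

__host__ void CallMyFunc(dim3 dimGrid, dim3 dimBlock)
{
    Type1 t = { 50 };
    cudaMemcpyToSymbol(g_t, &t, sizeof(Type1));  // host -> device symbol copy
    MyFunc<<<dimGrid, dimBlock>>>();
}
```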

But be careful of creating classes with pointers inside of them on the host, because THOSE are host pointers too, which are just garbage when moved over to the device.
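To make that warning concrete, here is a hedged sketch (the Wrapper struct and the helper are hypothetical) of the usual deep-copy pattern: allocate the pointed-to data on the device first, so the object you copy over already holds a device pointer.

```cuda
struct Wrapper                        // hypothetical struct with a pointer member
{
    float* data;
    int n;
};

// Build a device-resident Wrapper whose 'data' member is a valid device pointer.
__host__ Wrapper* MakeDeviceWrapper(const float* hostData, int n)
{
    Wrapper h;
    h.n = n;
    cudaMalloc((void**)&h.data, n * sizeof(float));       // device buffer for the payload
    cudaMemcpy(h.data, hostData, n * sizeof(float),
               cudaMemcpyHostToDevice);                   // deep-copy the payload

    Wrapper* d;
    cudaMalloc((void**)&d, sizeof(Wrapper));
    cudaMemcpy(d, &h, sizeof(Wrapper),
               cudaMemcpyHostToDevice);                   // h.data is already a device pointer
    return d;
}
```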

Yes, but I don’t know how to declare and initialize device or shared variables from a host function. The following does not compile:

__host__ void CallMyFunc(dim3 dimGrid, dim3 dimBlock)
{
    __device__ float t(0.5f); // Error!
    MyFunc<<<dimGrid, dimBlock>>>(&t);
}

Can you share a snippet which shows how to do this please, or point me to the relevant docs?

Thanks very much.

You have to allocate memory for device variables from the host, and you can initialize those variables by copying memory from host to device.

Check out cudaMalloc and cudaMemcpy in the programming guide.
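Applied to the simplified float example from earlier in the thread, that pattern looks roughly like this (a sketch, not a tested drop-in):

```cuda
__global__ void MyFunc(const float* t)
{
    float v = *t;   // t is now a device pointer, so *t reads the copied 50.0f
}

__host__ void CallMyFunc(dim3 dimGrid, dim3 dimBlock)
{
    float tHost = 50.0f, *tDevice;
    cudaMalloc((void**)&tDevice, sizeof(float));                        // allocate on the device
    cudaMemcpy(tDevice, &tHost, sizeof(float), cudaMemcpyHostToDevice); // initialize it
    MyFunc<<<dimGrid, dimBlock>>>(tDevice);
    cudaFree(tDevice);                                                  // release when done
}
```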

Why not just pass the argument by value?

struct Type1
{
    Type1(int mm) : m(mm) {}
    Type1(const Type1& rhs) : m(rhs.m) {}
    int m;
};

__global__ void MyFunc(Type1 t)
{
    Type1 t2(t);
}

__host__ void CallMyFunc(dim3 dimGrid, dim3 dimBlock)
{
    Type1 t(50);
    MyFunc<<<dimGrid, dimBlock>>>(t);
}

By the way, in device code, passing arguments by value is much more efficient than using const references (references are implemented as pointers, which often forces the compiler to use local memory instead of registers).

That code does not compile for me:

[i]error C2664: ‘_device_stub___globfunc__Z6MyFuncI5Type1EvT’ : cannot convert parameter 1 from ‘Type1’ to ‘Type1 *’

No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called[/i]

The following doesn’t work either. I’ve been right through the programming guide and I’m tearing my hair out.

struct Type1
{
    Type1(int mm) : m(mm) {}
    Type1(const Type1& rhs) : m(rhs.m) {}
    int m;
};

__global__ void MyFunc(const Type1* t)
{
    Type1 t2(*t); // Bug: t->m != 50
}

__host__ void CallMyFunc(dim3 dimGrid, dim3 dimBlock)
{
    Type1 tHost(50), *tDevice;
    cudaMalloc((void**)&tDevice, sizeof(Type1));
    cudaMemcpy(tDevice, &tHost, sizeof(Type1), cudaMemcpyHostToDevice);
    MyFunc<<<dimGrid, dimBlock>>>(tDevice);
}

This must be so simple. Please, someone show me how to pass a typed object to a kernel.

EDIT: The code above does work after all! Sanity is restored!