Why is the parameter list for a kernel launched "device-side" more restricted than for a kernel launched host-side?

I can pass types from my favorite math library, GLM, to host-side kernel launches, but the second I try to pass them to a device-side kernel launch I get this error:

Error	239	error : cannot pass an argument with a user-provided copy-constructor to a device-side kernel launch

This suddenly necessitates all sorts of sloppy type conversions. Is there a more elegant way around this? A really simple reproducer is below.

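#include <glm/glm.hpp>

// Child kernel: launched from device code (dynamic parallelism).
__global__
void deviceSideKernelLaunch(glm::vec3 v)
{
}

// Parent kernel: launched from the host, then launches the child on the device.
__global__
void hostSideKernelLaunch(glm::vec3 v)
{
	// This device-side launch is the line that triggers the compile error quoted above.
	deviceSideKernelLaunch<<< 1, 1 >>>(v);
}

extern "C"
void testLaunch()
{
	glm::vec3 v(1);
	// Host-side launch: passing glm::vec3 by value compiles fine here.
	hostSideKernelLaunch<<< 1, 1 >>>(v);
}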

Your hostSideKernelLaunch is declared with the __global__ attribute, meaning that it’s actually a device kernel that’s callable from the host. If you prefix it with __host__ instead, you should be fine. Or am I missing something?

I’d rather guess that the library is missing the __host__ __device__ declaration on the copy constructor, since it knows nothing about CUDA. I might be mistaken, though, as the error message suggests that this is generally not supported.
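For anyone unsure what "user-provided copy-constructor" means in that error text, here is a small sketch. Both struct names are hypothetical stand-ins, not GLM types. It uses std::is_trivially_copyable from <type_traits> to show that a hand-written copy constructor makes a type non-trivially copyable even when it is marked __host__ __device__, while a plain aggregate stays trivially copyable. Whether the compiler's device-side launch check corresponds exactly to this trait is an assumption on my part.

```cuda
#include <type_traits>

// Hypothetical stand-in for a math-library vector whose copy constructor
// is written by hand (user-provided) and marked for both host and device.
struct HandCopiedVec3
{
	float x, y, z;
	__host__ __device__ explicit HandCopiedVec3(float s) : x(s), y(s), z(s) {}
	__host__ __device__ HandCopiedVec3(const HandCopiedVec3& o) : x(o.x), y(o.y), z(o.z) {}
};

// Hypothetical plain aggregate: the compiler generates the copy constructor.
struct PlainVec3
{
	float x, y, z;
};

// The annotation does not change the fact that the copy constructor is user-provided.
static_assert(!std::is_trivially_copyable<HandCopiedVec3>::value,
              "a user-provided copy constructor makes the type non-trivially copyable");
static_assert(std::is_trivially_copyable<PlainVec3>::value,
              "an aggregate of floats is trivially copyable");
```

If this reading is right, adding the __host__ __device__ qualifiers to the library's copy constructor would not by itself make the type acceptable for a device-side launch.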

I lack experience here, though, as I usually optimize my kernels at the lowest level (CUDA is all about speed, after all), so my kernels tend not to exploit any C++ features.

JorenH: If you are working on a GPU that supports dynamic parallelism, then you can launch a __global__ kernel from device code just like you do from host code. The NVIDIA developers tried to emulate the host runtime API as much as possible on the device, but it looks like this is one of the areas where the supported constructs in the two cases differ. Compute capability 3.5 really changes things…
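One possible way to keep the conversions tidy, as a minimal sketch rather than a definitive fix: convert to a plain struct only at the device-side launch boundary and rebuild the library type inside the child kernel. Everything named here (HandCopiedVec3 as a stand-in for glm::vec3, PlainVec3, toPlain, fromPlain, runWorkaround) is hypothetical. It assumes a GPU of compute capability 3.5 or higher and a build with relocatable device code, for example nvcc -rdc=true with a suitable -arch flag, linked against cudadevrt.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for glm::vec3: it has a user-provided copy constructor.
struct HandCopiedVec3
{
	float x, y, z;
	__host__ __device__ explicit HandCopiedVec3(float s) : x(s), y(s), z(s) {}
	__host__ __device__ HandCopiedVec3(float a, float b, float c) : x(a), y(b), z(c) {}
	__host__ __device__ HandCopiedVec3(const HandCopiedVec3& o) : x(o.x), y(o.y), z(o.z) {}
};

// Plain aggregate used only to cross the device-side launch boundary.
struct PlainVec3 { float x, y, z; };

__host__ __device__ inline PlainVec3 toPlain(const HandCopiedVec3& v)
{
	PlainVec3 p = { v.x, v.y, v.z };
	return p;
}

__host__ __device__ inline HandCopiedVec3 fromPlain(const PlainVec3& p)
{
	return HandCopiedVec3(p.x, p.y, p.z);
}

// Child kernel: takes the plain struct and rebuilds the library-style type.
__global__ void childKernel(PlainVec3 p, float* out)
{
	HandCopiedVec3 v = fromPlain(p);
	*out = v.x + v.y + v.z;
}

// Parent kernel: receives the library-style type from the host,
// but passes only the plain struct to the device-side launch.
__global__ void parentKernel(HandCopiedVec3 v, float* out)
{
	childKernel<<< 1, 1 >>>(toPlain(v), out);
}

// Host helper (hypothetical): runs the parent and returns the child's result.
// Error checking is omitted to keep the sketch short.
float runWorkaround()
{
	float* dOut = 0;
	float hOut = 0.0f;
	cudaMalloc(&dOut, sizeof(float));
	HandCopiedVec3 v(2.0f);
	parentKernel<<< 1, 1 >>>(v, dOut);
	cudaDeviceSynchronize(); // waits for the parent and its child grid
	cudaMemcpy(&hOut, dOut, sizeof(float), cudaMemcpyDeviceToHost);
	cudaFree(dOut);
	return hOut;
}
```

Calling runWorkaround() from host code should return the sum of the three components. The two conversion helpers keep the "sloppy" part in one place, so the rest of the device code can keep using the library type.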

I didn’t actually realize that one could pass objects with copy constructors by value to kernels from the host. :)