Getting the class size of non-POD types for the device

I am trying to get the size of C++ classes with virtual methods so that I can use cudaMalloc to allocate space for the object on the device and then construct it there with placement new. My question is: how do I get the correct size of the object in bytes to pass to cudaMalloc?

The standard answer is to simply use sizeof(classname), which is expected to give the correct size on both the host and the device, and this is what we usually do. However, after encountering some errors on Windows, I came across an exception to this rule in the “Windows-Specific” section of the Classes documentation of the CUDA Programming Guide (Programming Guide :: CUDA Toolkit Documentation). According to this documentation, “The CUDA compiler follows the IA64 ABI for class layout, while the Microsoft host compiler does not.” Most importantly, “The CUDA compiler may compute the class layout and size differently than the Microsoft host compiler for [certain types].” I refer you to the guide for exactly which types fall under this exception, but classes with virtual methods and multiple inheritance qualify.

So, for cases where it is possible for the size of a class to differ on the host and device, how does the host code get the size of the device version of the class?

You could launch a kernel that evaluates sizeof(classname) in device code and copies the result back to the host.
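A minimal sketch of that idea follows, assuming a toolkit with C++11 support. The class `Shape`, the kernel `sizeofKernel`, and the helper `deviceSizeOf` are hypothetical names made up for illustration; only the CUDA runtime calls (`cudaMalloc`, `cudaMemcpy`, `cudaGetLastError`, `cudaFree`) are real API.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical example class with a virtual method (not from the original post).
struct Shape {
    __host__ __device__ virtual float area() const { return 0.0f; }
    float scale;
};

// sizeof(T) inside a kernel is evaluated during the device compilation pass,
// so it reflects the class layout chosen by the CUDA compiler.
template <typename T>
__global__ void sizeofKernel(size_t* out) {
    *out = sizeof(T);
}

// Hypothetical helper: stores the device-side sizeof(T) in *result.
// Returns cudaSuccess on success, or the first CUDA error encountered.
template <typename T>
cudaError_t deviceSizeOf(size_t* result) {
    size_t* dSize = nullptr;
    cudaError_t err = cudaMalloc(&dSize, sizeof(size_t));
    if (err != cudaSuccess) return err;

    sizeofKernel<T><<<1, 1>>>(dSize);
    err = cudaGetLastError();  // catches launch failures
    if (err == cudaSuccess) {
        // A blocking cudaMemcpy waits for the kernel to finish before copying.
        err = cudaMemcpy(result, dSize, sizeof(size_t), cudaMemcpyDeviceToHost);
    }
    cudaFree(dSize);
    return err;
}
```

On the host you would call something like `deviceSizeOf<Shape>(&bytes)` once, check the returned error code, and then pass `bytes` to cudaMalloc before running the placement new in a kernel. Since the value is fixed at compile time, it should be safe to cache it per type instead of launching the kernel for every allocation.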