cuda_sizeof(): host-side calculation of device sizeof

This is a common situation in more advanced kernels: you need to allocate device memory from the host, but the data structure that lives on the device may contain pointers or variously sized members, so you need a sizeof() that tells you how much memory to actually allocate. A host-side sizeof() will lie to you if your machine's pointers are 64-bit or the host compiler's packing rules differ from the device compiler's.

I just tracked down one of those non-deterministic array-overflow bugs that (I hope) was caused by this, and I'd like a solution other than random padding.
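One workaround that comes to mind (a minimal sketch of my own, not something from the toolkit; the helper name cuda_sizeof is made up) is to let the device compiler answer the question itself: launch a one-thread kernel that writes sizeof(T) into device memory and copy the result back to the host.

#include <cstddef>

// Sketch: query the device-side size of a type T.
template <typename T>
__global__ void device_sizeof_kernel(size_t *out)
{
    *out = sizeof(T);                      // device-side layout rules apply here
}

template <typename T>
size_t cuda_sizeof()
{
    size_t *d_out = 0;
    size_t h_out = 0;
    cudaMalloc((void **)&d_out, sizeof(size_t));
    device_sizeof_kernel<T><<<1, 1>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(size_t), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;                          // e.g. cudaMalloc(&p, n * cuda_sizeof<sParam>())
}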

I ran into the same issue.

sizeof(double) on the host is 8 bytes (64 bits)…

However, CUDA treats "double" as "float" on certain architectures (those without double-precision support). This causes confusion when we allocate arrays of doubles and so on…

Any solutions?
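If it helps, here is a small sketch of my own (assuming device 0) that at least tells you at runtime whether you are on hardware with native double support; double precision needs compute capability 1.3 or higher, and for older targets nvcc demotes double to float:

#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);     // query the first device
    int has_double = (prop.major > 1) || (prop.major == 1 && prop.minor >= 3);
    printf("compute capability %d.%d, native double: %s\n",
           prop.major, prop.minor, has_double ? "yes" : "no");
    return 0;
}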

A very similar question; maybe somebody knows how to avoid it.

Suppose a 64-bit CPU (I have 64 GB of main memory, so I really need 64-bit pointers).

I have a structure like:

typedef struct
{
    int *A, *B;
    size_t TexA, TexB;
    float F1, F2, ...;
} sParam;

that I want to pass to the kernel as an argument. That is the easy way and does not need any additional workaround.

As long as sizeof(sParam) stays below 256 bytes, even with 64-bit pointers, I do not care; but if it gets a little larger, I really want to convert my CUDA pointers and texture indexes to 32-bit values.
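For what it is worth, a small guard of my own (the field list here is just a placeholder) can make the build fail as soon as the pass-by-value struct outgrows that 256-byte parameter space:

typedef struct
{
    int *A, *B;                 // device pointers
    size_t TexA, TexB;          // texture indexes
    float F1, F2;               // further fields elided
} sParam;

// C-style compile-time assertion: the array type becomes ill-formed
// (size -1) if sParam ever exceeds 256 bytes.
typedef char sParam_fits_in_param_space[(sizeof(sParam) <= 256) ? 1 : -1];

__global__ void kernelByValue(sParam p)    // struct passed by value as the argument
{
    p.A[threadIdx.x] = p.B[threadIdx.x];
}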

It is definitely possible to make something like:

typedef struct
{
    int PointerA, PointerB;
    int PointerTexA, PointerTexB;
    float F1, F2, ...;
} sParam;

and later convert them back to pointers each time, but it is not elegant…
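For reference, that offset idea might look roughly like this (a sketch of my own; all names and sizes are invented): keep one base allocation, pass 32-bit byte offsets in the struct, and rebuild the real pointers inside the kernel.

struct sParamPacked
{
    unsigned int OffA, OffB;    // 32-bit byte offsets into one base allocation
    float F1, F2;               // further fields elided
};

__global__ void kernelPacked(char *base, sParamPacked p)
{
    // Reconstruct full device pointers from base + offset.
    int *A = (int *)(base + p.OffA);
    int *B = (int *)(base + p.OffB);
    A[threadIdx.x] = B[threadIdx.x];
}

int main()
{
    const int N = 256;
    char *base = 0;
    cudaMalloc((void **)&base, 2 * N * sizeof(int));   // A and B side by side

    sParamPacked p;
    p.OffA = 0;                          // A at the start of the allocation
    p.OffB = N * sizeof(int);            // B directly after A
    p.F1 = 1.0f;
    p.F2 = 2.0f;

    kernelPacked<<<1, N>>>(base, p);
    cudaDeviceSynchronize();
    cudaFree(base);
    return 0;
}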

Can somebody suggest a cleverer way to do it?

Thank you!

Sincerely

Elena