CUdeviceptr is defined as an unsigned int (32 bits) under Linux CUDA 1.1.
This is probably incorrect and should be changed to a 64-bit type. The reason is that passing a CUdeviceptr to cuParamSeti requires the offset to be computed. The most common way to do that is:
offset += sizeof(CUdeviceptr);
It would seem I have to hard-code this on 64-bit platforms to:
offset += 8;
Now that I think of it, __alignof() should pad the offset out to 8 bytes, but most code examples don't bother with __alignof() and seem to work.
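To make the alignment point concrete, here is a minimal sketch in plain C of how the offset could be rounded up before each parameter is placed. It makes no CUDA calls. The names devptr32_t, devptr64_t, align_up and push_param are hypothetical, made up for illustration, and the 8-byte alignment for a 64-bit pointer is my assumption, not something the CUDA 1.1 headers confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real CUDA typedefs. */
typedef uint32_t devptr32_t; /* same width as CUdeviceptr in CUDA 1.1 */
typedef uint64_t devptr64_t; /* what a 64-bit definition would look like */

/* Round offset up to the next multiple of alignment.
 * alignment must be a non-zero power of two. */
static size_t align_up(size_t offset, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

/* Reserve room for one parameter: align the running offset,
 * return the position where the parameter goes, then advance
 * the running offset past it. */
static size_t push_param(size_t *offset, size_t size, size_t alignment)
{
    size_t at = align_up(*offset, alignment);
    *offset = at + size;
    return at;
}
```

With a running offset that starts at 0, a 4-byte int followed by a devptr64_t (alignment assumed to be 8) puts the pointer at offset 8 rather than 4, and leaves the running offset at 16. With devptr32_t the pointer would sit at offset 4. Whether the kernel side expects that padding is exactly what I am asking about.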
Can anyone confirm what the offset should be for CUdeviceptr when passing these to a CUfunction?
CUDA should define an __alignof() if a platform’s compiler doesn’t support it for the sake of portability.