Pointer arithmetic on CUDA pointers in host code.


Some information in the manuals on whether and how CUDA pointer arithmetic is possible in host code would be useful, maybe even crucial.

Can CUDA pointer arithmetic be done in host code? And if so, how?


Yes. Since types have the same size on the host and on the device, pointer arithmetic gives the same results in both places. Just don’t dereference the device pointers on the host.

Are there any potential alignment issues? For example, when transferring memory between host and device with such a modified pointer?



No alignment issues for a plain cudaMalloc(). cudaMallocPitch() does add some padding, but the function returns the pitch, which can be used to compute the correct address on both host and device.

Basic types are automatically aligned for correct reading, so no problem there. The cudaMallocPitch() function helps you allocate a 2D array as a linear segment where each row is aligned so as to ensure coalesced reads in code that works row-by-row.

With structs, the Programming Guide recommends the use of the align() specifier to ensure correct alignment by the compiler. (Presumably, this also gets the pointer arithmetic right.)

Hello, since you mentioned align(), what would be the correct way to align a template struct? I ask because its size is not known in advance.


Ouch, that sounds like a C++ puzzle question. :) I’m not sure what the right way to handle that is. You could make the alignment a template parameter, I suppose.