Pointer arithmetic: is this allowed?


The following gives funky results, so I was wondering whether it is legal:

cudaMalloc((void**)&data, size*sizeof(float));

someKernel<<<grid, block>>>(data+someOffset, someMoreParams);

This works fine from plain C, so I assume it is OK. But as soon as templates and C++ join the show (basically an extension of the ‘reduction’ SDK example for NPOT arrays), I can cause reproducible freezes of X on Linux. So the question is: can I do the above pointer arithmetic on the host side?

Thanks for insights,


Yes, for sure. I do this very often. But I can't tell what causes the freeze or crash unless you post your code. My guess is that the kernel is reading or writing global memory that is not allocated or not initialized.
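
An out-of-bounds access of the kind guessed at above can be ruled out on the host before the launch. Here is a minimal sketch in plain C++ (which also compiles as host code in a .cu file), assuming the allocation size is tracked in elements. `rangeFits` and `offsetChecked` are hypothetical helper names, not part of CUDA:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true if the element range [offset, offset + count)
// lies inside an allocation of `allocated` elements. The comparison is
// arranged so that offset + count is never computed and cannot overflow.
bool rangeFits(std::size_t allocated, std::size_t offset, std::size_t count) {
    return offset <= allocated && count <= allocated - offset;
}

// Hypothetical helper: returns base + offset after checking the range.
// The arithmetic only computes an address; nothing is dereferenced here,
// so it is equally valid for a device pointer handled on the host.
// Note: assert is compiled out when NDEBUG is defined.
template <typename T>
T* offsetChecked(T* base, std::size_t allocated,
                 std::size_t offset, std::size_t count) {
    assert(rangeFits(allocated, offset, count));
    return base + offset;  // advances by offset * sizeof(T) bytes
}
```

With the snippet from the question, the launch would pass `offsetChecked(data, size, someOffset, n)` instead of `data+someOffset`, where `n` is the number of elements the kernel touches (an assumed quantity that the post does not show).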

One must be careful with offsets when working with 2D arrays whose row “width” (pitch) is measured in bytes. To calculate a row offset you:

  1. first convert the pointer to type char *
  2. offset it by “width” bytes * row_number
  3. convert the pointer back to the original type
  4. offset it by column_number
  5. only then access the element

Skipping step 1 or 3 usually causes unpredictable behavior when the pointer is dereferenced. The SDK documentation points out this potential bug explicitly.
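
The steps above can be sketched as one small helper. This is only an illustration in plain C++. The name `pitchedElement` and its parameters are hypothetical, and it assumes `pitchBytes` is the row width in bytes as returned by the allocator, and that `T` is a non-const element type:

```cpp
#include <cstddef>

// Hypothetical helper: address of element (row, col) in a 2D array whose
// rows are pitchBytes bytes apart. T is assumed to be a non-const type.
template <typename T>
T* pitchedElement(T* base, std::size_t pitchBytes,
                  std::size_t row, std::size_t col) {
    char* bytes    = reinterpret_cast<char*>(base);   // step 1: view as bytes
    char* rowStart = bytes + row * pitchBytes;        // step 2: byte offset to the row
    T*    typedRow = reinterpret_cast<T*>(rowStart);  // step 3: back to element type
    return typedRow + col;                            // step 4: element offset in the row
}
```

Step 5 is then just `*pitchedElement(ptr, pitch, r, c)`. Writing `ptr + r * pitch + c` directly on a `float *` would scale the pitch by `sizeof(float)` a second time and land well past the intended row.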

As for simple, non-strided arrays, I have had no problems.
I only noticed an issue once, when I was trying to read 3 bytes with a single “long int” read from a non-aligned address. I then applied a mask and shifts to break this long into the three bytes I needed. For some reason it did not behave the same as three successive single-byte reads, although I verified the approach on the CPU, where it worked well. If somebody else has encountered such an issue, let me know.
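
One plausible explanation, offered as an assumption rather than a diagnosis, is that the GPU requires word-sized loads to be naturally aligned, so a 4-byte read from an odd address is not guaranteed to return the expected bytes even though many CPUs tolerate it. An alignment-independent alternative is to assemble the value from single-byte reads. A minimal sketch in plain C++, where `load3` is a hypothetical name and little-endian byte order is assumed:

```cpp
#include <cstdint>

// Hypothetical helper: reads 3 bytes starting at p (any alignment) and
// packs them into the low 24 bits of a 32-bit value, with p[0] as the
// least significant byte. Only byte-sized loads are issued.
std::uint32_t load3(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16);
}
```

The same function body could be used in device code. It trades one wide load for three narrow ones, but does not depend on the alignment of `p`.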