Does pointer arithmetic on a pointer to memory allocated on the device with cudaMalloc() work on the host side the same way it would in a kernel?
For example, suppose I execute the following on the host:
float * ptr;
cudaMalloc((void**)&ptr, 10 * sizeof(float));
float * ptr2 = ptr + 5;
Can I now pass ptr2 to a kernel and have it operate on the array starting at ptr[5] (index 5, i.e. the sixth element)? Or do I have to do the + 5 inside the kernel?
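Below is a minimal test program that could check this empirically. It assumes a CUDA-capable device and nvcc, and the names `fill_index` and `count_mismatches` are hypothetical, chosen only for illustration. The program fills a 10-element device array with -1, computes `ptr + 5` on the host, launches a kernel on that offset pointer, and copies the whole array back to see which elements were written.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes each thread's index into the element it owns.
// Inside the kernel, p[0] is whatever address the host passed in.
__global__ void fill_index(float *p, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        p[i] = (float)i;
    }
}

// Hypothetical helper: returns the number of elements that do not hold the
// expected value, or -1 if a CUDA call failed.
int count_mismatches(void)
{
    const int N = 10, OFFSET = 5;
    float host[N];
    for (int i = 0; i < N; ++i) host[i] = -1.0f;

    float *ptr = nullptr;
    if (cudaMalloc((void **)&ptr, N * sizeof(float)) != cudaSuccess) return -1;
    // Initialise the whole device array to -1 so untouched elements stay visible.
    if (cudaMemcpy(ptr, host, N * sizeof(float), cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(ptr);
        return -1;
    }

    // Pointer arithmetic on the host: only the address is computed here;
    // the host never dereferences ptr2.
    float *ptr2 = ptr + OFFSET;
    fill_index<<<1, N - OFFSET>>>(ptr2, N - OFFSET);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(ptr); return -1; }

    // This copy waits for the kernel to finish before reading the results back.
    cudaError_t err = cudaMemcpy(host, ptr, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(ptr);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < N; ++i) {
        // Expect -1 before the offset, then 0, 1, 2, ... from ptr[5] onward.
        float expected = (i < OFFSET) ? -1.0f : (float)(i - OFFSET);
        if (host[i] != expected) ++bad;
    }
    return bad;
}

int main(void)
{
    int bad = count_mismatches();
    printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

If host-side arithmetic behaves like device-side arithmetic, elements 0 to 4 should remain -1 and elements 5 to 9 should hold 0 to 4, so `count_mismatches()` would return 0.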