Suppose the following. I allocate a buffer on the GPU from the host, so I have a pointer to that GPU memory. I transfer an array of data from the host to that GPU buffer, and then I call a kernel. Can I perform pointer operations on the host with the GPU pointer? To be more specific: can I do the following?
The fundamental rule is that you cannot dereference a device pointer in (ordinary) host code. Operations that don’t involve dereferencing device pointers should be OK in host code.
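To make the rule concrete, here is a minimal sketch. The kernel scale and the helper scale_second_half are hypothetical names made up for illustration, and it assumes n is at least 1. The offset pointer is computed on the host, but it is only ever dereferenced inside the kernel.

```cuda
#include <cuda_runtime.h>

// Doubles n elements starting at data; the dereference happens on the device.
__global__ void scale(double* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0;
}

// Hypothetical helper: copies host[0..n) to the GPU, doubles only the
// second half, and copies everything back. Returns false on a CUDA error.
// Assumes n >= 1.
bool scale_second_half(double* host, int n) {
    double* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(double)) != cudaSuccess) return false;
    cudaMemcpy(dev, host, n * sizeof(double), cudaMemcpyHostToDevice);

    double* dev_half = dev + n / 2;  // OK: arithmetic only, done on the host
    // double x = *dev_half;         // NOT OK: dereferences a device pointer on the host
    // double y = dev[0];            // NOT OK: indexing is also a dereference

    int count = n - n / 2;
    scale<<<(count + 255) / 256, 256>>>(dev_half, count);

    // This blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(host, dev, n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```

Calling scale_second_half on a host array of 4 doubles should leave the first two elements unchanged and double the last two.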
I was assuming that memory management on the GPU was somehow different from the CPU. When I add 1 to a double pointer on the CPU, the address increases by 8 (the size of a double). I thought that on the GPU a different amount might be added if the memory was organized differently.
Just a reminder: you need to be aware of alignment and stride/pitch values, especially with cudaMallocPitch(), cudaMalloc3D(), etc.
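As a rough sketch of what the pitch means for pointer arithmetic: the pitch returned by cudaMallocPitch() is in bytes, not elements, so rows have to be stepped through a char pointer before stepping columns in elements. The helper name pitched_at and the float element type are assumptions for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: address of element (row, col) in a pitched allocation.
// pitch_bytes is the value returned by cudaMallocPitch, measured in bytes.
// Only computes an address, so it is safe to call from host code as well.
__host__ __device__ inline float* pitched_at(float* base, size_t pitch_bytes,
                                             int row, int col) {
    // Step rows in bytes via char*, then step columns in elements.
    char* row_start = reinterpret_cast<char*>(base) + row * pitch_bytes;
    return reinterpret_cast<float*>(row_start) + col;
}

// Usage sketch (error checking omitted):
//   float* dev = nullptr;
//   size_t pitch = 0;
//   cudaMallocPitch(&dev, &pitch, width * sizeof(float), height);
//   float* p = pitched_at(dev, pitch, 2, 5);  // row 2, column 5
```

Note that dev + row * width would generally be wrong here, because the pitch may be larger than width * sizeof(float).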
I saw a considerable speedup in some cases once I started calling cudaMalloc() once for the whole memory requirement and computing the device pointers for the individual chunks myself, instead of calling cudaMalloc() for each chunk.
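One way this could look is sketched below. The Chunks struct, the helpers round_up and alloc_chunks, and the choice of three arrays are all hypothetical. The 256-byte alignment between chunks is an assumption, picked to match the minimum alignment the CUDA documentation gives for pointers returned by cudaMalloc().

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: rounds n up to a multiple of align (a power of two).
inline size_t round_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

// Hypothetical layout: three arrays carved out of one device allocation.
struct Chunks {
    void*   base;  // the only pointer that may be passed to cudaFree
    double* a;     // na doubles
    float*  b;     // nb floats
    int*    c;     // nc ints
};

// Allocates once and computes the chunk pointers on the host.
// Returns true on success.
bool alloc_chunks(Chunks* out, size_t na, size_t nb, size_t nc) {
    const size_t align = 256;  // assumed per-chunk alignment
    size_t off_a = 0;
    size_t off_b = round_up(off_a + na * sizeof(double), align);
    size_t off_c = round_up(off_b + nb * sizeof(float), align);
    size_t total = off_c + nc * sizeof(int);

    if (cudaMalloc(&out->base, total) != cudaSuccess) return false;

    // Offsets are in bytes, so apply them through a char pointer.
    char* bytes = static_cast<char*>(out->base);
    out->a = reinterpret_cast<double*>(bytes + off_a);
    out->b = reinterpret_cast<float*>(bytes + off_b);
    out->c = reinterpret_cast<int*>(bytes + off_c);
    return true;
}
```

When the chunks are no longer needed, a single cudaFree(chunks.base) releases everything; the derived pointers a, b and c must not be freed individually.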
The amount added to a pointer by pointer arithmetic is defined by the C/C++ language. It does not depend on how memory is organized, or even on whether the pointer is valid. Adding an offset to a pointer to double (double*) advances the address by the offset multiplied by the implementation-defined size of a double. As you point out, on all platforms that CUDA currently supports, the size of a double is 8 bytes.
Furthermore, the host compiler, which is generating the underlying code to offset the pointer, has no knowledge of the difference between a host and device pointer. They will be handled identically.
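A small sketch to check this yourself, with hypothetical helper names step_in_bytes and same_step: it measures how far "+1" moves a double* obtained from malloc and one obtained from cudaMalloc. Neither pointer is dereferenced.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: byte distance between p and p + 1 for a double*.
// The host compiler emits the same code whether p came from malloc or
// from cudaMalloc.
inline ptrdiff_t step_in_bytes(double* p) {
    return reinterpret_cast<char*>(p + 1) - reinterpret_cast<char*>(p);
}

// Returns true if a host pointer and a device pointer both advance by
// sizeof(double) bytes when 1 is added to them.
bool same_step() {
    double* host = static_cast<double*>(std::malloc(4 * sizeof(double)));
    double* dev = nullptr;
    if (host == nullptr || cudaMalloc(&dev, 4 * sizeof(double)) != cudaSuccess) {
        std::free(host);
        return false;
    }
    const ptrdiff_t expected = static_cast<ptrdiff_t>(sizeof(double));
    bool ok = step_in_bytes(host) == expected && step_in_bytes(dev) == expected;
    cudaFree(dev);
    std::free(host);
    return ok;
}
```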
How do I declare (or allocate) “my_other_GPUpointer” on the host without allocating memory on the GPU? It is only meant to store values like “GPUpointer+offset” on the host.
This is pretty much the same way you would do it in C if the cudaMalloc statement were replaced with an equivalent malloc statement. There’s no need to allocate anything as long as my_offset is less than my_size (i.e. as long as you intend my_other_GPUpointer to point into a region that has already been allocated for myGPUpointer).
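A minimal sketch of that idea follows. The variable names are taken from the question; the double element type and the wrapper function offset_into_new_buffer are assumptions added only to make the snippet self-contained.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: returns a pointer my_offset elements into a fresh
// device buffer of my_size doubles, or nullptr on failure. *base_out
// receives the pointer that must later be passed to cudaFree.
double* offset_into_new_buffer(size_t my_size, size_t my_offset,
                               double** base_out) {
    *base_out = nullptr;
    if (my_offset >= my_size) return nullptr;  // would point outside the buffer

    double* myGPUpointer = nullptr;
    if (cudaMalloc(&myGPUpointer, my_size * sizeof(double)) != cudaSuccess)
        return nullptr;

    // Just a pointer variable on the host; no second cudaMalloc is needed.
    double* my_other_GPUpointer = myGPUpointer + my_offset;

    *base_out = myGPUpointer;
    return my_other_GPUpointer;
}
```

The offset pointer can be passed to kernels or to cudaMemcpy like any other device pointer, but cudaFree must be given the original myGPUpointer, not my_other_GPUpointer.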