Suppose I have allocated and initialized device memory and now want to pass it to a kernel as a parameter, but at an offset, so that the kernel receives a pointer to an address inside the buffer rather than the address of the buffer itself. If I just call my kernel with “dev_ptr + offset”, will it work?
A similar question concerns copying memory from device to host and vice versa. Obviously, I can easily offset a host pointer; can I do the same with a device pointer on the host side?
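Remember that CUDA C++ is an extension of C++, so arrays, pointers, etc. are handled just the way you are used to from regular C++. Pointer arithmetic on a device pointer is fine in host code as long as you do not dereference the result there, so you can pass dev_ptr + offset both to a kernel and to cudaMemcpy. As with any typed pointer, the offset counts elements, not bytes.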
For a question like this, simply giving it a try is probably much more convincing than theorizing about it.
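For anyone who wants to give it a try, here is a minimal sketch of such a test. The kernel add_one, the helper offset_roundtrip_ok, and the sizes are made up for illustration and are not from the original question. It passes dev_ptr + offset to a kernel and then uses the same offset pointer as the source of a device-to-host cudaMemcpy.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1 to each of the n elements starting at data.
// It has no idea whether data is the start of an allocation or not.
__global__ void add_one(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Returns true if the kernel and the copy both honored the offset pointer.
bool offset_roundtrip_ok()
{
    const int N = 16;      // total elements in the device buffer
    const int offset = 4;  // offset in elements, not bytes
    const int count = 8;   // elements processed starting at the offset

    int host[N];
    for (int i = 0; i < N; ++i) host[i] = i;

    int *dev_ptr = nullptr;
    if (cudaMalloc(&dev_ptr, N * sizeof(int)) != cudaSuccess) return false;
    if (cudaMemcpy(dev_ptr, host, N * sizeof(int),
                   cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(dev_ptr);
        return false;
    }

    // Pointer arithmetic happens on the host; the pointer is only
    // dereferenced inside the kernel, on the device.
    add_one<<<1, count>>>(dev_ptr + offset, count);

    // Copy back only the sub-range, again using the offset device pointer.
    // This blocking copy also waits for the kernel to finish.
    int out[count];
    cudaError_t err = cudaMemcpy(out, dev_ptr + offset, count * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev_ptr);
    if (err != cudaSuccess) return false;

    // Element i of the sub-range started as offset + i and was incremented once.
    for (int i = 0; i < count; ++i)
        if (out[i] != offset + i + 1) return false;
    return true;
}

int main()
{
    bool ok = offset_roundtrip_ok();
    printf("%s\n", ok ? "offset pointer works" : "something went wrong");
    return ok ? 0 : 1;
}
```

One thing to watch for: the offset pointer still has to be suitably aligned for the type the kernel reads through it. Offsetting an int* by whole elements is always fine, but reinterpreting an arbitrarily offset address as, say, int4* is not.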