cudaMemcpy3D issue

Hello!
I’ve seen other people having problems with cudaMemcpy3D and have read several threads about them; however, no one seems to know exactly how this function works. I have a few questions:

If cudaMemcpy3D returns an "invalid argument" error, what might be the problem? I tried to narrow it down and concluded that either the cudaPitchedPtr or the cudaExtent causes this error. Could someone post how to use cudaMemcpy3D properly when copying from device to host memory (or vice versa) with a given offset? For example: copying a quarter of a 3D cube into a whole cube or, in terms of size, copying an n * (n/2) * (n/2) cuboid into an n * n * n cube. How should dst_offset/src_offset (the dstPos/srcPos fields) be set?
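
To make the offset question concrete, here is a host-only C++ sketch of the addressing a 3D copy between plain (non-array) buffers would perform. It is not real CUDA code: `PitchedPtr`, `Pos`, `Extent`, `copy3d_model` and `quarter_into_cube` are hypothetical stand-ins that only mirror the fields of `cudaPitchedPtr`, `cudaPos` and `cudaExtent`. It assumes the runtime documentation's rule that, when no CUDA array is involved, x and width are counted in bytes while y/z count rows and slices, and that one slice spans `pitch * ysize` bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-ins with the same fields as CUDA's cudaPitchedPtr,
// cudaPos and cudaExtent. Assumption: for plain (non-array) memory, x and
// width are in bytes; y/height count rows and z/depth count slices.
struct PitchedPtr { void* ptr; std::size_t pitch, xsize, ysize; };
struct Pos        { std::size_t x, y, z; };
struct Extent     { std::size_t width, height, depth; };

// Host-only model of a linear-memory 3D copy: e.depth slices of e.height
// rows of e.width bytes, read from srcPos in src and written at dstPos in dst.
void copy3d_model(PitchedPtr dst, Pos dstPos, PitchedPtr src, Pos srcPos, Extent e) {
    const std::size_t srcSlice = src.pitch * src.ysize;  // bytes per source slice
    const std::size_t dstSlice = dst.pitch * dst.ysize;  // bytes per destination slice
    // The model's own bounds checks (not CUDA's exact validation rules).
    assert(srcPos.x + e.width <= src.pitch && dstPos.x + e.width <= dst.pitch);
    assert(srcPos.y + e.height <= src.ysize && dstPos.y + e.height <= dst.ysize);
    for (std::size_t z = 0; z < e.depth; ++z) {
        for (std::size_t y = 0; y < e.height; ++y) {
            const char* s = static_cast<const char*>(src.ptr) + (srcPos.z + z) * srcSlice
                            + (srcPos.y + y) * src.pitch + srcPos.x;
            char* d = static_cast<char*>(dst.ptr) + (dstPos.z + z) * dstSlice
                      + (dstPos.y + y) * dst.pitch + dstPos.x;
            std::memcpy(d, s, e.width);  // copy one row of e.width bytes
        }
    }
}

// Example: place an n x (n/2) x (n/2) float cuboid (width n, height n/2,
// depth n/2) into the y >= n/2, z >= n/2 corner of a zeroed n x n x n cube.
std::vector<float> quarter_into_cube(std::size_t n) {
    std::vector<float> quarter(n * (n / 2) * (n / 2));
    for (std::size_t i = 0; i < quarter.size(); ++i) quarter[i] = float(i + 1);
    std::vector<float> cube(n * n * n, 0.0f);

    const std::size_t rowBytes = n * sizeof(float);  // dense rows: pitch == row size
    PitchedPtr src{quarter.data(), rowBytes, n, n / 2};
    PitchedPtr dst{cube.data(), rowBytes, n, n};
    Pos srcPos{0, 0, 0};               // read the whole cuboid from its origin
    Pos dstPos{0, n / 2, n / 2};       // an x offset would be in bytes: xElems * sizeof(float)
    Extent e{rowBytes, n / 2, n / 2};  // size of the copied region, width in bytes
    copy3d_model(dst, dstPos, src, srcPos, e);
    return cube;
}
```

With the real runtime API the same numbers would presumably go into a zero-initialized `cudaMemcpy3DParms` via `make_cudaPitchedPtr`, `make_cudaPos(0, n/2, n/2)` and `make_cudaExtent(n * sizeof(float), n/2, n/2)`, with `kind` set to the copy direction. The runtime documentation also says to set only one of `srcArray`/`srcPtr` (and one of `dstArray`/`dstPtr`), which is worth checking when the call returns "invalid argument".
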
The other thing that confuses me is the copy extent: is it the size of the region being copied (width in bytes, height, depth), or the size of the whole destination/source structure? To be clear, if I want to copy a quarter of the cube, is the extent (n/2 * n/2 * n) or (n * n * n)? I presume that only the width should be given in bytes.
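
On the extent question, a small sketch under the same assumption (bytes for x/width, rows and slices for y/z): the extent describes only the region being copied, so a quarter of an n * n * n float cube would be (n * sizeof(float), n/2, n/2) rather than the size of the whole cube. `Pos`, `Extent`, `region_extent` and `region_fits` are hypothetical helpers, and `region_fits` is only a plausibility check of the kind that could catch an out-of-bounds offset, not CUDA's actual validation logic.

```cpp
#include <cstddef>

// Hypothetical stand-ins mirroring cudaPos and cudaExtent. Assumption: for
// plain (non-array) memory, x and width are in bytes, y/height in rows and
// z/depth in slices.
struct Pos    { std::size_t x, y, z; };
struct Extent { std::size_t width, height, depth; };

// Extent of the region to copy (not of the whole allocation): nx x ny x nz
// elements of elemSize bytes each. Only the width is scaled to bytes.
Extent region_extent(std::size_t nx, std::size_t ny, std::size_t nz, std::size_t elemSize) {
    return Extent{nx * elemSize, ny, nz};
}

// Plausibility check (hypothetical, not CUDA's own validation): does a region
// of extent e starting at pos stay inside an allocation with rowBytes bytes
// per row, `rows` rows per slice and `slices` slices?
bool region_fits(Pos pos, Extent e, std::size_t rowBytes, std::size_t rows, std::size_t slices) {
    return pos.x + e.width <= rowBytes && pos.y + e.height <= rows && pos.z + e.depth <= slices;
}
```

For example, `region_extent(8, 4, 4, sizeof(float))` gives {32, 4, 4}, which fits at offset (0, 4, 4) in an 8 x 8 x 8 float cube with 32-byte rows, whereas the same extent shifted by a 4-byte x offset would overrun each row.
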
The last thing: when copying host->device or device->host, we assign a host array to a cudaPitchedPtr, which is a structure on the device. Can someone explain how this is possible?
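
Regarding the last question: as far as the runtime headers show, `cudaPitchedPtr` appears to be an ordinary struct (`ptr`, `pitch`, `xsize`, `ysize`) that is filled in by host code and passed by value inside `cudaMemcpy3DParms`; the `kind` field (for example `cudaMemcpyHostToDevice`) tells `cudaMemcpy3D` how to interpret the pointers. The host-only sketch below uses a hypothetical stand-in `PitchedPtr` and hypothetical helpers `wrap_host_array`, `element_at` and `matches_dense_layout` to illustrate that wrapping a host array only records its address and layout, assuming the slice stride is `pitch * ysize`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in with the same four fields as CUDA's cudaPitchedPtr.
// It is an ordinary value type: it only records where data starts and how
// it is laid out, whichever memory space `ptr` happens to refer to.
struct PitchedPtr {
    void* ptr;          // start address (host or device)
    std::size_t pitch;  // bytes from one row to the next
    std::size_t xsize;  // logical width in elements
    std::size_t ysize;  // logical height (rows per slice)
};

// Describe a dense, row-major nx x ny x nz host float array (nz is not stored).
PitchedPtr wrap_host_array(float* data, std::size_t nx, std::size_t ny) {
    return PitchedPtr{data, nx * sizeof(float), nx, ny};
}

// Address of element (x, y, z), computed from the descriptor alone.
float* element_at(PitchedPtr p, std::size_t x, std::size_t y, std::size_t z) {
    char* base = static_cast<char*>(p.ptr);
    return reinterpret_cast<float*>(base + z * p.pitch * p.ysize + y * p.pitch + x * sizeof(float));
}

// Check that the descriptor reproduces ordinary row-major indexing of a host vector.
bool matches_dense_layout(std::size_t nx, std::size_t ny, std::size_t nz) {
    std::vector<float> host(nx * ny * nz);
    PitchedPtr p = wrap_host_array(host.data(), nx, ny);
    for (std::size_t z = 0; z < nz; ++z)
        for (std::size_t y = 0; y < ny; ++y)
            for (std::size_t x = 0; x < nx; ++x)
                if (element_at(p, x, y, z) != &host[(z * ny + y) * nx + x]) return false;
    return true;
}
```

In this model nothing about the struct is tied to the GPU; the same descriptor could just as well hold a device address returned by an allocation call.
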

Thanks in advance