Issues with cudaMemcpy3D and pitched memory

Hello, I’m new to CUDA programming and I’m using it to build a 3D arrangement in pitched memory. (I can’t use cudaArrays because I need double2 data and, from what I’ve read, arrays are restricted to float.)

Here’s the code I use:

cudaError_t status = cudaSuccess;

size_t bytesProcessed;

cudaExtent CKS_dim = {0};
CKS_dim.width = width * sizeof(double2);
CKS_dim.height = height;
CKS_dim.depth = iterations;

printf("\nInitializing extents...\n");

cudaPitchedPtr d_ComplexKernelS = {0};
status = cudaMalloc3D (&d_ComplexKernelS, CKS_dim);

if(status != cudaSuccess){fprintf(stderr, "%s\n", cudaGetErrorString(status));}

cudaPos CKSPos = {0};
CKSPos.x = 0 * sizeof(double2);
CKSPos.y = 0 * sizeof(double2);
CKSPos.z = 0 * sizeof(double2);
printf("\nInitializing positions...\n");

// allocate host memory to receive the data copied from the device

double2* buffer = (double2*) malloc (width*height*iterations*sizeof(double2));

printf("\nbuffer created...\n");

cudaPitchedPtr hostBufferPitch3D = {0};
hostBufferPitch3D.ptr = (void*)buffer;
hostBufferPitch3D.pitch = CKS_dim.width*sizeof(double2); /* memory extent per row in the x direction, in bytes */
hostBufferPitch3D.xsize = CKS_dim.width; /* extent of data in the x direction */
hostBufferPitch3D.ysize = CKS_dim.height; /* extent of data in the y direction */

printf("\nInitializing copy parameters...\n");

// cudaMemcpy3D Device to Host

cudaMemcpy3DParms CKSCopyParms = {0};
CKSCopyParms.srcPos = CKSPos;
CKSCopyParms.srcPtr = d_ComplexKernelS;
CKSCopyParms.dstPos = CKSPos;
CKSCopyParms.dstPtr = hostBufferPitch3D;
CKSCopyParms.extent = CKS_dim;
CKSCopyParms.kind = cudaMemcpyDeviceToHost;

status = cudaMemcpy3D(&CKSCopyParms);

if(status != cudaSuccess){fprintf(stderr, "%s\n", cudaGetErrorString(status));}

The problem is that once I have allocated the device 3D data with cudaMalloc3D, set up the pitched pointer to the host buffer, and filled in the cudaPos and cudaMemcpy3DParms parameters, my program just stops when cudaMemcpy3D(&CKSCopyParms) is called. If I remove only that call, everything else apparently runs smoothly.

I’ve been looking, with no luck, for an example of 3D memory arrangements in CUDA that does not involve cudaArrays. Could anyone point me in the right direction?
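One thing that may be worth checking in the code above: CKS_dim.width is already a byte count (width * sizeof(double2)), yet the host pitch is set to CKS_dim.width*sizeof(double2), which applies the element size a second time. If that reading is right, cudaMemcpy3D would walk the host buffer with rows 16 times longer than what malloc reserved, which could run past the end of the allocation. The plain C sketch below only illustrates that arithmetic. It does not call CUDA; complex_pair is a stand-in for double2, and the helper names are hypothetical, not part of the CUDA API.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for CUDA's double2 (two doubles); an assumption made so the
   sketch compiles without the CUDA headers. */
typedef struct { double x, y; } complex_pair;

/* Row pitch in bytes of a tightly packed host buffer:
   elements per row times the element size, applied exactly once. */
size_t host_row_pitch_bytes(size_t width_elems)
{
    return width_elems * sizeof(complex_pair);
}

/* Total number of bytes covered by a pitched 3D layout
   (pitch_bytes per row, height rows per slice, depth slices). */
size_t bytes_spanned(size_t pitch_bytes, size_t height, size_t depth)
{
    return pitch_bytes * height * depth;
}

/* Byte offset of element (x, y, z) in a pitched 3D layout. */
size_t pitched_offset_bytes(size_t pitch_bytes, size_t height,
                            size_t x, size_t y, size_t z)
{
    /* The element must lie inside its row. */
    assert((x + 1) * sizeof(complex_pair) <= pitch_bytes);

    size_t slice_pitch = pitch_bytes * height; /* bytes per z slice */
    return z * slice_pitch + y * pitch_bytes + x * sizeof(complex_pair);
}
```

For example, with width = 4, height = 3 and iterations = 2, the packed buffer is 4 * 3 * 2 * 16 = 384 bytes, whereas a pitch of 4 * 16 * 16 = 1024 bytes covers 1024 * 3 * 2 = 6144 bytes. Under this reading, setting hostBufferPitch3D.pitch to CKS_dim.width (already in bytes) would make the host layout match the malloc size.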

Thanks in advance for any help!