Working with 3D arrays: cudaMemcpy3D(), pointer problem? Can't access 3D copied 3d array with th

I’m working with a 3-dimensional float array and I’m running into a problem. Although I’m able to allocate device memory and copy the data to it, I can’t seem to access the copied data through the pitched pointer in the device code. The code is as follows:

float ***volume = (float ***) alloc3dMatrix(Nx, Ny, Nz, padX, padY, padZ, sizeof(float));

cudaPitchedPtr d_Input;

cudaExtent extent;

cudaMemcpy3DParms p = { 0 };

/* Populate matrix... */

/* CUDA memcpy: RAM -> VRAM */

extent = make_cudaExtent(Nx*sizeof(float)+2*padX, Ny+2*padY, Nz+2*padZ);

cudaMalloc3D(&d_Input, extent);

p.srcPtr = make_cudaPitchedPtr((void ****)&volume[0][0][0], Nx*sizeof(float)+2*padX, Ny+2*padY, Nz+2*padZ);

p.dstPtr = d_Input;

p.extent = extent;

p.kind = cudaMemcpyHostToDevice;

cudaMemcpy3D(&p);

/* Kernel call, threads and blocks are example values */

myKernel<<<4096, 128>>>((float ***)d_Input.ptr, Nx, Ny, Nz, ht, Nzinterval, iterations);

The problem is that once I’m inside the kernel function, I can’t access the values behind d_Input.ptr using the standard variable[x][y][z] indexing (segmentation fault), so I suspect I have a pointer set up incorrectly. Could someone who is experienced with CUDA check whether the function calls are correct for a three-dimensional array?

If relevant, I’m developing under Windows Vista x64 with Visual Studio 2008.

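I will answer my own question. The problem is that allocating with cudaMalloc3D() and copying with cudaMemcpy3D() results in a single linear (1D) block of memory that holds the data of the 3D volume, not an array of pointers to pointers. The difference from a plain cudaMemcpy() is the data alignment: each row is padded out to the pitch, which is optimal for 3D access. The resulting buffer therefore has to be indexed with just one dimension, computing each element's offset from the pitch, instead of with [x][y][z].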

You are probably building the pitched pointer incorrectly:
p.srcPtr = make_cudaPitchedPtr((void ****)&volume[0][0][0], Nx*sizeof(float)+2*padX, Ny+2*padY, Nz+2*padZ);

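You need something like this, where the arguments are the pointer, the row pitch in bytes, the width in elements and the height in rows:

p.srcPtr = make_cudaPitchedPtr((void *)&volume[0][0][0], (Nx+2*padX)*sizeof(float), Nx+2*padX, Ny+2*padY);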
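The size arithmetic behind this suggestion can be sketched as follows. This is only an illustration under assumptions: PaddedDims, paddedDims and uploadVolume are hypothetical names, padX/padY/padZ are taken to be padding counts in elements on each side, and the host volume is assumed to be stored contiguously starting at &volume[0][0][0] (which depends on how alloc3dMatrix allocates). The CUDA part is guarded so that the size helper can also be checked as plain C++.

```cpp
#include <stddef.h>

// Hypothetical helper struct: dimensions of the padded host volume.
struct PaddedDims {
    size_t widthElems; // Nx + 2*padX, in elements
    size_t rowBytes;   // widthElems * sizeof(float), the host pitch in bytes
    size_t rows;       // Ny + 2*padY
    size_t slices;     // Nz + 2*padZ
};

inline PaddedDims paddedDims(size_t Nx, size_t Ny, size_t Nz,
                             size_t padX, size_t padY, size_t padZ)
{
    PaddedDims d;
    d.widthElems = Nx + 2 * padX;
    // Note: (Nx + 2*padX) * sizeof(float), not Nx*sizeof(float) + 2*padX.
    d.rowBytes = d.widthElems * sizeof(float);
    d.rows     = Ny + 2 * padY;
    d.slices   = Nz + 2 * padZ;
    return d;
}

#ifdef __CUDACC__
#include <cuda_runtime.h>

// Hypothetical wrapper: allocates a pitched device volume and copies a
// contiguous padded host volume into it.
inline cudaError_t uploadVolume(float *hostBase, const PaddedDims &d,
                                cudaPitchedPtr *devOut)
{
    // For linear memory the extent width is given in bytes.
    cudaExtent extent = make_cudaExtent(d.rowBytes, d.rows, d.slices);
    cudaError_t err = cudaMalloc3D(devOut, extent);
    if (err != cudaSuccess)
        return err;

    cudaMemcpy3DParms p = { 0 };
    // Host rows are tightly packed, so the host pitch equals rowBytes.
    p.srcPtr = make_cudaPitchedPtr(hostBase, d.rowBytes, d.widthElems, d.rows);
    p.dstPtr = *devOut;
    p.extent = extent;
    p.kind   = cudaMemcpyHostToDevice;
    return cudaMemcpy3D(&p);
}
#endif
```

Checking the returned cudaError_t is worthwhile here, since a mismatch between the extent and the source pitched pointer is otherwise easy to miss until the kernel reads bad data.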