I’m working with a 3-dimensional float array and I’m running into a problem. I can allocate the device memory and copy into it, but I can’t seem to access that memory through the pitched pointer in device code. The code is as follows:
float ***volume = (float ***) alloc3dMatrix(Nx, Ny, Nz, padX, padY, padZ, sizeof(float));
cudaPitchedPtr d_Input;
cudaExtent extent;
cudaMemcpy3DParms p = { 0 };
/* Populate matrix... */
/* CUDA memcpy: RAM -> VRAM */
extent = make_cudaExtent(Nx*sizeof(float)+2*padX, Ny+2*padY, Nz+2*padZ);
cudaMalloc3D(&d_Input, extent);
p.srcPtr = make_cudaPitchedPtr((void ****)&volume[0][0][0], Nx*sizeof(float)+2*padX, Ny+2*padY, Nz+2*padZ);
p.dstPtr = d_Input;
p.extent = extent;
p.kind = cudaMemcpyHostToDevice;
cudaMemcpy3D(&p);
/* Kernel call, threads and blocks are example values */
myKernel<<<4096, 128>>>((float ***)d_Input.ptr, Nx, Ny, Nz, ht, Nzinterval, iterations);
The problem is that once I’m inside the kernel function, I can’t access the values behind d_Input.ptr using the standard variable[y][z] indexing (segmentation fault), so I suspect a pointer is set incorrectly. Could someone experienced with CUDA check whether the function calls are correct for a three-dimensional array?
If relevant, I’m developing under Windows Vista x64 with Visual Studio 2008.
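For reference, here is a minimal sketch of how a pitched allocation is usually indexed. The memory returned by cudaMalloc3D is one flat block of bytes, not a table of row pointers, so casting d_Input.ptr to float *** and dereferencing it treats float data as addresses. The sketch instead passes the cudaPitchedPtr itself to the kernel and computes each address from the pitch. The names elemAt, addOne and runPitchedDemo are hypothetical, padding is left out, and the host data is assumed to be one contiguous buffer (which may not hold for whatever alloc3dMatrix returns). Note that the extent width is given in bytes, and that make_cudaPitchedPtr takes (pointer, pitch in bytes, xsize, ysize).

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Address of element (x, y, z) in a pitched 3D allocation.
// vol.pitch is the row stride in bytes; rows is the number of rows per slice.
__device__ float *elemAt(cudaPitchedPtr vol, size_t rows, size_t x, size_t y, size_t z)
{
    char *slice = (char *)vol.ptr + z * vol.pitch * rows;  // step over z slices
    float *row = (float *)(slice + y * vol.pitch);         // step over y rows
    return row + x;                                        // step over x floats
}

// Hypothetical kernel: adds 1.0f to every element, one thread per element.
__global__ void addOne(cudaPitchedPtr vol, int w, int h, int d)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= w * h * d) return;
    int x = idx % w;
    int y = (idx / w) % h;
    int z = idx / (w * h);
    *elemAt(vol, h, x, y, z) += 1.0f;
}

// Copies a contiguous w*h*d volume to the device, runs addOne, copies it back.
// Returns the number of mismatching elements, or -1 on a CUDA error.
int runPitchedDemo(int w, int h, int d)
{
    std::vector<float> host((size_t)w * h * d);
    for (size_t i = 0; i < host.size(); ++i) host[i] = (float)i;

    // Width is in bytes; height and depth are in elements.
    cudaExtent extent = make_cudaExtent(w * sizeof(float), h, d);
    cudaPitchedPtr dev;
    if (cudaMalloc3D(&dev, extent) != cudaSuccess) return -1;

    cudaMemcpy3DParms p = {0};
    p.srcPtr = make_cudaPitchedPtr(host.data(), w * sizeof(float), w, h);
    p.dstPtr = dev;
    p.extent = extent;
    p.kind = cudaMemcpyHostToDevice;
    if (cudaMemcpy3D(&p) != cudaSuccess) { cudaFree(dev.ptr); return -1; }

    int total = w * h * d;
    int threads = 128;
    addOne<<<(total + threads - 1) / threads, threads>>>(dev, w, h, d);

    cudaMemcpy3DParms q = {0};
    q.srcPtr = dev;
    q.dstPtr = make_cudaPitchedPtr(host.data(), w * sizeof(float), w, h);
    q.extent = extent;
    q.kind = cudaMemcpyDeviceToHost;
    cudaError_t err = cudaMemcpy3D(&q);
    cudaFree(dev.ptr);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (size_t i = 0; i < host.size(); ++i)
        if (host[i] != (float)i + 1.0f) ++bad;
    return bad;
}
```

Calling runPitchedDemo(5, 3, 2) from main returns the number of elements the kernel failed to update, so a result of 0 means every element was reached through the pitch arithmetic. With padding, the same indexing should apply once w, h and d are replaced by the padded sizes in elements.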