3D matrix and 3D threads/blocks problem

Hi there,

I can set up a 3D matrix, copy it to the device, and copy it back successfully, but accessing it from a kernel seems to be a problem.
If the matrix is x by y by z, then as long as x = z everything works as it should when a single thread loops over the matrix, as in the programming guide (p. 19).
It also works with a 3D matrix of threads (actually a 2D matrix of blocks in the code I include here).

Anyway, as soon as x != z it all goes wrong and I can't figure out why.

Can anyone help, please?
3d_array.cu (2.47 KB)

From what I have experienced, tab_cpu[y][z] is equivalent to tab_gpu[z][y]. I think it is because we forget that in a 3D representation, x is the depth, y is the column axis and z is the row axis, so row, col, depth = z, y, x.

OK, I've given up on this approach and now simply convert my 3D matrix to a vector and work from there. I'm using a 2D grid of 1D blocks to give me the x, y, z of each element, and this solution works well.
Here is the code for anyone interested.

Cheers
my_3d_array.cu (1.29 KB)
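
For readers who cannot open the attachment, here is a minimal sketch of the flattened approach. It is not the attached file. The sizes, the names `flatIndex`, `scaleKernel` and `runFlatDemo`, and the mapping (threadIdx.x is x, blockIdx.x is y, blockIdx.y is z) are all assumptions made for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical sizes, deliberately all different.
constexpr int NX = 5, NY = 3, NZ = 4;
constexpr int N = NX * NY * NZ;

// Row-major flat index: x varies fastest, then y, then z.
__host__ __device__ inline int flatIndex(int x, int y, int z, int nx, int ny) {
    return (z * ny + y) * nx + x;
}

// 2D grid of 1D blocks: one block per (y, z) pair, one thread per x.
// Assumes nx does not exceed the maximum threads per block.
__global__ void scaleKernel(float* data, int nx, int ny, int nz, float factor) {
    int x = threadIdx.x;
    int y = blockIdx.x;
    int z = blockIdx.y;
    if (x < nx && y < ny && z < nz) {
        data[flatIndex(x, y, z, nx, ny)] *= factor;
    }
}

// Returns the number of mismatching elements, or -1 on a CUDA error.
int runFlatDemo() {
    float host[N];
    for (int i = 0; i < N; ++i) host[i] = static_cast<float>(i);

    float* dev = nullptr;
    if (cudaMalloc(&dev, N * sizeof(float)) != cudaSuccess) return -1;
    cudaMemcpy(dev, host, N * sizeof(float), cudaMemcpyHostToDevice);

    dim3 grid(NY, NZ);  // blockIdx.x -> y, blockIdx.y -> z
    dim3 block(NX);     // threadIdx.x -> x
    scaleKernel<<<grid, block>>>(dev, NX, NY, NZ, 2.0f);

    cudaError_t err = cudaMemcpy(host, dev, N * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < N; ++i) {
        if (host[i] != 2.0f * static_cast<float>(i)) ++mismatches;
    }
    return mismatches;
}

int main() {
    int result = runFlatDemo();
    printf("mismatches: %d\n", result);
    return result == 0 ? 0 : 1;
}
```

Because the data is one contiguous vector, a plain cudaMalloc and cudaMemcpy are enough and no pitch is involved. One block per row limits x to the maximum block size, so larger rows would need several blocks per row.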

Hello,

In your first file, it doesn't work because you've written "p.srcPtr = make_cudaPitchedPtr((void*)array, x*sizeof(float),x,y);" instead of "p.srcPtr = make_cudaPitchedPtr((void*)array, z*sizeof(float),x,y);" each time you transfer data from device to host or host to device.

Otherwise, it works pretty well, and it was a great help to me.
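
To make the role of the pitch concrete, here is a minimal sketch of a 3D round trip with cudaMalloc3D and cudaMemcpy3D. It is not the code from the attachment. It assumes a host array declared with the contiguous (innermost) dimension last, and the names `W`, `H`, `D`, `addOne` and `roundTripMismatches` are made up for the example. The point is that the host pitch is the byte width of the innermost dimension, whatever the other two sizes are.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical sizes: W is the contiguous (innermost) dimension.
constexpr int W = 5, H = 3, D = 4;

// Single thread looping over the pitched allocation, in the style of the
// programming guide example: rows are p.pitch bytes apart on the device.
__global__ void addOne(cudaPitchedPtr p, int w, int h, int d) {
    char* base = static_cast<char*>(p.ptr);
    size_t slicePitch = p.pitch * h;
    for (int k = 0; k < d; ++k) {
        char* slice = base + k * slicePitch;
        for (int j = 0; j < h; ++j) {
            float* row = reinterpret_cast<float*>(slice + j * p.pitch);
            for (int i = 0; i < w; ++i) row[i] += 1.0f;
        }
    }
}

// Returns the number of mismatching elements, or -1 on a CUDA error.
int roundTripMismatches() {
    static float host[D][H][W];
    for (int k = 0; k < D; ++k)
        for (int j = 0; j < H; ++j)
            for (int i = 0; i < W; ++i)
                host[k][j][i] = static_cast<float>((k * H + j) * W + i);

    // Extent width is in bytes; height and depth are in elements.
    cudaExtent extent = make_cudaExtent(W * sizeof(float), H, D);
    cudaPitchedPtr dev;
    if (cudaMalloc3D(&dev, extent) != cudaSuccess) return -1;

    // Host rows are tightly packed, so the host pitch is W * sizeof(float).
    cudaMemcpy3DParms toDev = {};
    toDev.srcPtr = make_cudaPitchedPtr((void*)host, W * sizeof(float), W, H);
    toDev.dstPtr = dev;
    toDev.extent = extent;
    toDev.kind = cudaMemcpyHostToDevice;
    if (cudaMemcpy3D(&toDev) != cudaSuccess) { cudaFree(dev.ptr); return -1; }

    addOne<<<1, 1>>>(dev, W, H, D);

    cudaMemcpy3DParms toHost = {};
    toHost.srcPtr = dev;
    toHost.dstPtr = make_cudaPitchedPtr((void*)host, W * sizeof(float), W, H);
    toHost.extent = extent;
    toHost.kind = cudaMemcpyDeviceToHost;
    cudaError_t err = cudaMemcpy3D(&toHost);
    cudaFree(dev.ptr);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int k = 0; k < D; ++k)
        for (int j = 0; j < H; ++j)
            for (int i = 0; i < W; ++i)
                if (host[k][j][i] != static_cast<float>((k * H + j) * W + i) + 1.0f)
                    ++mismatches;
    return mismatches;
}

int main() {
    int result = roundTripMismatches();
    printf("mismatches: %d\n", result);
    return result == 0 ? 0 : 1;
}
```

If the pitch were built from a different dimension, the copy would still succeed whenever the two sizes happen to be equal, which would match the symptom described in the first post.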
