I can set up a 3D matrix, copy it to the device, and copy it back, all successfully, but accessing it from a kernel seems to be a problem.
If the matrix is x by y by z, then so long as x = z everything works as it should, using a single thread that loops over the matrix as in the programming guide (p19).
This also works with a 3D matrix of threads (actually a 2D matrix of blocks in the code I include here).
Anyway, as soon as x != y it all goes wrong and I can't figure out why.
From what I experienced, tab_cpu[y][z] is equivalent to tab_gpu[z][y]. I think it is because we forget that in a 3D representation x is the depth, y is the column axis and z is the row axis, so (row, col, depth) = (z, y, x).
OK, I've given up on this approach and now simply convert my 3D matrix to a vector and work from there. I'm using a 2D grid of 1D blocks to give me the x, y, z of each element, and this solution works well.
Here is the code for anyone interested.
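For anyone reading without the attachment, here is a minimal sketch of the flattened approach, not the poster's actual code. The names flatIndex and scaleVolume are made up for illustration, and the layout (x varying fastest, then y, then z) is an assumption; swap the roles of x and z if your host array is stored the other way round.

```cpp
#include <cstddef>

// Let the same source build with nvcc or with a plain C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Flatten (x, y, z) into one offset, assuming x varies fastest, then y, then z.
__host__ __device__ inline std::size_t flatIndex(std::size_t x, std::size_t y, std::size_t z,
                                                 std::size_t nx, std::size_t ny) {
    return (z * ny + y) * nx + x;
}

#ifdef __CUDACC__
// Hypothetical kernel: a 2D grid of 1D blocks, with blockIdx.x = y and
// blockIdx.y = z. Threads of a block stride over x, so nx may exceed blockDim.x.
__global__ void scaleVolume(float* data, int nx, int ny, int nz, float factor) {
    int y = blockIdx.x;
    int z = blockIdx.y;
    if (y >= ny || z >= nz) return;
    for (int x = threadIdx.x; x < nx; x += blockDim.x) {
        data[flatIndex(x, y, z, nx, ny)] *= factor;
    }
}
#endif
```

With this mapping the launch would be something like scaleVolume<<<dim3(ny, nz), 128>>>(d_data, nx, ny, nz, 2.0f); where d_data is an ordinary cudaMalloc buffer of nx*ny*nz floats. Because the host and the device use the same flatIndex, the two sides cannot disagree about which axis is which.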
In your first file, it doesn't work because you've written "p.srcPtr = make_cudaPitchedPtr((void*)array, x*sizeof(float), x, y);" instead of "p.srcPtr = make_cudaPitchedPtr((void*)array, z*sizeof(float), x, y);" each time you transfer data from device to host or host to device.
Otherwise, it works pretty well and it was a great help for me.
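To see why the second argument matters: the pitch passed to make_cudaPitchedPtr is the length in bytes of one row as it sits in memory, so it has to match whichever dimension is contiguous in the host array. Below is a rough sketch of how an element is located in a pitched 3D block. The helper name pitchedElement and its parameter names are my own, not part of the CUDA API.

```cpp
#include <cstddef>

// Let the same source build with nvcc or with a plain C++ compiler.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Locate element (col, row, slice) in a pitched 3D block.
// pitchBytes is the byte length of one row, including any padding;
// rowsPerSlice is the number of rows in each 2D slice.
__host__ __device__ inline float* pitchedElement(void* base, std::size_t pitchBytes,
                                                 std::size_t rowsPerSlice,
                                                 std::size_t col, std::size_t row,
                                                 std::size_t slice) {
    char* bytes = static_cast<char*>(base);
    // Skip whole rows first (in bytes), then step to the column (in floats).
    char* rowStart = bytes + (slice * rowsPerSlice + row) * pitchBytes;
    return reinterpret_cast<float*>(rowStart) + col;
}
```

If the pitch is given as the size of the wrong dimension, every row after the first starts at the wrong byte offset. When the two dimensions happen to be equal the offsets coincide, which would explain why the cubic case appeared to work.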