3D matrix transpose

Hi all,

I am trying to transpose a 3D data set using CUDA. The CUDA code samples show how to transpose a 2D matrix. Could anyone give me some tips on how to extend that sample so that I can transpose a 3D matrix along a given dimension (x, y or z)?

Basically this is part of a Poisson solver that I am designing as part of my graduate research.

Thanks.

Matrix_transpose_post.pdf may be what you want.

This is extremely useful. Thank you so much. I’ve been trying to figure out why you compute the coarse grid and the k1, k2 values. Any help on that front would be much appreciated.

Suppose the transpose operation is (x,y,z) --> (y,z,x).

You need to do the transpose slice by slice (one x-z slice at a time) along the y direction, and each slice can be handled by a 2-D grid.
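
To make the index mapping concrete, here is a minimal CPU reference sketch in C++. It is not the code from the attachment. It assumes row-major storage, that n1, n2, n3 are the extents in x, y, z, and that the output is stored with extents n2 x n3 x n1. The function name is made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// CPU reference for (x,y,z) --> (y,z,x): B(y, z, x) = A(x, y, z).
// Assumed layout: A is row-major with extents n1 x n2 x n3 (z fastest),
// B is row-major with extents n2 x n3 x n1 (x fastest).
std::vector<float> transpose_xyz_to_yzx(const std::vector<float>& a,
                                        std::size_t n1, std::size_t n2,
                                        std::size_t n3)
{
    std::vector<float> b(a.size());
    for (std::size_t y = 0; y < n2; ++y) {          // one x-z slice per y
        for (std::size_t x = 0; x < n1; ++x) {
            for (std::size_t z = 0; z < n3; ++z) {
                // Within slice y this is a plain 2-D transpose of an
                // n1 x n3 matrix into an n3 x n1 matrix.
                b[(y * n3 + z) * n1 + x] = a[(x * n2 + y) * n3 + z];
            }
        }
    }
    return b;
}
```

A GPU version would replace the two inner loops with the usual tiled 2-D transpose for each slice. A reference like this is mainly useful for checking the kernel output element by element.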

However, CUDA only supports 2-D grids (CUDA 4.0 can use a 3-D grid), so we need to pack all slices into a single 2-D grid.

That is why I compute k1 and k2. You can choose k1 and k2 so that the number of unused blocks is minimal.

Even though CUDA 4.0 supports 3-D grids, you may still need the same technique, because n2 may be larger than 65535 while n1 and n3 are small.
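
One way to picture the packing is the host-side C++ sketch below. It is only an illustration under stated assumptions, not the scheme from the PDF. It assumes each slice needs gx by gy blocks, the launch grid is (k1 * gx) by (k2 * gy) blocks with k1 * k2 >= n2, and each block recovers its slice number from its block coordinates. The names Fold, choose_fold, decode_block, gx and gy are hypothetical, and the 65535 limit is the figure quoted above, passed in as a parameter.

```cpp
#include <cstddef>
#include <stdexcept>

// How the slices are folded: the grid is (k1 * gx) by (k2 * gy) blocks.
struct Fold {
    std::size_t k1;
    std::size_t k2;
};

// Pick k1 and k2 so that k1 * k2 >= slices, both grid extents stay within
// max_dim, and the number of unused slice slots (k1 * k2 - slices) is minimal.
// gx, gy: blocks needed per slice in each grid direction (assumed inputs).
Fold choose_fold(std::size_t slices, std::size_t gx, std::size_t gy,
                 std::size_t max_dim = 65535)
{
    if (slices == 0 || gx == 0 || gy == 0 || gx > max_dim || gy > max_dim)
        throw std::invalid_argument("choose_fold: bad extents");
    const std::size_t max_k1 = max_dim / gx;
    const std::size_t max_k2 = max_dim / gy;
    Fold best{0, 0};
    std::size_t best_waste = 0;
    for (std::size_t k1 = 1; k1 <= max_k1 && k1 <= slices; ++k1) {
        const std::size_t k2 = (slices + k1 - 1) / k1;  // ceil(slices / k1)
        if (k2 > max_k2) continue;                      // grid too tall
        const std::size_t waste = k1 * k2 - slices;
        if (best.k1 == 0 || waste < best_waste) {
            best = Fold{k1, k2};
            best_waste = waste;
            if (waste == 0) break;                      // cannot do better
        }
    }
    if (best.k1 == 0)
        throw std::runtime_error("choose_fold: too many slices for one launch");
    return best;
}

// What a block would compute from its (blockIdx.x, blockIdx.y) pair.
struct BlockCoord {
    std::size_t slice;  // which x-z slice this block works on
    std::size_t bx;     // block coordinates inside that slice
    std::size_t by;
    bool active;        // false for padding blocks that should exit at once
};

BlockCoord decode_block(std::size_t block_x, std::size_t block_y, Fold f,
                        std::size_t gx, std::size_t gy, std::size_t slices)
{
    const std::size_t slice = (block_y / gy) * f.k1 + (block_x / gx);
    return BlockCoord{slice, block_x % gx, block_y % gy, slice < slices};
}
```

For example, with 100000 slices and one block per slice, this search returns k1 = 2 and k2 = 50000, so no block is wasted. In a kernel, the arithmetic in decode_block would be applied to blockIdx.x and blockIdx.y, and inactive blocks would simply return.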

I’m having some trouble running this function. Can you please post a sample test file that uses this function correctly? It doesn’t have to be overly complicated, just a simple main function with the necessary library includes and a correct call to the transpose kernel.

Thanking you in anticipation.

Try the attached file. The matrix is row-major.
transpose3D.tar.gz (4.45 KB)