Problems in a volume rotation

Dspazian · April 21, 2008, 3:48pm

Hi to everyone,
I’m trying to implement a 3D rotation of a volume considering the rotation as a sum of 2d rotation along each direction. I think it’s simpler than considering a custom 3 rotation, also because I have 2d bilinear interpolation for free External Image.
The idea is to load the volume entirely into GPU memory , and use an algorithm that is very similar to the one found into SDK (it’s called simple texture), loading each slice one by one in a texture and obtain the new image interpolated directly by texture fetching from the old one. I’ve already overcome to some initial problems, like the conversion from unsigned short int format of the volume to float (necessary for bilinear interpolation), but I can’t implement the rotation. What I obtain is an image that seems to be rotated, but It’s full of NaN.I think that it could be derived from the choice of block and grid dimensions, I also tried to pad the volume to a dimension that is power of 2, but nothing better happens…
Anyone could have some kind of an Idea?
I’ll really appreciate it.
Thanks for you help,

Davide

seibert · April 21, 2008, 4:32pm

Just as an aside, in the CUDA 2.0 beta, you now get 3D interpolation for free as well. :)

Dspazian · April 22, 2008, 7:09am

Good to know it, it could be a future, simpler implementation of my algorithm ;) . But I’m wondering why I’m getting so many NaN into my image…could it be derived to a bad choice of block dimensions?Could a bad coalescent reading give read-error?

Simon_Green · April 22, 2008, 10:10am

It’s hard to say without seeing your code. Does it run correctly in emulation mode?

Memory coalescing doesn’t affect correctness, just performance.

BTW, it’s actually not necessary to use float format to get interpolation - you just need to use “cudaReadModeNormalizedFloat” as the read mode.

Dspazian · April 22, 2008, 12:21pm

I’ve found where my problem was. I was specifying a bad amount of bytes into the function cudaMemcpyToArray, so it was crashing.

For what concern the interpolation, tell me If I’ve understand right:

-I have to load my volume as unsigned short int

-I have to declare my texture as texture<unsigned short int, 2, cudaReadModeNormalizedFloat> tex;

-I perform the rotation , obtaining int again.

using just 2 bytes for each pixel, I should be able to play with volume quite large (even 700^3) B)

I’m also curious about the new texture3d type introduced with cuda 2.0…performing a full trilinear interpolation should be really faster than a composition of 3 bilinear itnerpolation… :D

nwilt · April 22, 2008, 4:47pm

By all means, give 3D textures a try, they work on the entire installed base of CUDA-capable hardware.

To enable interpolation, you must change the filtering mode associated with the texture. Note, the interpolation is lower precision than what your kernel could compute manually.

Also, 3D textures have different max dimensions (2048^3 as opposed to 64K x 32K).

DenisR · April 22, 2008, 7:11pm

Could it be possible in the future to have a 3D texture that is 16x16x(more than 2048) ?
I have 2 dimensions that stay quite low, but a third dimension that could become more than 2048 (for now 2048 will do I think (will have to check), but that might soon not be enough anymore)
Or are there hardware limitations that prevent this? (I can imagine, given that this will probably not occor so quickly with graphics)

Dspazian · April 23, 2008, 7:05am

Unfortunately It’s impossibile to install CUDA 2.0 beta together with 1.1. This is bad, because I cannot simply test the new features of CUDA 2 just simply linking to a libreary instead of another…Why have you chosen to not permit coexistence?

Why the precision of trilinear interpolation is worse than the bilinear one?I would have aspect that and ipothetical tex3D would be quite precise as tex2D…

My idea is that if you consider the volume as a stack of texture, you could perform the rotation with only 1 volume in memory and some image buffer for the rotation. On the other hand, if you want to implement a full 3D rotation with trilinear interpolation, you will need 2 different volumes (because the rotation cannot be performed in-place). So considering a “human” equipment (8800 GTX with 768 MB of RAM) you can store 2 volumes up to 700^3 elements with int precision.

Any way, I have a quick question to ask: I’m performing the rotation on the other axe (the y) and for all the slices I need to build up a texture taking from several parts of the volume, so I decided to write a kernel for it that looks more or less like this:

global void reshuffle(float *VolIn, float *SliceOUT, int width, int height)

{

    const int threadID = blockIdx.x * blockDim.x + threadIdx.x;

const int numThreads = blockDim.x * gridDim.x;

float Src;

for (int index = threadID; index < width*height; index +=numThreads)

{

	SliceToBeRotated[index]=Src;

	Src=vol[index*width*height];

}

//if(threadID<width*height)

//{	

//	Src=vol[threadID*width*height];

//	SliceToBeRotated[threadID]=Src;}

}

The problem is that during the execution of my program I receive “invalid device pointer”

What I did wrong?And moreover which one of the two version is quicker? I suppose the second one (the commented one)…

The coalescent reading is still a bad beast for me…

Again, thanks for you help guys…

Dspazian · April 28, 2008, 8:42am

Finally I’ve got the algorithm working. It performs the rotation of a volume 250^3 along the three axis in ~300 ms…that’s not bad External Media

The real bottleneck is reading the volume (that is stored in unsigned short) into float and back. So, I’m trying to work directly with unsigned short. If I’ve understood well what I have to specify are these lines:

the texture must be declared like this:

    <b> texture<sInt, 2, CudaReadModeNormalizedFloat> tex;</b>

the array must be declared like this:

    [b]cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(16, 0, 0, 0, cudaChannelFormatKindUnsigned);

        cudaArray* cu_array;[/b] 

    <b>GPU(cudaMallocArray(&cu_array, &channelDesc, XDIM, YDIM));</b>

and the kernel that obtain interpolated data has to look like this:[b]

global void transformKernel(sInt *g_odata, int width, int height, float theta)

{

// calculate normalized texture coordinates

unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;

unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;

float u = x / (float) width;

float v = y / (float) height;

// transform coordinates

u -= 0.5f;

v -= 0.5f;

float tu = ucosf(theta) - vsinf(theta) + 0.5f;

float tv = v*cosf(theta) + u*sinf(theta) + 0.5f;

// read from texture and write to global memory

g_odata[y*width + x] =tex2D(tex, tu, tv);

}[/b]

With this setup I read back a volume that is completely empty. On the other hand, If I specify a channeldescriptor like this:

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc( 32 , 0, 0, 0, cudaChannelFormatKindUnsigned);

I get back the initial volume without any changes…it seems that I don’t perform any interpolation…

Any ideas?

Thanks for you help,

Davide

Topic		Replies	Views
Efficiently rotate 3D volume CUDA Programming and Performance	4	9792	April 4, 2007
Interpolation of 3d rotation time problem CUDA Programming and Performance	1	1900	February 22, 2008
texture interpolation Tesla precision issue? CUDA Programming and Performance	3	1878	June 27, 2011
Seaching a fast 3d Rotation CUDA Programming and Performance	18	7449	October 12, 2007
3D Rotation time problem rotation 3d CUDA Programming and Performance	4	3349	November 8, 2007
I am trying to interpolate a 3d array, via Cuda texture memory. But getting weird values at its original coordinates. CUDA Programming and Performance	1	1530	December 13, 2018
Repeated 1D interpolation with type promotion CUDA Programming and Performance	3	628	October 12, 2021
Using tex2D for unsigned short/char CUDA Programming and Performance	14	3839	November 15, 2017
Volume Render : demo in cudasdk2.0 bata, linux CUDA Programming and Performance	16	6746	October 10, 2010
Linear interpolation with integer texture. CUDA Programming and Performance	6	2852	August 12, 2022

Problems in a volume rotation

Related topics