Seaching a fast 3d Rotation

bin04017 · September 27, 2007, 3:17pm

Hi @all,
i am currently searching an implementation of a 3d rotation because mine is not working <img src=‘http://hqnveipbwb20/public/style_emoticons/<#EMO_DIR#>/crying.gif’ class=‘bbc_emoticon’ alt=‘:’(’ /> . The size of the data will be around 512512512. Is there any implementation for that?
regards
bin

alex_dubinsky · September 27, 2007, 5:49pm

You mean the kind that powers 3D graphics? Where you multiply your coordinates by a 4x4 rotation matrix and use the new coordinates to sample a texture?

It should be extremely easy, although optimizing for good memory access would be much trickier.

Celebrandil · September 27, 2007, 8:05pm

Interesting problem! I would have looked at the possibility of factorizing the rotation into a series of shear and scaling operations. Making sure that memory operations are coalesced ought to be easier with shears than with rotations. The only problematic dimension is the first (x). One could imagine that a quick transpose (a mirroring along the x-y diagonal for example) could be used to solve that. By the way, I found this paper that might help you: http://citeseer.ist.psu.edu/301520.html.

asadafag · September 28, 2007, 3:14am

One thing: do not store the result in the same variable
There is a ptxas bug preventing from you performing matrix-vector multiplication and store it in the source vector.
See this post:
[url=“The Official NVIDIA Forums | NVIDIA”]The Official NVIDIA Forums | NVIDIA

However, seeing “512512512”, seems you’re doing a rotation on volume data. In that case, I don’t think that is a CUDA issue, you should look for an working algorithm first. Could try marching cubes + render to volume.

bin04017 · September 28, 2007, 7:35am

the problem is that i have to interpolate after the rotation . this interpolation takes the points from all 6 direct connected points ( left,right,up,down,front,back) . this is something “strange” memory shifting.

is there a possibility to start 512512512 threats (e.g. for each pixel) would be nice for the start to look if it is working with my idea right now.

regards

alex_dubinsky · September 28, 2007, 6:28pm

Trilinear filtering too? This problem will be dead-easy (and very fast) once 3D textures are introduced to CUDA (which I think won’t be at least till next year). In fact, you might consider ditching CUDA and just use directx. There’ll be zero memory issues (yay texture cache), very high performance, and complete hardware independence.

But for now, how about you post your current implementation so we can see why it doesn’t work?

bin04017 · October 1, 2007, 12:57pm

ok the idea is great BUT i need to do it on Fedora. and i have to do at least 360360180 rotations per calculation so … it should be really really fast.

alex_dubinsky · October 1, 2007, 4:53pm

opengl should expose 3d textures too.

Depending on the implementation of the texture cache (a good opportunity to compare nvidia vs ati here), this operation will be very, very fast. Ideally, it will be as fast as a memcpy.

bin04017 · October 2, 2007, 2:41pm

Hi,

some (for me) strang failure occurs if i want to set a float .

   memSize = size_x * size_y*size_z*sizeof(float);

    CUDA_SAFE_CALL(cudaMalloc((void**)&devMemImg, memSize));

    CUDA_SAFE_CALL(cudaMalloc((void**)&devMemRotImg, memSize));

......

CUDA_SAFE_CALL( cudaMemset(devMemRotImg, 0, memSize) );

bin04017 · October 2, 2007, 2:46pm

Hi,

some (for me) strang failure occurs if i want to set a float .

i’ll do something like

   memSize = size_x * size_y*size_z*sizeof(float);

    CUDA_SAFE_CALL(cudaMalloc((void**)&devMemImg, memSize));

    CUDA_SAFE_CALL(cudaMalloc((void**)&devMemRotImg, memSize));

......

    CUDA_SAFE_CALL( cudaMemset(devMemRotImg, 0, memSize) );

......

and then in the kernel i want to set (x is some index)

   devMemRotImg[X]=0;

and then the error occurs

line 193 is

CUT_CHECK_ERROR("Kernel execution failed");

what could this be because the whole procedure is working fine with emu=1. without this the procedure is running and doesn’t do any usefull.

regards

alex_dubinsky · October 2, 2007, 5:27pm

Well, if you’re still doing 512512512*sizeof(float) datasets, then obviously you can’t fit two of those into 768MB. The kernel will also fail if other resource limits are exceeded, like with too many threads in a block (not enough registers).

bin04017 · October 4, 2007, 8:08am

mmm ok i have now reduced my image to 323232 to avoid that. but the same error occurs. any other ideas. (code is available at http://www.codeproject.com/script/comments…353&msg=2256353 , it is only the try out code so its a liitle bit ugly but should work i think but it doesn’t)

alex_dubinsky · October 4, 2007, 4:36pm

What can I say. Reduce your program to as little as possible until it does work, and then build back up. Also add the option nvcc -keep so that you get the cubin which will tell you how many registers, shared mem, and constant mem your kernel is requiring. There’s also always the possibility of a compiler bug, but don’t bet your barn on it.

bin04017 · October 10, 2007, 7:03am

Now i have found that i could not write to the memory where my rotated image should be. is there any possibility to make sure that the allocated memory is not in a read only memory.?

MisterAnderson42 · October 10, 2007, 12:42pm

Uh, sure. Just allocate it with cudaMalloc(). Any thread in any block should be able to write to any memory location in the region allocated.

bin04017 · October 11, 2007, 7:26am

i do allocat the mem with cudaMalloc()

   CUDA_SAFE_CALL( cudaMemset(devMemRotImg, 0, memSize) );

    CUDA_SAFE_CALL( cudaMemset(devMemImg, 0, memSize) );

   CUDA_SAFE_CALL(cudaMemcpy(devMemImg, hostImg, memSize, cudaMemcpyHostToDevice));

but if i run the kernel there is the problem

devMemRotImg[(r_z_l*size_y+r_y_l)*size_x+r_x_l] = (num_t) InterpolatedValueS (devMemImg, size_x, size_y, size_z, i, j, k);

i have tried to do several things. i can read and write to devMemImg and read from devMemRotImg but i cann’t write to devMemRotImg

any othere idea.

regards

ps: link to a version is included in one former post.

MisterAnderson42 · October 11, 2007, 1:48pm

I haven’t tried to decipher your code, but are you absolutely sure that (r_z_l*size_y+r_y_l)*size_x+r_x_l never indexes outside of the array memory bounds? You could check this by running the kernel in emulation mode through valgrind on linux.

bin04017 · October 12, 2007, 8:49am

i am sure because in emulator mode it is running as it should. means no error or problems and the result is also as it should look like. the problem starts if i try to run in debug emulator mode or debug or release mode. there i get the error that something with the kernel is wrong. but i don’t know why because if it is working fine in emulator mode shouldn’t it work in all oder modes?

MisterAnderson42 · October 12, 2007, 1:51pm

“Working fine” doesn’t mean that you never write past the end of the array. You have to write outside your programs addressable space to get a seg fault.

Try running the emulation mode executable through valgrind as I suggested. It is a wonderful tool that can tell you whenever you write past the end of an array (even if it won’t produce a seg fault) and even what line of code is causing the problem.

Note: I’m only pushing this memory access check because I usually get “unspecified launch failure” when I accidentally write past the end of an array. I did just remember that I’ve gotten it before when my kernel requested a block size too big to run on the device too, so I looked at your code. You are calling the kernel like this:
rot3d<<>>(devMemImg, devMemRotImg, size_x, size_y, size_z, rotMatrix);
Why aren’t you passing the block and grid specifications to the kernel? I would guess that it is using uninitialized memory there and probably trying to run a thread block with 32897 threads per block and the device isn’t happy about that. How it works in emulation mode, I have no idea.

Topic		Replies	Views
3D Rotation time problem rotation 3d CUDA Programming and Performance	4	3355	November 8, 2007
Problems in a volume rotation CUDA Programming and Performance	8	8679	April 28, 2008
Efficiently rotate 3D volume CUDA Programming and Performance	4	9807	April 4, 2007
Interpolation of 3d rotation time problem CUDA Programming and Performance	1	1902	February 22, 2008
Rotation 3d speed up help CUDA Programming and Performance	0	1255	December 12, 2007
Is CUDA for me? Matlab alone is too slow. CUDA Programming and Performance	7	12650	July 9, 2007
Problem using cuda streams CUDA Programming and Performance	13	2122	March 4, 2015
Error handling in CUDA when using cudaMemcpy3D CUDA Programming and Performance	12	12023	December 22, 2010
Help with some CUDA concepts CUDA Programming and Performance	7	1544	August 16, 2009
3D Texturing CUDA Programming and Performance	7	6244	August 8, 2008

Seaching a fast 3d Rotation

Related topics