3D cuda examples 3d imaging

I have a few questions:

  1. I see dim3 block(8, 8, 1); or dim3 block(16, 16, 1); in almost every example. If I want to process a volume (512*512*512), how should I declare the block? Is dim3 block(16,16,16) OK?

  2. float4 is used very often as a type in CUDA, but my C++ program uses float pointers everywhere. Can I use plain float in CUDA? What exactly does float4 mean, and does it matter if I just use float*?

  3. What does the last parameter of cutilSafeCall(), “pbo”, mean?

Thanks a lot.


No, dim3 block(16,16,16) would be 16*16*16 = 4096 threads, and the total number of threads in a block can never exceed 512. And even then, it depends on the register usage of your kernel: 8192 (or 16384 for GT200 hardware) divided by the number of registers used by your kernel gives the maximum number of threads possible.
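To make the limit concrete, here is a minimal sketch of covering a 512*512*512 float volume with 8x8x8 blocks (512 threads each). The kernel name scaleVolume and the scale factor are made up for illustration. It also assumes a device and toolkit that accept 3D grids (compute capability 2.0 or later); on older hardware such as GT200 the grid is 2D only, so the z index would have to be folded into x or y. The per-block limit is queried from the device rather than hard-coded, since it differs between GPU generations.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every voxel of an nx*ny*nz volume by s.
__global__ void scaleVolume(float *vol, int nx, int ny, int nz, float s)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < nx && y < ny && z < nz) {
        // x is the fastest-varying index, so neighbouring threads in x
        // touch neighbouring addresses.
        size_t idx = ((size_t)z * ny + y) * nx + x;
        vol[idx] *= s;
    }
}

int main()
{
    const int nx = 512, ny = 512, nz = 512;

    // Ask the device for its limit instead of assuming one.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }
    printf("max threads per block: %d\n", prop.maxThreadsPerBlock);

    dim3 block(8, 8, 8);                        // 8*8*8 = 512 threads
    dim3 grid((nx + block.x - 1) / block.x,     // round up so the whole
              (ny + block.y - 1) / block.y,     // volume is covered
              (nz + block.z - 1) / block.z);

    size_t bytes = (size_t)nx * ny * nz * sizeof(float);  // 512 MiB
    float *d_vol = NULL;
    if (cudaMalloc(&d_vol, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_vol, 0, bytes);

    scaleVolume<<<grid, block>>>(d_vol, nx, ny, nz, 2.0f);
    cudaError_t err = cudaGetLastError();       // launch configuration errors
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();          // errors during execution
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_vol);
    return err == cudaSuccess ? 0 : 1;
}
```

If the kernel uses many registers, the launch can still fail with 512 threads per block, which is why the launch error is checked here.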

There are plenty of examples in the SDK that use float, int and other types. float4 is a structure with 4 float members: x, y, z and w.
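As a rough illustration of the difference, the sketch below runs the same "add one" operation twice on one buffer: once through a plain float* and once through a float4* view of the same memory. The kernel names addOne and addOne4 are invented for this example. Casting the pointer to float4* assumes the buffer comes straight from cudaMalloc (which returns suitably aligned memory) and that the element count is a multiple of 4.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One float per thread.
__global__ void addOne(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] += 1.0f;
}

// Four floats per thread, accessed through the float4 members x, y, z, w.
__global__ void addOne4(float4 *data, int n4)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n4) {
        float4 v = data[i];
        v.x += 1.0f;
        v.y += 1.0f;
        v.z += 1.0f;
        v.w += 1.0f;
        data[i] = v;
    }
}

int main()
{
    const int n = 1024;                 // assumed to be a multiple of 4
    float h[n];
    for (int i = 0; i < n; ++i)
        h[i] = (float)i;

    float *d = NULL;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 256;
    addOne<<<(n + threads - 1) / threads, threads>>>(d, n);
    addOne4<<<(n / 4 + threads - 1) / threads, threads>>>(
        reinterpret_cast<float4 *>(d), n / 4);

    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    // Each element has been incremented once by each kernel.
    printf("h[0] = %f, h[%d] = %f\n", h[0], n - 1, h[n - 1]);

    cudaFree(d);
    return 0;
}
```

So a float* from an existing C++ program works as it is; float4 is just a convenient way to move or group four values at once.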

NVIDIA advises against using cutil; do your own error checking instead. cutil is a convenience thing for the SDK examples and can change without notice (I think it already did a few times).
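A do-it-yourself replacement can be very small. The following is one possible sketch, not what cutil itself does: the macro name CHECK_CUDA is made up, and it simply prints the runtime's error string with file and line, then exits.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: wraps any call that returns cudaError_t.
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",           \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main()
{
    float *d = NULL;
    CHECK_CUDA(cudaMalloc(&d, 1024 * sizeof(float)));
    CHECK_CUDA(cudaMemset(d, 0, 1024 * sizeof(float)));
    CHECK_CUDA(cudaFree(d));
    return 0;
}
```

Kernel launches return nothing, so after a launch one would typically wrap cudaGetLastError() in the same macro.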

pbo is usually an OpenGL interop buffer object, if I am not mistaken.

Thank you denis.

What is the difference between

dim3 block(32,16,1)
dim3 block(16,16,2)
dim3 block(8,8,8)

Are they all equally efficient for parallel computing?


To tell you the truth: you will have to benchmark. In the past people have for example found that NxM grids are faster than MxN (don’t remember if N>M or M>N). The only thing that is certain is that a block size that is a multiple of 32 (the warp size) is smart.
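A benchmark along those lines could look like the sketch below, which times the three block shapes from the question with CUDA events. The kernel touchVolume is only a stand-in workload invented for this example, the 256*256*256 volume size is arbitrary, and a device that accepts 3D grids is assumed. The numbers will only be meaningful once the stand-in is replaced by the real kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in workload: one thread per voxel, trivial arithmetic.
__global__ void touchVolume(float *vol, int nx, int ny, int nz)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x < nx && y < ny && z < nz) {
        size_t idx = ((size_t)z * ny + y) * nx + x;
        vol[idx] = vol[idx] * 2.0f + 1.0f;
    }
}

// Times one launch over an n*n*n volume with the given block shape.
// Returns milliseconds, or a negative value if the launch failed.
static float timeLaunch(float *d_vol, int n, dim3 block)
{
    dim3 grid((n + block.x - 1) / block.x,
              (n + block.y - 1) / block.y,
              (n + block.z - 1) / block.z);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    touchVolume<<<grid, block>>>(d_vol, n, n, n);   // warm-up, not timed

    cudaEventRecord(start);
    touchVolume<<<grid, block>>>(d_vol, n, n, n);   // timed launch
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = -1.0f;
    if (cudaGetLastError() == cudaSuccess)
        cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 256;
    size_t bytes = (size_t)n * n * n * sizeof(float);   // 64 MiB
    float *d_vol = NULL;
    if (cudaMalloc(&d_vol, bytes) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_vol, 0, bytes);

    // All three shapes contain 512 threads per block.
    dim3 shapes[3] = { dim3(32, 16, 1), dim3(16, 16, 2), dim3(8, 8, 8) };
    for (int i = 0; i < 3; ++i) {
        float ms = timeLaunch(d_vol, n, shapes[i]);
        printf("block(%u,%u,%u): %.3f ms\n",
               shapes[i].x, shapes[i].y, shapes[i].z, ms);
    }

    cudaFree(d_vol);
    return 0;
}
```

For more stable numbers, each shape could be launched several times and the timings averaged.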

Thank you for your advice.

But from the point of view of 3D image analysis, is block(8,8,8) better?

If your block is working with cuboids then 8x8x8 might indeed work well. But it all depends on your application and the nature of the computation. Experiment and find out.