Determine threads per block as product of two variables.

I’m a programmer and would like to build and run the following climate-related CUDA code:

http://www.mmm.ucar.edu/wrf/WG2/GPU/WSM5.htm

The WSM5 source expresses threads per block as two variables, XXX and YYY (presumably multiplied together):

# settings for GTX 280

XXX = 32

YYY = 8

# settings for 5600 Quadro and GTX 8800

XXX = 8

YYY = 8

I have a GeForce GT 240M. DeviceQuery reports a single value: 512 threads per block.

How do I translate (decompose) that single DeviceQuery value into the two values (XXX and YYY) that the code wants?

Piecing together what I can from the web, it seems that threads are grouped into warps and warps are grouped into blocks.

DeviceQuery.exe tells me that the warp size of my GeForce GT 240M is 32; that is, there are 32 threads in a warp. If the total is 512, that must mean there are 16 warps in a 240M’s block (32 * 16 = 512).

Is that correct? If so, which is XXX and which is YYY?

I’m guessing (based on a little further evidence I don’t show here) that XXX = 32 and YYY = 16.

An answer would be nice, but I can also simply compile and run with these values and see what I get.