How to start defining the boundaries: grid, block and warp sizes

Hi,

I have just started working on my thesis, which involves adapting an existing CFD code to run on a GPU.
For this I got new hardware, an NVIDIA GeForce GTX 470.

Can someone please give me a few hints on where to find specs for the GPU that include the number of SMs etc.? I only found the number of CUDA cores in the specs PDF from NVIDIA.

What is the best way to start? I mean, how do I decide what grid, block and warp sizes to use? And is it difficult to change CUDA code for other hardware types? I will get a Tesla card in about 6 months' time.

Sorry for the maybe ridiculous question, but I am completely new to CUDA and I don't want to mess it up right from the beginning.
Thanks in advance

Markus

I found this equation:

number of blocks = (TOTAL_ELEMENTS/NUMBER_OF_THREADS)

But how do I define the number of threads and total elements?
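
A note on the equation above: TOTAL_ELEMENTS is normally the problem size (for a CFD code, probably the number of cells or grid points one kernel launch has to process), and NUMBER_OF_THREADS is the threads-per-block value you pick yourself, commonly a multiple of the warp size of 32 such as 128 or 256. The division has to round up, otherwise the last partial block of elements gets no threads. A minimal host-side sketch, where the function name `blocksNeeded` is only an illustrative assumption:

```cpp
#include <cassert>
#include <cstddef>

// Number of blocks such that blocks * threadsPerBlock >= totalElements.
// Plain integer division rounds down and would leave the trailing
// elements without a thread, so add (threadsPerBlock - 1) before dividing.
std::size_t blocksNeeded(std::size_t totalElements, std::size_t threadsPerBlock) {
    assert(threadsPerBlock > 0);  // a block must contain at least one thread
    return (totalElements + threadsPerBlock - 1) / threadsPerBlock;
}
```

For example, 1000 elements with 256 threads per block gives 4 blocks (1024 threads). Because the last block then has 24 surplus threads, the kernel itself needs a bounds check on the element index.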

Hi there,

I found out that the GTX 470 has 448 CUDA cores, which means 14 SMs.

It would be great if someone could help me with the rest of the numbers (block size, max number of threads, max number of warps).

Cheers
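
The 14 SMs follow from the GTX 470 being a Fermi-class card of compute capability 2.0, where each SM has 32 CUDA cores. The warp size is fixed at 32 threads and is not something you choose; you only choose the block size, and the hardware splits each block into warps. A small sketch of that arithmetic, with constant and function names that are purely illustrative and values that should be checked against the Programming Guide for any other card:

```cpp
#include <cassert>

// Assumption: compute capability 2.0 (e.g. GTX 470); other architectures
// have a different number of cores per SM.
constexpr int kCoresPerSM = 32;
constexpr int kWarpSize = 32;

// Number of SMs implied by an advertised CUDA core count.
int smCount(int cudaCores) {
    assert(cudaCores % kCoresPerSM == 0);
    return cudaCores / kCoresPerSM;
}

// Warps the hardware forms from one block; a partially filled
// last warp still occupies a full warp slot.
int warpsPerBlock(int threadsPerBlock) {
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}
```

Here `smCount(448)` gives 14, and a block of 256 threads is executed as 8 warps. For compute capability 2.0 the guide lists a limit of 1024 threads per block and 48 resident warps (1536 threads) per SM.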

You should start with the CUDA Programming Guide. Appendix G has the numbers you are seeking, and Chapter 2 discusses the hierarchy of kernels, blocks, warps, threads & co.
It's definitely recommended to read the complete Programming Guide before you start programming with CUDA.

Cheers, that is what I was looking for.

Can you answer my second question as well: is it a lot of work to adapt an existing CUDA program to better hardware (from a GTX 470 to a Tesla card)?

No adjustment should be needed at all, although some fine-tuning might improve performance.

In the case of the Tesla cards, that might mean either putting their additional memory to good use to reduce transfers over PCIe to host memory, or doing additional tuning to fully exploit their higher double-precision throughput.
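
One way to keep the code portable between cards is to avoid hard-coding device limits and derive the launch configuration from values read at run time. In a real CUDA program those values would come from `cudaGetDeviceProperties` (fields such as `warpSize` and `maxThreadsPerBlock` of `cudaDeviceProp`). The sketch below uses a hypothetical `DeviceLimits` struct as a stand-in so that it compiles without the CUDA toolkit; `chooseBlockSize` is likewise an illustrative name, not part of any API:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the limits a real program would query
// from the CUDA runtime instead of hard-coding per card.
struct DeviceLimits {
    int warpSize;
    int maxThreadsPerBlock;
};

// Clamp a preferred block size to what the device allows and round it
// down to a whole number of warps (never below one warp).
int chooseBlockSize(const DeviceLimits& dev, int preferred) {
    assert(dev.warpSize > 0 && dev.maxThreadsPerBlock >= dev.warpSize);
    int size = std::min(preferred, dev.maxThreadsPerBlock);
    size = std::max(size, dev.warpSize);
    return (size / dev.warpSize) * dev.warpSize;
}
```

With limits of 32 and 1024, a preferred size of 256 is returned unchanged, while on a device limited to 512 threads per block a request for 1024 is reduced to 512. This only keeps the launch valid; finding the fastest block size on a given card still takes measurement.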
