Scalability and performance


Suppose I have a non-graphical application (say, an FFT of 10,000 data points) that I wrote for an 8800GT. I got an X-times speed-up compared to optimized CPU code.

In which of the following cases will I get a speed-up of more than X times:

1- Running the same code (without any modifications) on a GPU with more cores than the 8800GT, such as a GTX200-series or Tesla part.

2- Running the same code (without any modifications) on two GPUs in SLI mode (two 128-core GPUs).

And by the way, what would happen if my application were graphical (rendering, etc.)?

I just need a short answer to verify my understanding.

Thanks in advance for your time!

Probably option 1, but only because option 2 is impossible. CUDA doesn’t work with SLI, and SLI isn’t a magical path to multi-GPU anyway. If you want multi-GPU, you have to program it explicitly.
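To make "program it explicitly" concrete, here is a minimal sketch of splitting one array across two GPUs by hand. The kernel `process` and the even split are hypothetical placeholders, and it assumes CUDA 4.0 or later, where a single host thread can drive multiple devices via `cudaSetDevice` (older toolkits required one host thread per GPU):

```cuda
#include <cuda_runtime.h>

// Placeholder kernel standing in for real work (e.g. an FFT stage).
__global__ void process(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}

void run_on_two_gpus(float *host, int n) {
    int half = n / 2;
    float *d0, *d1;

    cudaSetDevice(0);                      // subsequent calls target GPU 0
    cudaMalloc(&d0, half * sizeof(float));
    cudaMemcpy(d0, host, half * sizeof(float), cudaMemcpyHostToDevice);
    process<<<(half + 255) / 256, 256>>>(d0, half);

    cudaSetDevice(1);                      // switch to GPU 1 for the other half
    cudaMalloc(&d1, (n - half) * sizeof(float));
    cudaMemcpy(d1, host + half, (n - half) * sizeof(float), cudaMemcpyHostToDevice);
    process<<<(n - half + 255) / 256, 256>>>(d1, n - half);

    cudaSetDevice(0);                      // gather results from each device
    cudaMemcpy(host, d0, half * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d0);
    cudaSetDevice(1);
    cudaMemcpy(host + half, d1, (n - half) * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d1);
}
```

The point is that the split, the per-device allocations, and the result gathering are all your code; the driver never distributes one launch across two GPUs for you.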

Neither of those is necessarily true. Your app will scale only if you programmed it appropriately (it isn’t PCIe-limited, it generates enough parallelism to fill the new machine, etc.). If you have a kernel that takes time M on a part with N SMs, you launch W blocks where W >> N, and your performance is bottlenecked by actual computation (or DRAM bandwidth), then yes, it will scale on a larger part with more than N SMs and more memory bandwidth, finishing in some time less than M.
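The W >> N condition can be illustrated with a toy model (my own simplification, ignoring memory bandwidth and occupancy): independent blocks execute in waves over the SMs, so total time is roughly the number of waves times the per-block time.

```python
import math

def kernel_time(W, N, t_block=1.0):
    """Toy model: W independent blocks run in waves over N SMs.

    Each wave occupies all N SMs for t_block, so the kernel takes
    ceil(W / N) waves. This is a hypothetical simplification that
    ignores occupancy, memory bandwidth, and scheduling overlap.
    """
    return math.ceil(W / N) * t_block

# Plenty of blocks (W >> N): doubling the SM count halves the time.
print(kernel_time(4096, 16))   # 256 waves
print(kernel_time(4096, 32))   # 128 waves -> ~2x speedup

# Too few blocks (W ~ N): the extra SMs sit idle, no speedup at all.
print(kernel_time(16, 16))     # 1 wave
print(kernel_time(16, 32))     # still 1 wave
```

In other words, a grid sized to barely cover an 8800GT’s SMs gains nothing on a bigger part; a grid with thousands of independent blocks does.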

Thankfully, meeting these requirements in CUDA is generally quite easy.