Tuning for GTX 280

huzza · August 8, 2008, 9:26am

Hello,

I have just moved up from an 8800 GTX to a GTX 280. Besides adjusting the execution configuration, what other parameters should I be tuning to take advantage of the new hardware? My application is fairly register intensive (unavoidably), and it also requires a host->device transfer of a certain amount of data between executions. I was therefore hoping that the new hardware would do wonders because of its higher register count and memory bandwidth, but in fact I see only about a 20% improvement - hardly stunning.

Any advice would be welcome.

MisterAnderson42 · August 8, 2008, 11:53am

Well, there are the new coalescing rules. Depending on your application’s memory access pattern, you could make use of those. The new warp voting feature could also save you some cycles if you make any decisions per warp. And if you have a divergent section in your code, warp voting could be used to break the divergence.
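To illustrate the warp-voting idea, here is a minimal sketch, not code from anyone's actual application. The kernel name `warpUniformSine`, the `WARP_ANY` macro, the threshold and the sizes are all made up for the example. It assumes a "slow" path (`sinf`) that is valid for every input and a "fast" path (`__sinf`) that is only trusted for small arguments. The whole warp votes, and if any thread needs the slow path, every thread in that warp takes it, so the warp runs one path instead of both. The vote intrinsic of this era is `__any()` (compute capability 1.2 and higher); newer toolkits spell it `__any_sync()`, hence the macro.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: pick the vote intrinsic that matches the toolkit.
#if defined(CUDART_VERSION) && CUDART_VERSION >= 9000
#define WARP_ANY(pred) __any_sync(0xffffffffu, (pred))
#else
#define WARP_ANY(pred) __any(pred)
#endif

// Assumes blockDim.x is a multiple of 32 so that every warp is full.
__global__ void warpUniformSine(const float *in, float *out, int n, float threshold)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // No early return: every thread of the warp must reach the vote.
    float x = (i < n) ? in[i] : 0.0f;
    int needSlow = (fabsf(x) > threshold);

    float y;
    if (WARP_ANY(needSlow)) {
        y = sinf(x);      // slow path, valid for all inputs; whole warp takes it
    } else {
        y = __sinf(x);    // fast intrinsic; whole warp takes it
    }

    if (i < n) out[i] = y;
}

int main()
{
    const int n = 1024;           // multiple of the block size
    const int threads = 256;
    float hx[n], hy[n];
    for (int i = 0; i < n; ++i) hx[i] = 0.01f * i;

    float *dx = 0, *dy = 0;
    cudaMalloc((void **)&dx, n * sizeof(float));
    cudaMalloc((void **)&dy, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);

    warpUniformSine<<<n / threads, threads>>>(dx, dy, n, 8.0f);

    cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[1] = %f, y[%d] = %f\n", hy[1], n - 1, hy[n - 1]);

    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```

Whether this pays off depends on how often the threads of a warp disagree; if they almost never do, the vote is just extra work.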

Since you went from an 8800 GTX to a GTX 280, you can also interleave host->device copies with kernel executions using the stream API. Given that you mention copying and then executing, this could help you out a lot (assuming that your steps are not strictly iterative, i.e. they do not all need to be run in order without interleaving).
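A rough sketch of what that interleaving can look like, under assumptions: the `scale` kernel, the chunk size and the number of streams are placeholders, and the data is assumed to split into independent chunks. The host buffer is allocated with `cudaMallocHost` because asynchronous copies need page-locked memory. Each chunk's copy, kernel and copy-back are queued in one stream, so work in different streams is free to overlap if the device supports it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the real per-chunk work.
__global__ void scale(float *d, int n, float f)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

int main()
{
    const int nStreams = 2;
    const int nChunks  = 4;
    const int chunk    = 1 << 20;            // elements per chunk (arbitrary)
    const int n        = nChunks * chunk;
    const size_t chunkBytes = chunk * sizeof(float);

    float *h = 0, *d = 0;
    cudaMallocHost((void **)&h, n * sizeof(float));  // page-locked host memory
    cudaMalloc((void **)&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    // Queue copy-in, kernel and copy-out for each chunk in its own stream.
    for (int c = 0; c < nChunks; ++c) {
        cudaStream_t st = streams[c % nStreams];
        float *hp = h + (size_t)c * chunk;
        float *dp = d + (size_t)c * chunk;

        cudaMemcpyAsync(dp, hp, chunkBytes, cudaMemcpyHostToDevice, st);
        scale<<<(chunk + 255) / 256, 256, 0, st>>>(dp, chunk, 2.0f);
        cudaMemcpyAsync(hp, dp, chunkBytes, cudaMemcpyDeviceToHost, st);
    }

    // Wait for all queued work before touching the results on the host.
    for (int s = 0; s < nStreams; ++s) cudaStreamSynchronize(streams[s]);

    printf("h[0] = %f, h[%d] = %f\n", h[0], n - 1, h[n - 1]);

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```

How much this helps depends on the ratio of copy time to kernel time; if one of the two dominates, the best case is hiding the smaller one.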

Sarnath · August 8, 2008, 12:09pm

This is slightly off-topic to this thread…
But I am just curious…

What is the WARP size in 280?
Do old coalescing rules hold good in 280?
What about shared memory size? Has it also increased?

Any inputs welcome! Thank you guys!

MisterAnderson42 · August 8, 2008, 12:54pm

This is all in the CUDA 2.0b2 programming guide… read there if you want the details.

Warp size: 32
Do the old coalescing rules still hold: yes
Has the shared memory size increased: no
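
These values can also be read from the device at run time rather than looked up. A minimal sketch using the runtime API's `cudaGetDeviceProperties`; which fields to print is just my choice for this example:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA device found\n");
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp p;
        if (cudaGetDeviceProperties(&p, dev) != cudaSuccess) continue;

        printf("Device %d: %s\n", dev, p.name);
        printf("  compute capability:      %d.%d\n", p.major, p.minor);
        printf("  warp size:               %d\n", p.warpSize);
        printf("  shared memory per block: %lu bytes\n",
               (unsigned long)p.sharedMemPerBlock);
        printf("  registers per block:     %d\n", p.regsPerBlock);
    }
    return 0;
}
```

Comparing the output on the 8800 GTX and the GTX 280 shows directly which limits changed between the two cards.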

Geka · August 8, 2008, 3:23pm

On page 54 of the programming guide you will find the information about the new coalescing rules.

They also say “compute capability 1.2 and higher”. I wonder whether that means cc 1.2 consists only of that change in the coalescing rules, which would mean that we could indeed see a version of the GTX 200 without the double-precision FPUs (i.e. the cc 1.3 feature).

tmurray · August 8, 2008, 3:48pm

Compute capabilities are defined in Appendix A (page 79).

Sarnath · August 11, 2008, 1:08pm

Thanks to all you guys for the replies! That's comforting!

I can only expect my code to run faster… That's good news!

Best regards,
Sarnath

huzza · August 11, 2008, 1:13pm

Yes, thanks indeed for the replies!