GTX 480 vs GTX 285, less MP more cores

Hi all,

I have a program which used to run on a GTX 285. Recently I got a new GTX 480 and measured the time the program needs on the GTX 285 and on the GTX 480. It turned out the GTX 285 is faster. I checked the specs of the two GPUs, and I think the smaller number of multiprocessors may be the reason the GTX 480 needs more time: the GTX 480 has 15 MPs with 480 cores, while the GTX 285 has 30 MPs with 240 cores.

The question is: how can I modify my program to make it run faster on the “fewer MPs, more cores” GTX 480? For example, would it be faster if I removed some if…else… control-flow statements? Is there any guide to this kind of performance tuning?
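
To show what I mean, here is a made-up sketch (not my actual code) of the kind of if…else… branch I am thinking of, and a branchless rewrite of it:

// Hypothetical kernel with a data-dependent branch: threads of the same
// warp that take different paths execute both paths one after the other.
__global__ void withBranch(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        if (in[i] > 0.0f)
            out[i] = 2.0f * in[i];
        else
            out[i] = -in[i];
    }
}

// Same result without the data-dependent branch; the compiler can turn
// the conditional expression into a select/predicated instruction.
__global__ void withoutBranch(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];
        out[i] = (x > 0.0f) ? 2.0f * x : -x;
    }
}

Would rewriting branches like the first kernel into something like the second actually help on the GTX 480?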

Best regards,
ning

The number of MPs won't be the problem here. I use a GTX 480, too, and had a 285 before. My app runs at almost double the speed, as you would expect (when it's not memory bound), because the number of CUDA cores has doubled. Perhaps you want to look into your shared memory accesses and the changed memory access behaviour on Fermi.
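
For example (just a sketch, not from your code): shared memory on Fermi has 32 banks instead of the 16 on GT200, and bank conflicts are now resolved across a whole warp instead of a half-warp, so it is worth re-checking your shared memory access patterns. The classic padding trick for column-wise accesses of a tile still applies:

#define TILE_DIM 32

// Hypothetical tile transpose, launched with 32x32-thread blocks on a
// square matrix whose width is a multiple of TILE_DIM. The "+ 1" padding
// keeps the column reads from all hitting the same shared memory bank.
__global__ void transposeTile(const float *in, float *out, int width)
{
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];

    int x = blockIdx.x * TILE_DIM + threadIdx.x;
    int y = blockIdx.y * TILE_DIM + threadIdx.y;
    tile[threadIdx.y][threadIdx.x] = in[y * width + x];   // coalesced load

    __syncthreads();

    x = blockIdx.y * TILE_DIM + threadIdx.x;
    y = blockIdx.x * TILE_DIM + threadIdx.y;
    out[y * width + x] = tile[threadIdx.x][threadIdx.y];  // coalesced store, bank-conflict-free read
}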

There are a lot of things to tweak and to check.

Is the warp size of 64 a problem for your code?

I find that tuning the block size is essential on the GTX 480. While a typical kernel on GT200 would only vary about 10% in performance from the worst block size to the best, the same kernels running on GF100 GPUs show 50+% variation.
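
If it helps, this is roughly how I check it (just a sketch; myKernel, d_in, d_out and n stand in for your own kernel and data):

// Sweep block sizes and time the kernel with CUDA events.
for (int block = 64; block <= 512; block += 32) {
    int grid = (n + block - 1) / block;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    myKernel<<<grid, block>>>(d_in, d_out, n);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("block size %3d: %.3f ms\n", block, ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}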

OMG they killed Kenny! erm… I meant they changed the Warp Size?

Yeah, isn't Fermi's warp size 64?

No. 32, just like everything since the original G80.

Oops… Sorry bout the wrong info… Thanks for correcting, Avid…

phew :)

But I guess, since the SM now issues from two warps at once, you NEVER want a block with fewer than 64 threads (if only one block is active)?

Uhm, this will rarely happen. It would mean you have a problem that needs fewer than 64 threads. If so, and if I still had to do it on the GPU, it wouldn't matter if some threads are idle (unless you run other kernels in parallel)…

One possible bottleneck: GF100 has only 15 SMs, GT200 had 30. Both support up to 8 blocks per SM, so up to 240 concurrent blocks for GT200 vs. 120 for GF100.

So if you have many small blocks, try to make them larger; otherwise there are not enough active warps to hide instruction/memory latency.
In my application, increasing the block size from 64 to 96 threads improved performance on the GTX 480.
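
Rough numbers, assuming the 8-blocks-per-SM limit is what binds (and not registers or shared memory): with 64-thread blocks an SM holds at most 8 x 64 = 512 threads = 16 resident warps, while with 96-thread blocks it holds 8 x 96 = 768 threads = 24 warps, out of the 48 a GF100 SM supports. More resident warps per SM means more latency can be hidden.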

Additionally, GF100 needs twice as many active warps as GT200 to hide latency, since a warp's instruction is now issued over 2 cycles instead of 4, while the instruction latency stayed the same. (See the Fermi Tuning Guide PDF for details.)
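
Rough example (taking ~24 cycles as the ballpark arithmetic latency from the programming guide): on GT200 a warp instruction is issued over 4 cycles, so about 24 / 4 = 6 warps per SM are enough to cover it; on GF100 each scheduler issues a warp over 2 cycles, so you need about 24 / 2 = 12 warps per scheduler, roughly twice as many active warps for the same latency.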