I have a compute-bound kernel running on a GTX 560 Ti with its 8 SMs. Would an upgrade to a GTX 570 with its 15 SMs give a proportionate speed-up in kernel execution time, even though the GFLOPS (FMA) ratings of the two cards are roughly similar?
The kernel is unavoidably heavy on float arithmetic (__expf(), for example).
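For illustration, a hypothetical kernel with the shape described above might look like the following. The kernel body, constants, and iteration count are invented; the point is just that a single load/store pair surrounded by a long chain of __expf()/FMA work makes arithmetic throughput, not memory bandwidth, the bottleneck:

```
#include <cuda_runtime.h>

// Hypothetical stand-in for the kind of kernel described above:
// little memory traffic, many float operations per element,
// dominated by __expf() and FMA work.
__global__ void expHeavyKernel(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];
        // A long chain of arithmetic keeps the FPUs busy relative
        // to the single load/store pair, so the kernel is compute bound.
        #pragma unroll
        for (int k = 0; k < 32; ++k)
            x = __expf(x * 0.5f) - x * 0.99f;
        out[i] = x;
    }
}
```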
This is difficult to say in general because of the different architectures. The GTX 560 Ti (compute capability 2.1) depends on instruction-level parallelism to feed the 48 “cores” (FPUs) in each SM from just 32 threads per warp, while the GTX 570 (compute capability 2.0) has only 32 cores/FPUs per SM and thus does better when only thread-level parallelism is available. So, all other things being equal, a compute capability 2.0 device runs CUDA code anywhere between 0% and 50% faster than a compute capability 2.1 device with the same total number of cores. It really depends on the specific code; see the sketch below.
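To make the instruction-level-parallelism point concrete, here is a sketch of the same per-thread work written two ways. The function names, loop counts, and constants are invented for illustration; the idea is only that independent dependency chains give a CC 2.1 scheduler instructions it can issue in parallel:

```
// Illustrative only: two versions of the same per-thread work.
// On a CC 2.1 part, parallelChains() gives the scheduler independent
// instructions to pair, helping it keep more of the 48 FPUs per SM
// busy; on CC 2.0 the two versions should behave much more alike.

__device__ float serialChain(float x)
{
    // Each __expf depends on the previous result: no ILP.
    for (int k = 0; k < 16; ++k)
        x = __expf(x) * 0.5f;
    return x;
}

__device__ float parallelChains(float x)
{
    // Two independent dependency chains: the compiler can interleave
    // them, exposing instruction-level parallelism within one thread.
    float a = x, b = x + 1.0f;
    for (int k = 0; k < 8; ++k) {
        a = __expf(a) * 0.5f;
        b = __expf(b) * 0.5f;
    }
    return a + b;
}

__global__ void demo(const float *in, float *out, int n, bool ilp)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = ilp ? parallelChains(in[i]) : serialChain(in[i]);
}
```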
If you want to try out your code on a compute capability 2.0 device, you might use a GPU instance on Amazon’s EC2.
I have a compute capability 2.0 device (a GTX 460) with the same number of SMs as my 560. Its speed is slower by approximately the ratio of the GFLOPS (FMA) values for the 460 and the 560.
I therefore suspect the different compute capabilities won't make any difference here, as the kernel is tied up in the floating-point units.
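For what it's worth, the theoretical GFLOPS (FMA) figure being compared here can be estimated from the device properties. Below is a minimal sketch assuming Fermi-era parts only (32 CUDA cores per SM on CC 2.0, 48 on CC 2.1) and counting one FMA as two floating-point operations; the SM count and shader clock come from cudaGetDeviceProperties:

```
#include <cstdio>
#include <cuda_runtime.h>

// Rough sanity check of the theoretical FMA GFLOPS of device 0.
// Assumes a Fermi GPU: 32 cores per SM on CC 2.0, 48 on CC 2.1.
int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    int coresPerSM = (prop.major == 2 && prop.minor == 1) ? 48 : 32;
    // clockRate is reported in kHz; convert to GHz for GFLOPS.
    double ghz = prop.clockRate / 1.0e6;
    // One FMA per core per clock = 2 floating-point operations.
    double gflops = prop.multiProcessorCount * coresPerSM * 2.0 * ghz;

    printf("%s: CC %d.%d, %d SMs, ~%.0f GFLOPS (FMA)\n",
           prop.name, prop.major, prop.minor,
           prop.multiProcessorCount, gflops);
    return 0;
}
```

Run on both cards, the two printed numbers give the ratio referred to above.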