GTX 480 - performance

Yunior · June 2, 2010, 8:22am

I have just bought a new machine in order to take advantage of Fermi:

a GTX 480 (installed driver version 197.75). I have a few questions about its performance, which I have already tested:

[1] I ran some matrix-times-vector computations and noticed a speedup of about 1.7x over the GTX 285, which was predictable because the GTX 480 (15 x 32 = 480 cores) has 2x more cores than the GTX 285 (30 x 8 = 240 cores), but…

When I checked some operations from CUBLAS, I found that the performance on the GTX 480 is 10x worse than on the GTX 285?! ANY IDEAS WHY?

[2] What surprises me is the output of deviceQuery:

There is 1 device supporting CUDA

Device 0: “GeForce GTX 480”
CUDA Driver Version: 3.0
CUDA Runtime Version: 3.0
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 1576468480 bytes
Number of multiprocessors: 15
Number of cores: 480
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 0.81 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads
can use this device simultaneously)

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.0, CUDA Runtime Version = 3.0, NumDevs = 1, Device = GeForce GTX 480

Firstly, I expected (from the Fermi whitepaper) 16 multiprocessors of 32 cores each, for 512 cores in total.
Secondly, the clock rate in the printout is 0.81 GHz, but CUDA Control Panel → System Information shows a Graphics Clock of 700 MHz and a Processor Clock of 1401 MHz. Which of those three values is the clock rate of my card? (What surprises me: the GTX 285 had a higher clock rate, 1.48 GHz!)

Thanks for any reply,
Yunior

LSChien · June 6, 2010, 3:57pm

Could you specify which CUBLAS functions you use and the dimensions of the matrices/vectors you tested on Fermi and GT200?

ymc · June 6, 2010, 6:51pm

“Firstly, I expected (from the Fermi whitepaper) 16 multiprocessors of 32 cores each, for 512 cores in total.
Secondly, the clock rate in the printout is 0.81 GHz, but CUDA Control Panel → System Information shows a Graphics Clock of 700 MHz and a Processor Clock of 1401 MHz. Which of those three values is the clock rate of my card? (What surprises me: the GTX 285 had a higher clock rate, 1.48 GHz!)”

Fermi is a theoretical GPU. The GTX 480 was advertised as having 15 SMs and 480 cores.

1.48 GHz is the shader clock rate of the GTX 285. It is a bit odd that your 480 shows 0.81 GHz.

seibert · June 6, 2010, 7:28pm

Fermi is a real GPU, but it has yield problems. By lowering the spec for the GTX 480 to 15 SMs, they can ship GPUs that have a flaw in one of the 16 SMs by disabling it. This is a standard technique to make use of imperfect chips. Similarly, the GTX 470 can use chips with two broken SMs.

Edit: I should emphasize that, yes, the GTX 480 spec was officially advertised with 15 SMs before the cards went on sale, of course.

Yes, the 480 should have a clock rate of 1.401 GHz. This looks like the driver has the card stuck in low power mode.
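
As a rough check of the numbers in this thread, the sketch below multiplies out the SM and core counts and estimates peak single-precision throughput at the two clock rates reported. It assumes 32 cores per SM (the whitepaper figure quoted above) and one fused multiply-add, counted as 2 FLOPs, per core per cycle; the function name is purely illustrative.

```python
# Rough arithmetic for the figures discussed in this thread.
# Assumption: each core retires one fused multiply-add (2 FLOPs) per cycle.

CORES_PER_SM = 32          # Fermi whitepaper figure quoted in the thread
FLOPS_PER_CORE_CYCLE = 2   # assumption: one FMA per cycle


def peak_gflops(num_sms, clock_ghz):
    """Theoretical single-precision peak in GFLOP/s."""
    cores = num_sms * CORES_PER_SM
    return cores * FLOPS_PER_CORE_CYCLE * clock_ghz


full_fermi_cores = 16 * CORES_PER_SM   # whitepaper configuration
gtx480_cores = 15 * CORES_PER_SM       # configuration reported by deviceQuery

low_power = peak_gflops(15, 0.81)      # clock reported by deviceQuery
full_speed = peak_gflops(15, 1.401)    # processor clock from the control panel

print(f"cores: full Fermi = {full_fermi_cores}, GTX 480 = {gtx480_cores}")
print(f"peak at 0.81 GHz : {low_power:.1f} GFLOP/s")
print(f"peak at 1.401 GHz: {full_speed:.1f} GFLOP/s")
print(f"ratio: {full_speed / low_power:.2f}x")
```

Under these assumptions, a card held at 0.81 GHz would give up roughly 40% of its peak arithmetic rate, which is noticeable but would not by itself account for a 10x slowdown.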

Yunior · June 8, 2010, 11:34am

Functions such as cublasSaxpy, cublasScopy, and cublasSdot.

Vector size: [500 000 x 1]

Sarnath · June 8, 2010, 11:45am

The clock rate can be changed by some “power save” features etc… This was discussed in the forums a long time ago…

Are you running Linux?

Are you using the CUBLAS that ships with CUDA 3.0?

Yunior · June 8, 2010, 8:57pm

I used Nvidia System Monitor and noticed that at the beginning of the kernel (matrix multiplication) the clock rate is about 0.8 GHz, and later, once it has warmed up, it reaches 1.4 GHz and the GPU usage is 99-100%. So the clock rate question is more or less solved.
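
Since the clock apparently ramps up only after the GPU has been busy for a while, short benchmarks may end up being timed while the card is still in its low-power state. A common precaution is to run a few untimed warm-up iterations first. The sketch below shows the pattern in plain Python; `time_with_warmup` and the stand-in `workload` are hypothetical names, and a real GPU measurement would also need to wait for the device to finish before reading the timer.

```python
import time


def time_with_warmup(fn, warmup=5, iters=20):
    """Return mean seconds per call of fn, after untimed warm-up calls."""
    for _ in range(warmup):      # untimed: gives clocks a chance to ramp up
        fn()
    start = time.perf_counter()
    for _ in range(iters):       # timed region
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / iters


calls = []                       # records each invocation of the stand-in


def workload():
    # Hypothetical stand-in for launching a kernel or CUBLAS call
    # and waiting for it to complete.
    calls.append(sum(i * i for i in range(10_000)))


mean_s = time_with_warmup(workload, warmup=5, iters=20)
print(f"mean time per call: {mean_s * 1e6:.1f} us over 20 timed calls")
```

Comparing the result with and without the warm-up calls is one way to see whether the start-up clock ramp is distorting a measurement.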

But there is still a problem with CUBLAS. When I execute a function from this library, the GPU usage (shown by Nvidia System Monitor) is about 66%… What I have done: I installed the newest driver, toolkit and SDK from (http://developer.nvidia.com/object/cuda_3_0_downloads.html), and in Project Properties -> Linker I added cublas.lib.

Do I need to do anything more to use the CUBLAS that ships with CUDA 3.0?

Yunior

Sarnath · June 9, 2010, 5:19am

Interesting… Can you run the bandwidth test and post the results? Maybe it has something to do with memory transfer?

Yunior · June 9, 2010, 7:12am

My device driver is 197.75

Results from Bandwidth test:

Device 0: GeForce GTX 480
Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes)    Bandwidth(MB/s)
33554432                 4654.4

Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes)    Bandwidth(MB/s)
33554432                 4549.9

Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes)    Bandwidth(MB/s)
33554432                 112037.3

[bandwidthTest] - Test results:
PASSED
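
To put these numbers in context for the CUBLAS functions mentioned earlier in the thread, here is a back-of-the-envelope estimate of how long a single-precision saxpy on 500,000 elements would take if it ran at the measured device-to-device bandwidth. It assumes saxpy reads two vectors and writes one, and it treats 1 MB as 10^6 bytes; both are simplifications, so the results are only order-of-magnitude figures.

```python
# Back-of-the-envelope estimate from the bandwidthTest output above.
# Assumptions: saxpy (y = a*x + y) reads x and y and writes y,
# and 1 MB is taken as 10**6 bytes.

N = 500_000                   # vector length reported in the thread
BYTES_PER_FLOAT = 4
D2D_MB_PER_S = 112037.3       # measured device-to-device bandwidth
H2D_MB_PER_S = 4654.4         # measured host-to-device bandwidth (paged)

vector_bytes = N * BYTES_PER_FLOAT
saxpy_traffic = 3 * vector_bytes              # 2 reads + 1 write

# Time in microseconds = bytes / (bytes per second) * 1e6
saxpy_us = saxpy_traffic / (D2D_MB_PER_S * 1e6) * 1e6
upload_us = vector_bytes / (H2D_MB_PER_S * 1e6) * 1e6

print(f"one vector: {vector_bytes} bytes")
print(f"saxpy on device:    ~{saxpy_us:.0f} us")
print(f"upload of 1 vector: ~{upload_us:.0f} us")
```

At this vector size the on-device work comes out at tens of microseconds, so fixed per-call costs, and any host-device copies included in the timing, could plausibly dominate what is being measured.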
