GTX 480 - performance

I have just bought a new machine to take advantage of Fermi: a GTX 480 (installed driver version 197.75). I have a few questions about its performance, based on tests I have already run:

[1] I ran some matrix-times-vector computations and noticed a speedup of about 1.7x over the GTX 285. That was predictable, because the GTX 480 (15 x 32 = 480 cores) has twice as many cores as the GTX 285 (30 x 8 = 240 cores), but…

When I checked some operations from CUBLAS, I found that performance on the GTX 480 is 10x worse than on the GTX 285?! Any ideas why?

[2] What surprises me is the output of deviceQuery:

There is 1 device supporting CUDA

Device 0: “GeForce GTX 480”
CUDA Driver Version: 3.0
CUDA Runtime Version: 3.0
CUDA Capability Major revision number: 2
CUDA Capability Minor revision number: 0
Total amount of global memory: 1576468480 bytes
Number of multiprocessors: 15
Number of cores: 480
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 32768
Warp size: 32
Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Clock rate: 0.81 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: Yes
Compute mode: Default (multiple host threads can use this device simultaneously)

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 3.0, CUDA Runtime Version = 3.0, NumDevs = 1, Device = GeForce GTX 480

First, I expected (from the Fermi whitepaper) 16 multiprocessors of 32 cores each, for 512 cores in total.
Second, the clock rate in the printout is 0.81 GHz, but CUDA Control Panel -> System Information shows a Graphics Clock of 700 MHz and a Processor Clock of 1401 MHz. Which of those 3 values is the clock rate of my card? (What surprises me: the GTX 285 had a better clock rate, 1.48 GHz!)

Thanks for any reply,
Yunior

Could you specify which CUBLAS functions you use and the dimensions of the matrices/vectors you tested on Fermi and GT200?

“First, I expected (from the Fermi whitepaper) 16 multiprocessors of 32 cores each, for 512 cores in total.
Second, the clock rate in the printout is 0.81 GHz, but CUDA Control Panel -> System Information shows a Graphics Clock of 700 MHz and a Processor Clock of 1401 MHz. Which of those 3 values is the clock rate of my card? (What surprises me: the GTX 285 had a better clock rate, 1.48 GHz!)”

Fermi is a theoretical GPU. GTX 480 was advertised to have 15 SM and 480 cores.

1.48 GHz is the shader clock rate of the GTX 285. It is a bit odd that your 480 shows 0.81 GHz.

Fermi is a real GPU, but it has yield problems. By lowering the spec for the GTX 480 to 15 SMs, they can ship GPUs that have a flaw in one of the 16 SMs by disabling it. This is a standard technique to make use of imperfect chips. Similarly, the GTX 470 can use chips with two broken SMs.

Edit: I should emphasize that, yes, the GTX 480 spec was officially advertised with 15 SM before they went on sale, of course.
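The core counts mentioned so far follow from multiplying the SM count by the cores per SM. A minimal Python sketch of that arithmetic, assuming 32 cores per Fermi SM (the whitepaper figure quoted above) and 8 cores per SM on the GTX 285; the helper name `total_cores` is made up for illustration:

```python
# Core counts implied by the SM counts discussed in this thread.
# Assumptions: 32 CUDA cores per Fermi SM, 8 cores per SM on the GTX 285 (GT200).
CORES_PER_SM_FERMI = 32
CORES_PER_SM_GT200 = 8


def total_cores(sm_count, cores_per_sm):
    """Total cores = number of enabled SMs times cores per SM."""
    return sm_count * cores_per_sm


configs = {
    "Full Fermi die (16 SMs)": total_cores(16, CORES_PER_SM_FERMI),
    "GTX 480 (15 SMs)": total_cores(15, CORES_PER_SM_FERMI),
    "GTX 470 (14 SMs)": total_cores(14, CORES_PER_SM_FERMI),
    "GTX 285 (30 SMs)": total_cores(30, CORES_PER_SM_GT200),
}

for name, cores in configs.items():
    print(f"{name}: {cores} cores")
```

With one SM disabled, the GTX 480 lands on the 480 cores that deviceQuery reports, rather than the 512 of the full die.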

Yes, the 480 should have a clock rate of 1.401 GHz. This looks like the driver has the card stuck in low power mode.

Functions such as cublasSaxpy, cublasScopy, and cublasSdot.

Vector size: [500 000 x 1]

The clock rate can be changed by “power save” features etc… This was discussed in the forums a long time ago…

Are you running Linux?

Are you using the CUBLAS that ships with CUDA 3.0?

I used Nvidia System Monitor and noticed that at the beginning of the kernel (matrix multiplication) the clock rate is about 0.8 GHz, and later, once it has warmed up, it reaches 1.4 GHz with GPU usage at 99-100%. So the clock rate question is more or less solved.

But there is still a problem with CUBLAS. When I execute a function from this library, GPU usage (shown by Nvidia System Monitor) is about 66%. What I have done: I installed the newest driver, toolkit and SDK from (http://developer.nvidia.com/object/cuda_3_0_downloads.html), and added cublas.lib under Project Properties -> Linker.

Do I need to do anything more to use the CUBLAS that ships with CUDA 3.0?

Yunior
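Since the clock is reported to start near 0.8 GHz and only reach 1.4 GHz after the card has warmed up, short benchmarks may be measuring the slow phase. One common way to guard against that is to run the workload a few times before starting the timer. A minimal Python sketch of the pattern; `time_after_warmup` and `dummy_workload` are hypothetical names, and the workload is a CPU stand-in for a kernel or CUBLAS call that waits for completion before returning. Whether this accounts for the CUBLAS gap in this thread is not established.

```python
import time


def time_after_warmup(run_once, warmup_runs=5, timed_runs=20):
    """Average wall-clock seconds per call, ignoring the first warmup_runs calls."""
    for _ in range(warmup_runs):
        run_once()  # untimed: lets clocks ramp up and caches fill
    start = time.perf_counter()
    for _ in range(timed_runs):
        run_once()
    return (time.perf_counter() - start) / timed_runs


# Hypothetical stand-in workload. In a real test this would launch the kernel
# or library call and block until the GPU has finished.
def dummy_workload():
    sum(i * i for i in range(10_000))


avg = time_after_warmup(dummy_workload)
print(f"average per call: {avg * 1e6:.1f} us")
```

Running the same harness on both cards, with the same warm-up count, keeps the comparison fair.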

Interesting… Can you run the bandwidth test and post the results? Maybe it has something to do with memory transfer?

My device driver is 197.75

Results from Bandwidth test:

Device 0: GeForce GTX 480
Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes)    Bandwidth(MB/s)
33554432                 4654.4

Device to Host Bandwidth, 1 Device(s), Paged memory
Transfer Size (Bytes)    Bandwidth(MB/s)
33554432                 4549.9

Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes)    Bandwidth(MB/s)
33554432                 112037.3

[bandwidthTest] - Test results:
PASSED
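To put these numbers next to the CUBLAS test, here is a rough Python estimate of how long the reported routines would need at the measured bandwidths for a 500 000-element single-precision vector. It is a back-of-the-envelope sketch, not a measurement. Assumptions: bandwidthTest counts 1 MB as 2**20 bytes, each routine is purely memory-bound, and the per-routine traffic counts (vectors read plus vectors written) are the usual ones for these operations.

```python
N = 500_000              # vector length reported in the thread
BYTES_PER_FLOAT = 4      # single precision (the "S" routines)
MB = 1 << 20             # assumption: bandwidthTest reports MB as 2**20 bytes

H2D_MB_S = 4654.4        # host to device, pageable memory (from the thread)
D2D_MB_S = 112037.3      # device to device (from the thread)


def seconds(num_bytes, mb_per_s):
    """Time to move num_bytes at the given bandwidth."""
    return num_bytes / (mb_per_s * MB)


vector_bytes = N * BYTES_PER_FLOAT

# Device memory traffic per call, assuming nothing is cached.
device_traffic = {
    "cublasSaxpy": 3 * vector_bytes,  # read x, read y, write y
    "cublasScopy": 2 * vector_bytes,  # read x, write y
    "cublasSdot": 2 * vector_bytes,   # read x, read y
}

upload_s = seconds(vector_bytes, H2D_MB_S)
print(f"upload of one vector: {upload_s * 1e6:.0f} us")
for name, nbytes in device_traffic.items():
    print(f"{name}: at least {seconds(nbytes, D2D_MB_S) * 1e6:.0f} us on the device")
```

Under these assumptions, uploading a single vector from the host takes several times longer than the device-side minimum for any of the three routines, so it matters whether the timed region includes host transfers. At this vector size the device-side times are also short enough that fixed per-call overhead could be a visible fraction of the total.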