My card (output of ./deviceQuery):
Device 0: "GeForce 9800M GTX"
Major revision number: 1
Minor revision number: 1
Total amount of global memory: 1073414144 bytes
Number of multiprocessors: 14
Number of cores: 112
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.55 GHz
Concurrent copy and execution: Yes
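The output above does not include the memory clock or the memory bus width, and those two values are what set the peak memory bandwidth. A minimal sketch, assuming a CUDA toolkit new enough to have `cudaDeviceGetAttribute` (5.0 or later) but old enough to still support compute capability 1.1 (6.5 or earlier). It prints the clock rate and estimates the theoretical peak, assuming double-data-rate memory (two transfers per memory clock). `peakBandwidthGBs` is a hypothetical helper name.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).
// memClockKHz:  memory clock in kHz (cudaDevAttrMemoryClockRate reports kHz).
// busWidthBits: memory interface width in bits.
// The factor 2 assumes double-data-rate memory (two transfers per clock).
double peakBandwidthGBs(int memClockKHz, int busWidthBits)
{
    return 2.0 * memClockKHz * 1e3 * (busWidthBits / 8.0) / 1e9;
}

int main()
{
    const int dev = 0;
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) {
        std::fprintf(stderr, "no CUDA device found\n");
        return 1;
    }

    int clockKHz = 0, memClockKHz = 0, busWidthBits = 0;
    cudaDeviceGetAttribute(&clockKHz, cudaDevAttrClockRate, dev);
    cudaDeviceGetAttribute(&memClockKHz, cudaDevAttrMemoryClockRate, dev);
    cudaDeviceGetAttribute(&busWidthBits, cudaDevAttrGlobalMemoryBusWidth, dev);

    std::printf("Device %d: \"%s\"\n", dev, prop.name);
    std::printf("Clock rate:        %.2f GHz\n", clockKHz * 1e-6);
    std::printf("Memory clock rate: %.0f MHz\n", memClockKHz * 1e-3);
    std::printf("Memory bus width:  %d bits\n", busWidthBits);
    std::printf("Theoretical peak:  %.1f GB/s\n",
                peakBandwidthGBs(memClockKHz, busWidthBits));
    return 0;
}
```

If the printed peak is far above what bandwidthTest measures, the gap lies in the measurement or in the card's current clock state, not in the hardware limit.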
As I found out here on the forum, the device-to-device bandwidth should be around 50 GB/s.
But bandwidthTest reports:
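A figure of roughly 50 GB/s is what the memory interface itself allows. As a worked example, assuming a 256-bit bus and 800 MHz double-data-rate memory (assumed values for this card, not taken from the deviceQuery output):

$$
B_{\text{peak}} = 2 \times f_{\text{mem}} \times \frac{w}{8}
= 2 \times \left(800 \times 10^{6}\,\text{Hz}\right) \times \frac{256}{8}\,\text{B}
= 51.2 \times 10^{9}\,\text{B/s} \approx 51.2\ \text{GB/s}
$$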
Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 691.3
Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 965.5
Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 15082.6
That is a device-to-device bandwidth of just 15 GB/s, only 30% of what it should be!
The host-to-device and device-to-host transfers are also really slow (about 0.7 and 1 GB/s)!
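The host transfer numbers above are for pageable memory; bandwidthTest can also be run with `--memory=pinned`. A minimal sketch for comparing the two with CUDA events, assuming 1 MB = 2^20 bytes and the same 32 MiB transfer size. `bandwidthMBs` and `timedCopyMBs` are hypothetical helper names, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// MB/s for `bytes` transferred in `ms` milliseconds (here 1 MB = 2^20 bytes).
double bandwidthMBs(size_t bytes, float ms)
{
    return (bytes / (1024.0 * 1024.0)) / (ms / 1000.0);
}

// Times `iters` identical cudaMemcpy calls with CUDA events and returns MB/s.
double timedCopyMBs(void* dst, const void* src, size_t bytes,
                    cudaMemcpyKind kind, int iters)
{
    cudaMemcpy(dst, src, bytes, kind);  // warm-up copy, not timed
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    for (int i = 0; i < iters; ++i)
        cudaMemcpy(dst, src, bytes, kind);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return bandwidthMBs(bytes * iters, ms);
}

int main()
{
    const size_t bytes = 33554432;  // 32 MiB, the transfer size used above
    const int iters = 10;

    void* dev = NULL;
    void* pinned = NULL;
    void* pageable = std::malloc(bytes);  // ordinary pageable host memory
    cudaMalloc(&dev, bytes);
    cudaMallocHost(&pinned, bytes);       // page-locked host memory

    std::printf("H2D pageable: %.1f MB/s\n",
        timedCopyMBs(dev, pageable, bytes, cudaMemcpyHostToDevice, iters));
    std::printf("H2D pinned:   %.1f MB/s\n",
        timedCopyMBs(dev, pinned, bytes, cudaMemcpyHostToDevice, iters));
    std::printf("D2H pageable: %.1f MB/s\n",
        timedCopyMBs(pageable, dev, bytes, cudaMemcpyDeviceToHost, iters));
    std::printf("D2H pinned:   %.1f MB/s\n",
        timedCopyMBs(pinned, dev, bytes, cudaMemcpyDeviceToHost, iters));

    cudaFreeHost(pinned);
    cudaFree(dev);
    std::free(pageable);
    return 0;
}
```

The untimed warm-up copy keeps one-time setup costs out of the measurement, and averaging over several copies smooths out timer resolution.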
No, the specifications state that the shader clock should be 1250 MHz. Also, bandwidthTest measures a copy from A to B, which issues both a read and a write for each element, so the actual bandwidth is twice the reported throughput.
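The read-plus-write argument in the reply can be checked with a small timing sketch. It is a minimal example, assuming the same 32 MiB buffer and 10 timed copies, with 1 GB = 1e9 bytes. It reports the device-to-device copy both ways: counting only the bytes copied, and counting each byte once as a read and once as a write. `d2dBandwidthGBs` is a hypothetical helper name, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Effective bandwidth of device-to-device copies in GB/s (1 GB = 1e9 bytes).
// With countReadAndWrite, each copied byte is counted twice: one read from
// the source buffer and one write to the destination buffer.
double d2dBandwidthGBs(size_t bytesPerCopy, int copies, float ms,
                       bool countReadAndWrite)
{
    double traffic = (double)bytesPerCopy * copies * (countReadAndWrite ? 2.0 : 1.0);
    return traffic / (ms * 1e-3) / 1e9;
}

int main()
{
    const size_t bytes = 33554432;  // 32 MiB
    const int copies = 10;

    void* a = NULL;
    void* b = NULL;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMemset(a, 0, bytes);
    cudaMemcpy(b, a, bytes, cudaMemcpyDeviceToDevice);  // warm-up, not timed

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    for (int i = 0; i < copies; ++i)
        cudaMemcpy(b, a, bytes, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    std::printf("bytes copied only: %.1f GB/s\n",
                d2dBandwidthGBs(bytes, copies, ms, false));
    std::printf("read + write:      %.1f GB/s\n",
                d2dBandwidthGBs(bytes, copies, ms, true));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Applied to the reported 15082.6 MB/s, counting both the read and the write gives roughly 30 GB/s, about 60% of the expected 50 GB/s rather than 30%.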