Bandwidth Problem

My card is (from ./deviceQuery):
Device 0: "GeForce 9800M GTX"

Major revision number: 1
Minor revision number: 1
Total amount of global memory: 1073414144 bytes
Number of multiprocessors: 14
Number of cores: 112
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 0.55 GHz
Concurrent copy and execution: Yes

As I found out (here in the forum), the device-to-device bandwidth should be in the range of about 50 GB/s.

But I get (bandwidthTest):

Host to Device Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 691.3

Device to Host Bandwidth for Pageable memory
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 965.5

Device to Device Bandwidth
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 15082.6

A device-to-device bandwidth of just 15 GB/s, which is 30% of what it should be!
Also, the host-to-device and device-to-host transfers are really slow!

Any help?

No idea how mobile parts work, but that’s in some crazy low power mode (look at the clock rate).

I don't think it's a problem of the clock rate:
at http://www.gpureview.com/geforce-9800m-gtx-card-588.html#
they state it should be 500 MHz, but with a bandwidth of 51.2 GB/s!
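For reference, that 51.2 GB/s figure is consistent with the memory configuration commonly listed for the 9800M GTX (assuming a 256-bit GDDR3 bus at an 800 MHz memory clock):

800 MHz x 2 (DDR) x 256 bit / 8 bit/byte = 51.2 GB/s

Since bandwidth scales proportionally with the memory clock, running the memory at a reduced PowerMizer clock would cut the achievable bandwidth by the same factor.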

No, they state that the shader clock should be 1250 MHz. Also, bandwidthTest measures copying from A to B, issuing both a read and a write for each element. Actual bus bandwidth is therefore twice the reported throughput.
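To make the 2x factor concrete, here is a minimal sketch of a device-to-device measurement in the style of bandwidthTest (the buffer size matches the 32 MB transfer above; the repetition count is illustrative, not taken from the SDK source):

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    const size_t SIZE = 32 << 20;   /* 32 MB, as in bandwidthTest */
    const int    REPS = 10;
    void *src, *dst;
    cudaMalloc(&src, SIZE);
    cudaMalloc(&dst, SIZE);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    for (int i = 0; i < REPS; ++i)
        cudaMemcpy(dst, src, SIZE, cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    /* "Reported" bandwidth counts only the bytes copied, but the copy
       both reads and writes each byte, so the bus moves twice as much. */
    double copied = (double)SIZE * REPS;
    double gbps   = copied / (ms / 1000.0) / 1e9;
    printf("reported: %.1f GB/s, actual bus traffic: %.1f GB/s\n",
           gbps, 2.0 * gbps);

    cudaFree(src);
    cudaFree(dst);
    return 0;
}

By that reading, the 15 GB/s reported above corresponds to roughly 30 GB/s of actual bus traffic.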

So the clock rate (Clock rate: 0.55 GHz) reported by ./deviceQuery is the "shader clock rate"?

If this is true, it should be 1250 MHz and NOT 500 MHz.

Any idea why this value is so low, or how to change it?
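(As a side note: the value deviceQuery prints comes from the clockRate field of cudaDeviceProp, which the runtime reports in kHz and which on these parts corresponds to the shader clock. A minimal sketch to read it directly:

#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    /* clockRate is in kHz; deviceQuery converts it to GHz for display */
    printf("clockRate = %d kHz (%.2f GHz)\n",
           prop.clockRate, prop.clockRate / 1.0e6);
    return 0;
}

So 0.55 GHz here really is the currently active shader clock, not the rated one.)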

In the "NVIDIA X Server Settings" window there are 4 performance levels:

0 - NV Clock = 200 MHz - Memory Clock = 100 MHz
1 - NV Clock = 275 MHz - Memory Clock = 301 MHz
2 - NV Clock = 383 MHz - Memory Clock = 301 MHz
3 - NV Clock = 500 MHz - Memory Clock = 799 MHz

Level 0 is active (this is the default).

If I start a CUDA application, the performance level increases to level 1 and then drops back to level 0 after the CUDA application finishes.

1.) Does anyone know how this corresponds to the shader clock rate?

2.) Does anyone know how to set the performance level to the maximum level (3)?

  1. The shader clock is linked to the GPU clock.

  2. http://tutanhamon.com.ua/technovodstvo/NVIDIA-UNIX-driver/
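The link above describes the commonly posted RegistryDwords workaround for pinning PowerMizer to a fixed level. A hypothetical xorg.conf fragment in that style (the option values are quoted from era forum posts, vary by driver version, and should be checked against your driver's README rather than taken as a verified recipe):

Section "Device"
    Identifier "Device0"
    Driver     "nvidia"
    # Force a fixed performance level instead of adaptive PowerMizer.
    # 0x2222 is the value commonly quoted for maximum performance on
    # both AC and battery.
    Option "RegistryDwords" "PerfLevelSrc=0x2222"
EndSection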

I have now localized the problem: it is the PowerMizer, which is not working properly on the 9800M GTX.

This slows down the graphics hardware, so CUDA is NOT usable on mobile devices (crap!).

See the discussion at: http://www.nvnews.net/vbulletin/showthread.php?t=124153