I built a new Linux box running CentOS (based on RHEL 4.5). All the CUDA examples work fine, as does our CUDA application, which we recently ported from Windows.
I am seeing a problem, though, with memory bandwidth between the G80 and the CPU.
The system is an older (3-year-old) motherboard with PCI Express x16 and an Intel 945 chipset, using an 8800 GTS, and I would expect better bandwidth results.
I am seeing two issues:
- Speed. I expect about twice the performance I am seeing; I get about twice the bandwidth under Windows on the same machine.
- Consistency. Bandwidth for transfer sizes under 700k varies greatly; notice below how it goes up and down even as the transfer size increases.
When I run the bandwidthTest example, I get the following results:
./bandwidthTest --memory=pinned --mode=shmoo
Shmoo Mode
Host to Device Bandwidth for Pinned memory
…
Transfer Size (Bytes) Bandwidth(MB/s)
1024 32.7
2048 66.0
3072 97.3
4096 79.9
5120 158.5
6144 184.8
7168 215.0
8192 154.7
9216 267.1
10240 286.4
11264 310.5
12288 128.2
13312 358.6
14336 379.8
15360 264.9
16384 420.0
17408 442.7
18432 450.7
19456 323.8
20480 489.5
22528 371.7
24576 150.9
26624 116.2
28672 192.0
30720 461.4
32768 484.5
34816 708.0
36864 743.3
38912 752.7
40960 556.4
43008 802.7
45056 836.0
47104 621.3
49152 866.5
51200 904.2
61440 996.5
71680 269.6
81920 303.5
92160 390.3
102400 826.2
204800 1041.7
307200 1209.1
409600 1537.9
512000 1696.0
614400 1642.7
716800 1740.3
819200 1774.4
921600 1793.7
1024000 1742.3
…
67186688 1902.0
Shmoo Mode
Device to Host Bandwidth for Pinned memory
…
Transfer Size (Bytes) Bandwidth(MB/s)
1024 38.3
2048 74.0
3072 109.7
4096 143.1
5120 176.9
6144 85.7
7168 239.9
8192 263.0
9216 291.0
10240 317.1
11264 346.5
12288 361.7
13312 381.2
14336 400.9
15360 442.6
16384 463.6
17408 489.7
18432 509.5
19456 530.1
20480 551.7
22528 585.4
24576 620.0
26624 651.0
28672 687.0
30720 712.8
32768 742.3
34816 772.2
36864 791.8
38912 90.0
40960 157.6
43008 878.3
45056 900.8
47104 920.5
49152 935.6
51200 955.5
61440 1033.4
…
62992384 1758.7
67186688 1758.6
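To make the dips easier to spot, here is a minimal sketch in Python (not part of bandwidthTest) that flags any sample whose bandwidth falls below 75% of the best bandwidth seen at a smaller transfer size. The `find_dips` helper and the 0.75 threshold are illustrative choices of mine, and the rows are only a subset of the device-to-host output above.

```python
def find_dips(samples, threshold=0.75):
    """Return (size, bandwidth, best_so_far) for every sample whose
    bandwidth is below threshold * the best bandwidth seen earlier."""
    dips = []
    running_max = 0.0
    for size, bw in samples:
        if bw < threshold * running_max:
            dips.append((size, bw, running_max))
        running_max = max(running_max, bw)
    return dips


# A subset of the device-to-host rows copied from the output above:
# (transfer size in bytes, bandwidth in MB/s).
d2h = [
    (4096, 143.1), (5120, 176.9), (6144, 85.7), (7168, 239.9),
    (36864, 791.8), (38912, 90.0), (40960, 157.6), (43008, 878.3),
]

for size, bw, best in find_dips(d2h):
    print(f"{size:>8} bytes: {bw:7.1f} MB/s (best so far {best:.1f} MB/s)")
```

On these rows it flags the 6144, 38912, and 40960 byte transfers, which matches the drops visible by eye.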
Any ideas? Anyone else seeing these problems?