edit: oops, I think that’s not really true!
Well, that’s still within the capabilities of one 6.4GB/s channel. If Everest does its job and gets 80% of DRAM’s theoretical bandwidth, then that would definitely mean only one channel is working.
CPU-Z says I’m running in dual-channel mode.
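For anyone who wants to redo that arithmetic, here is a minimal sketch. It assumes DDR2-800 (PC2-6400) on a standard 64-bit channel and takes the 80% efficiency figure from the post above as a rough guess, not a measured property of Everest. The function name is made up for illustration.

```python
def ddr_peak_gb_s(mega_transfers_per_sec, bus_width_bits=64, channels=1):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer * channels / 1e9

EFFICIENCY = 0.80  # assumed fraction of peak a benchmark can reach

single = ddr_peak_gb_s(800)             # one DDR2-800 channel
dual = ddr_peak_gb_s(800, channels=2)   # two DDR2-800 channels

print(f"single channel: {single:.1f} GB/s peak, ~{single * EFFICIENCY:.2f} GB/s expected")
print(f"dual channel:   {dual:.1f} GB/s peak, ~{dual * EFFICIENCY:.2f} GB/s expected")
```

Under those assumptions, a reading at or below roughly 5 GB/s fits a single channel, while a healthy dual-channel setup should land well above that.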
My ram is a Patriot Extreme Performance Low Latency Kit 2x1GB (PC2-6400).
What should I try now?
Switch slots. Try the sticks individually.
I’ve rearranged the sticks so that they run in single-channel mode, and they give exactly the same bandwidth I get in dual-channel mode. What gives?
No clue. (But my guess was right :D ). Really, this is about the time you mosey on down to a hardware forum.
Just for everybody’s information: on an EVGA 790i M/B with three GTX280s installed I get
~5.7…5.8 GiB/s Host to Device (pinned) and ~5.3…5.4 GiB/s Device To Host (pinned)
from two of the three cards. From the third card I got a disappointing
~1.7 GiB/s Host to Device (pinned) and ~1.7 GiB/s Device To Host (pinned)
The reason for this isn’t quite clear. My hypothesis is that only two of the three onboard PCI-E slots are Gen2 PCI-E.
That looks like 16x Gen2/16x Gen2/8x Gen1. If I can find a 790i board and some DDR3 (the latter is the problem), I’ll let you know what it looks like.
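As a rough check on that guess, here is a small sketch of the theoretical per-direction link bandwidth. It uses the published signalling rates (2.5 GT/s per lane for Gen1, 5.0 GT/s for Gen2) and 8b/10b encoding, and ignores packet and protocol overhead, so real transfers will always come in lower. The helper names are invented for this example.

```python
# Per-lane signalling rate in GT/s. Gen1 and Gen2 both use 8b/10b encoding,
# so only 8 of every 10 bits on the wire carry data.
GT_PER_SEC = {1: 2.5, 2: 5.0}

def pcie_peak_bytes_per_sec(gen, lanes):
    """Theoretical one-direction bandwidth of a PCI-E link, before packet overhead."""
    usable_bits_per_lane = GT_PER_SEC[gen] * 1e9 * 8 / 10
    return usable_bits_per_lane / 8 * lanes

def to_gib(num_bytes):
    return num_bytes / 2**30

x16_gen2 = pcie_peak_bytes_per_sec(gen=2, lanes=16)
x8_gen1 = pcie_peak_bytes_per_sec(gen=1, lanes=8)

print(f"x16 Gen2: {to_gib(x16_gen2):.2f} GiB/s peak")
print(f"x8 Gen1:  {to_gib(x8_gen1):.2f} GiB/s peak")
```

The ~1.7 GiB/s from the third card sits just under the x8 Gen1 ceiling, and the ~5.7 GiB/s from the other two sits under the x16 Gen2 ceiling, so the numbers are at least consistent with a 16x Gen2/16x Gen2/8x Gen1 layout.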
I’ve contacted ASUS tech support and they weren’t able to resolve my problems. But I did get the person from TS to check CUDA on his private machine, and he was able to get 4.2-4.7 GB/s transfers. Here are screenshots from his machine.
(Warning, big files)
http://img.photobucket.com/albums/v26/_Big…_/inne/1128.jpg
http://img.photobucket.com/albums/v26/_Big_Mac_/inne/800.jpg
http://img.photobucket.com/albums/v26/_Big…_/inne/test.jpg
His motherboard is a high-end ASUS Rampage Formula. Apparently my bandwidth (both CPU<->RAM and GPU<->RAM) was about 30% worse than his. And his transfers to/from the GPU were still about 20% slower than the 5-6 GB/s reported here by other users.
What’s going on? Is it a problem specific to ASUS-made motherboards? Or is something wrong with my unit?
cern_freak, could you tell us what your CPU and RAM are (along with timings, frequency etc.)?
Intel Core2 Quad Q9550 @ 2.83GHz
Ram: 2x STT DDR3-1333 2GB/128x8 CL8 Memory (CL 8-8-8-18)
Thank you. It seems you need very fast RAM (1200MHz +) to get max host<->device bandwidth.
Has anyone managed to get more than 4GB/s with 800MHz RAM?
BTW photobucket resized the screencaps I gave in my previous post, this is fixed now. They should be readable.
I just got my new rig. I get 5.7 GB/s Device-to-Host on a P45 chipset (ASUS P5Q Pro), 800MHz DDR2 (4 sticks, 8GB, CL5), Core 2 Duo overclocked to 4.0GHz, GTX260, in Linux x32 (RHEL 5) using 180.06 drivers.
I’d be interested to see a set of unpinned and pinned shmoo results on this rig. I’m especially interested to see how the small block size rates are improving.
Matt
[codebox]
[root@localhost release]# ./bandwidthTest
Running on…
device 0:GeForce GTX 260
Quick Mode
Host to Device Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2780.5
Quick Mode
Device to Host Bandwidth for Pageable memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 2479.6
Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 98643.6
&&&& Test PASSED
[/codebox]
[codebox]
[root@localhost release]# ./bandwidthTest --memory=pinned
Running on…
device 0:GeForce GTX 260
Quick Mode
Host to Device Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5253.7
Quick Mode
Device to Host Bandwidth for Pinned memory
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 5674.2
Quick Mode
Device to Device Bandwidth
.
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 98689.3
&&&& Test PASSED
Press ENTER to exit…
[/codebox]
[codebox]
[root@localhost release]# ./bandwidthTest --mode=shmoo
Running on…
device 0:GeForce GTX 260
Shmoo Mode
Host to Device Bandwidth for Pageable memory
…
…
Transfer Size (Bytes) Bandwidth(MB/s)
1024 610.4
2048 723.4
3072 1010.2
4096 1775.6
5120 2219.5
6144 1953.1
7168 2531.8
8192 2520.2
9216 2663.4
10240 2639.4
11264 2754.4
12288 2858.2
13312 2885.3
14336 2972.1
15360 2817.0
16384 3125.0
17408 3018.5
18432 3316.6
19456 2945.2
20480 2639.4
22528 3410.2
24576 3551.1
26624 3431.2
28672 3294.4
30720 3291.8
32768 3551.1
34816 3648.7
36864 3662.1
38912 3947.8
40960 4111.8
43008 3475.9
45056 3802.5
47104 3872.6
49152 4223.0
51200 4245.9
61440 4308.4
71680 578.3
81920 1133.9
92160 1231.0
102400 1250.4
204800 1867.2
307200 2192.9
409600 1724.6
512000 2531.3
614400 2620.5
716800 2475.9
819200 1662.9
921600 1880.8
1024000 2067.7
1126400 2678.8
2174976 3078.4
3223552 2706.9
4272128 2564.8
5320704 2283.2
6369280 2368.7
7417856 2590.5
8466432 2391.4
9515008 2693.1
10563584 2557.0
11612160 2713.9
12660736 2330.7
13709312 2742.8
14757888 2735.4
15806464 2707.0
16855040 2641.6
18952192 2589.9
21049344 2482.2
23146496 2534.1
25243648 2776.6
27340800 2535.4
29437952 2675.2
31535104 2650.0
33632256 2696.9
37826560 2634.8
42020864 2689.5
46215168 2594.7
50409472 2679.4
54603776 2635.7
58798080 2678.7
62992384 2701.4
67186688 2691.8
Shmoo Mode
Device to Host Bandwidth for Pageable memory
…
…
Transfer Size (Bytes) Bandwidth(MB/s)
1024 21.5
2048 207.8
3072 284.4
4096 365.1
5120 432.1
6144 496.6
7168 560.3
8192 615.2
9216 655.9
10240 702.6
11264 746.0
12288 791.8
13312 835.2
14336 865.3
15360 904.2
16384 947.0
17408 959.6
18432 965.8
19456 1013.9
20480 1044.5
22528 1096.1
24576 1148.9
26624 1170.1
28672 1209.9
30720 1246.7
32768 1275.5
34816 1286.9
36864 1321.7
38912 395.6
40960 1351.6
43008 1367.2
45056 1359.8
47104 1399.4
49152 1429.1
51200 1423.6
61440 1487.2
71680 1748.3
81920 1800.1
92160 1854.2
102400 1896.2
204800 1844.3
307200 1957.0
409600 1294.7
512000 2076.0
614400 1961.0
716800 1964.3
819200 1970.9
921600 1952.3
1024000 1975.6
1126400 1984.1
2174976 2173.6
3223552 1764.2
4272128 2334.0
5320704 2404.7
6369280 2085.5
7417856 2351.3
8466432 2298.1
9515008 2436.6
10563584 2230.0
11612160 2439.1
12660736 2454.2
13709312 2282.7
14757888 2305.8
15806464 2313.6
16855040 2094.0
18952192 2306.1
21049344 2349.3
23146496 2463.6
25243648 2383.3
27340800 2440.4
29437952 2401.5
31535104 2405.1
33632256 2400.0
37826560 2396.8
42020864 2374.1
46215168 2427.3
50409472 2394.7
54603776 2235.0
58798080 2386.8
62992384 2397.7
67186688 2400.6
Shmoo Mode
Device to Device Bandwidth
…
…
Transfer Size (Bytes) Bandwidth(MB/s)
1024 310.0
2048 737.0
3072 1085.1
4096 1446.8
5120 1775.6
6144 2092.6
7168 2398.6
8192 2741.2
9216 2881.7
10240 3201.8
11264 3522.0
12288 3842.2
13312 4162.4
14336 4410.3
15360 4725.3
16384 4882.8
17408 5030.8
18432 5247.2
19456 5538.7
20480 5744.5
22528 6227.4
24576 6602.1
26624 6770.8
28672 7291.7
30720 7709.7
32768 8223.7
34816 8098.3
36864 8471.4
38912 8835.6
40960 9191.2
43008 9217.0
45056 9655.9
47104 1220.7
49152 10190.2
51200 10172.5
61440 11160.7
71680 19814.3
81920 21701.4
92160 22828.7
102400 23251.5
204800 36507.0
307200 46136.8
409600 53879.3
512000 58829.1
614400 10751.1
716800 11606.0
819200 69137.2
921600 72041.5
1024000 74263.3
1126400 74598.5
2174976 86067.2
3223552 89366.8
4272128 93125.0
5320704 93706.7
6369280 96111.1
7417856 95597.5
8466432 93722.8
9515008 97050.5
10563584 96496.3
11612160 98525.1
12660736 97885.8
13709312 99348.2
14757888 98146.6
15806464 95618.3
16855040 98675.4
18952192 99747.3
21049344 100170.8
23146496 96288.9
25243648 98362.5
27340800 99330.4
29437952 99167.1
31535104 99468.2
33632256 100388.8
37826560 96921.6
42020864 99898.3
46215168 99862.3
50409472 100996.2
54603776 99132.3
58798080 99856.1
62992384 100905.7
67186688 97325.5
&&&& Test PASSED
Press ENTER to exit…
[/codebox]
[codebox]
[root@localhost release]# ./bandwidthTest --mode=shmoo --memory=pinned
Running on…
device 0:GeForce GTX 260
Shmoo Mode
Host to Device Bandwidth for Pinned memory
…
…
Transfer Size (Bytes) Bandwidth(MB/s)
1024 100.7
2048 199.3
3072 302.0
4096 386.8
5120 105.7
6144 580.1
7168 146.7
8192 167.3
9216 813.8
10240 912.7
11264 910.4
12288 1037.1
13312 266.1
14336 1168.5
15360 1241.4
16384 322.8
17408 195.5
18432 1373.3
19456 378.7
20480 398.6
22528 1627.6
24576 1698.4
26624 1826.7
28672 1939.3
30720 576.7
32768 613.9
34816 2142.1
36864 2055.9
38912 2333.9
40960 2411.3
43008 773.9
45056 810.7
47104 2552.4
49152 2633.4
51200 2682.9
61440 2944.4
71680 3135.8
81920 1299.9
92160 1422.2
102400 3538.3
204800 4283.2
307200 4507.2
409600 4678.1
512000 3523.0
614400 3753.6
716800 4942.8
819200 4995.2
921600 4946.0
1024000 5033.8
1126400 5081.5
2174976 4740.0
3223552 4903.8
4272128 5255.7
5320704 5272.5
6369280 5281.5
7417856 5285.6
8466432 5048.0
9515008 5294.8
10563584 5298.9
11612160 5287.5
12660736 5137.5
13709312 5308.0
14757888 5309.8
15806464 5238.1
16855040 5314.0
18952192 5245.0
21049344 5260.1
23146496 5315.2
25243648 5319.8
27340800 5275.6
29437952 5321.3
31535104 5318.0
33632256 5286.0
37826560 5289.9
42020864 5318.1
46215168 5290.8
50409472 5279.9
54603776 5323.6
58798080 5286.2
62992384 5250.6
67186688 5250.9
Shmoo Mode
Device to Host Bandwidth for Pinned memory
…
…
Transfer Size (Bytes) Bandwidth(MB/s)
1024 114.9
2048 219.5
3072 332.9
4096 434.0
5120 536.6
6144 636.9
7168 712.1
8192 805.4
9216 896.8
10240 986.4
11264 1063.6
12288 1137.7
13312 1197.7
14336 1277.7
15360 1356.3
16384 1407.7
17408 1495.6
18432 1515.4
19456 1572.4
20480 1537.9
22528 1775.6
24576 1905.5
26624 161.5
28672 2103.4
30720 2170.1
32768 2281.0
34816 2371.7
36864 2408.0
38912 2524.4
40960 2604.2
43008 2680.8
45056 2719.5
47104 2773.0
49152 2840.9
51200 2906.4
61440 3167.2
71680 3367.5
81920 3535.1
92160 3692.9
102400 3844.7
204800 4500.3
307200 4818.6
409600 5008.0
512000 5129.0
614400 5222.3
716800 5278.7
819200 5332.8
921600 5369.0
1024000 5398.4
1126400 5419.9
2174976 5547.5
3223552 5584.4
4272128 5583.4
5320704 5545.0
6369280 4393.3
7417856 5573.8
8466432 5591.2
9515008 5595.5
10563584 4798.2
11612160 5614.6
12660736 5611.2
13709312 5618.7
14757888 5631.0
15806464 5211.3
16855040 5500.5
18952192 4663.4
21049344 5428.8
23146496 5551.9
25243648 5381.5
27340800 5494.7
29437952 5423.8
31535104 5517.1
33632256 5527.6
37826560 5467.9
42020864 4726.9
46215168 5619.2
50409472 5583.6
54603776 5464.2
58798080 5556.3
62992384 5565.2
67186688 5604.9
Shmoo Mode
Device to Device Bandwidth
…
…
Transfer Size (Bytes) Bandwidth(MB/s)
1024 315.0
2048 737.0
3072 1085.1
4096 1446.8
5120 1775.6
6144 2130.7
7168 2398.6
8192 2694.0
9216 2881.7
10240 3201.8
11264 3580.7
12288 3906.2
13312 4162.4
14336 4410.3
15360 4725.3
16384 5040.3
17408 5030.8
18432 5247.2
19456 5622.6
20480 5830.2
22528 6227.4
24576 6602.1
26624 6862.3
28672 7491.4
30720 7812.5
32768 8223.7
34816 8300.8
36864 8471.4
38912 9051.1
40960 9300.6
43008 9428.9
45056 9765.6
47104 9982.6
49152 10302.2
51200 10279.6
61440 11268.0
71680 19531.2
81920 21701.4
92160 23129.1
102400 23531.6
204800 36169.0
307200 46503.0
409600 53510.3
512000 59185.6
614400 63344.6
716800 65730.2
819200 70067.3
921600 72338.0
1024000 74263.3
1126400 74340.4
2174976 85889.0
3223552 89107.8
4272128 93125.0
5320704 93880.1
6369280 96111.1
7417856 95597.5
8466432 93831.7
9515008 96946.8
10563584 96542.6
11612160 98525.1
12660736 97965.3
13709312 99385.9
14757888 98180.8
15806464 95618.3
16855040 98645.1
18952192 99802.4
21049344 100195.8
23146496 96204.9
25243648 85918.0
27340800 99292.5
29437952 99237.2
31535104 99501.1
33632256 100404.5
37826560 96934.6
42020864 99923.2
46215168 99862.3
50409472 100975.0
54603776 99151.2
58798080 99838.4
62992384 100871.8
67186688 97266.4
&&&& Test PASSED
Press ENTER to exit…
[/codebox]
The small block size results are a little hard to believe (5KB transfers at 2.2 GB/s?), and there are a lot of dips that I’d like to understand.
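To make the dips easier to spot, here is a minimal sketch that parses shmoo rows and flags any transfer size whose bandwidth falls below half of the best value seen so far. The sample is the first ten rows of the pinned Host to Device table above. The 0.5 threshold and the function names are arbitrary choices for illustration, and the running-best rule only makes sense for curves that are expected to rise with transfer size.

```python
SAMPLE = """\
Transfer Size (Bytes) Bandwidth(MB/s)
1024 100.7
2048 199.3
3072 302.0
4096 386.8
5120 105.7
6144 580.1
7168 146.7
8192 167.3
9216 813.8
10240 912.7
"""

def parse_shmoo(text):
    """Return (size_bytes, bandwidth_mb_s) pairs, skipping header and banner lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            rows.append((int(parts[0]), float(parts[1])))
        except ValueError:
            continue
    return rows

def find_dips(rows, ratio=0.5):
    """Flag sizes whose bandwidth is below ratio * the best non-dip value so far."""
    dips = []
    best = 0.0
    for size, bandwidth in rows:
        if bandwidth < ratio * best:
            dips.append(size)
        else:
            best = max(best, bandwidth)
    return dips

print(find_dips(parse_shmoo(SAMPLE)))
```

Running the same check over several repeated shmoo runs would show whether the dips stay at the same sizes or move around, which would help separate a systematic effect from measurement noise.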
Wow, small transfers are slower with pinned memory! 1024 bytes go at 610 MB/s pageable vs. 100 MB/s pinned.
Those pinned-memory results at small block sizes look suspicious. I don’t see that kind of difference in any other results, mine or these: http://forums.nvidia.com/index.php?showtopic=68266.
It’s encouraging that memory transfer speeds at small buffer sizes seem to be improving; mine are ~10MB/s, and the 8800GTS sits in the middle of ours :) Thanks for posting.
Now I’m stumped: we have the same motherboard, and yet I get barely over half the bandwidth. My RAM is also 800MHz CL5, only I use 2 sticks with 2GB total (which should actually be a tiny bit faster AFAIK).
My CPU is clocked at a stock 2.5 GHz (Intel E5200) and I’m using WinXP32 with 177.92 drivers. Could a slower CPU make such a difference? Would you have bandwidthTest reports from before you overclocked your CPU? (Or, if you would be so kind, could you bring it back to the stock clock for one boot and run bandwidthTest?)
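It can help to turn those two figures into time per transfer, since at 1 KB the cost is mostly fixed per-call overhead and not link speed. This is a minimal sketch that assumes the tool's "MB" means 10^6 bytes; if it means 2^20 bytes the times come out about 5% lower. The function name is made up for this example.

```python
def transfer_time_us(size_bytes, bandwidth_mb_s, bytes_per_mb=1_000_000):
    """Time for one transfer in microseconds, given a reported bandwidth.

    bytes_per_mb is an assumption about how the tool defines a megabyte.
    """
    return size_bytes / (bandwidth_mb_s * bytes_per_mb) * 1e6

pageable_us = transfer_time_us(1024, 610.4)  # pageable Host to Device, 1024 bytes
pinned_us = transfer_time_us(1024, 100.7)    # pinned Host to Device, 1024 bytes

print(f"pageable: {pageable_us:.2f} us per 1 KB transfer")
print(f"pinned:   {pinned_us:.2f} us per 1 KB transfer")
```

Seen this way, the pinned number is about 10 microseconds per call, while the pageable number implies under 2 microseconds per call.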
Maybe you have your memory sticks in the same channel? When you have 4 slots for memory, you have to make sure you put the sticks in the right slots to get dual channel performance.
At stock 2.66GHz, I go down to 4.6 GB/s Device to Host. Interesting. Btw, I think we’ve established your 3.4 GB/s is from single-channel memory. Did you ever try swapping your DDR sticks between slots?