Is this PCIe 2.0 bandwidth low? 3.1 GB/s pinned

edit: oops, I think that’s not really true!

Well, that’s still within the capabilities of one 6.4GB/s channel. If Everest can do its job and get 80% out of DRAM’s theoretical bandwidth, then that would definitely mean only 1 channel is working.
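For reference, the arithmetic behind the 6.4GB/s figure can be sketched as below. This is only a back-of-the-envelope estimate: it assumes DDR2-800 (PC2-6400) at 800 mega-transfers per second over a 64-bit channel, and it reuses the 80% efficiency guess from the post above. The names are made up for illustration.

```python
# Rough peak-bandwidth estimate for DDR2-800 (PC2-6400) memory.
# Assumptions: 800 mega-transfers per second and a 64-bit (8-byte) channel.

TRANSFERS_PER_SEC = 800e6   # DDR2-800
BYTES_PER_TRANSFER = 8      # 64-bit channel width
EFFICIENCY = 0.80           # the 80% guess quoted in the thread

def channel_peak_gbs(channels=1):
    """Theoretical peak in GB/s (10^9 bytes/s) for the given channel count."""
    return TRANSFERS_PER_SEC * BYTES_PER_TRANSFER * channels / 1e9

single = channel_peak_gbs(1)
dual = channel_peak_gbs(2)
print(f"single channel: {single:.1f} GB/s peak, {single * EFFICIENCY:.2f} GB/s at 80%")
print(f"dual channel:   {dual:.1f} GB/s peak, {dual * EFFICIENCY:.2f} GB/s at 80%")
```

A pinned transfer rate of about 3.1 GB/s fits under the single-channel estimate, which is why the number alone cannot tell you which mode the memory is running in.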

CPU-Z says I’m running in dual-channel mode.

My ram is a Patriot Extreme Performance Low Latency Kit 2x1GB (PC2-6400).

What should I try now?

Switch slots. Try the sticks individually.

I rearranged the sticks so that they ran in single-channel mode, and they gave exactly the same bandwidth I get in dual-channel mode. What gives?

No clue. (But my guess was right :D ). Really, this is about the time you mosey on down to a hardware forum.

Just for everybody’s information: on an EVGA 790i motherboard with three GTX 280s installed I get

~5.7…5.8 GiB/s Host to Device (pinned) and ~5.3…5.4 GiB/s Device To Host (pinned)

from two of the three cards. From the third card I got a disappointing

~1.7 GiB/s Host to Device (pinned) and ~1.7 GiB/s Device To Host (pinned)

The reason for this isn’t quite clear. My hypothesis is that only two of the three onboard PCI-E slots are Gen2.

That looks like it’s 16x Gen2/16x Gen2/8x Gen1–if I can find a 790i board and some DDR3 (it’s the latter part that’s a problem) I’ll let you know what it looks like.

I’ve contacted ASUS tech support and they weren’t able to resolve my problem. But I did get the support person to check CUDA on his private machine, and he was able to get 4.2-4.7 GB/s transfers. Here are screenshots from his machine.
(Warning, big files)
http://img.photobucket.com/albums/v26/_Big…_/inne/1128.jpg
http://img.photobucket.com/albums/v26/_Big_Mac_/inne/800.jpg
http://img.photobucket.com/albums/v26/_Big…_/inne/test.jpg

His motherboard is a high-end ASUS Rampage Formula. Apparently my bandwidth (both CPU<->RAM and GPU<->RAM) was about 30% worse than his. And his transfers to/from GPU were still about 20% slower than 5-6 GB/s reported here by other users.

What’s going on? Is it a problem specific to ASUS-made motherboards? Or is something wrong with my particular board?

cern_freak, could you tell us what your CPU and RAM are (along with timings, frequency etc.)?

Intel Core2 Quad Q9550 @ 2.83GHz

Ram: 2x STT DDR3-1333 2GB/128x8 CL8 Memory (CL 8-8-8-18)

Thank you. It seems you need very fast RAM (1200MHz +) to get max host<->device bandwidth.

Has anyone managed to get more than 4GB/s with 800MHz RAM?

BTW photobucket resized the screencaps I gave in my previous post, this is fixed now. They should be readable.

I just got my new rig. I get 5.7 GB/s Device-to-Host on a P45 chipset (ASUS P5Q Pro), 800MHz DDR2 (4 sticks, 8GB, CL5), Core 2 Duo overclocked to 4.0GHz, GTX260, in Linux x32 (RHEL 5) using 180.06 drivers.

I’d be interested to see a set of unpinned and pinned shmoo results on this rig. I’m especially interested to see how the small block size rates are improving.

Matt

[codebox]

[root@localhost release]# ./bandwidthTest

Running on…

  device 0:GeForce GTX 260

Quick Mode

Host to Device Bandwidth for Pageable memory

.

Transfer Size (Bytes) Bandwidth(MB/s)

33554432 2780.5

Quick Mode

Device to Host Bandwidth for Pageable memory

.

Transfer Size (Bytes) Bandwidth(MB/s)

33554432 2479.6

Quick Mode

Device to Device Bandwidth

.

Transfer Size (Bytes) Bandwidth(MB/s)

33554432 98643.6

&&&& Test PASSED

[/codebox]

[codebox]

[root@localhost release]# ./bandwidthTest --memory=pinned

Running on…

  device 0:GeForce GTX 260

Quick Mode

Host to Device Bandwidth for Pinned memory

.

Transfer Size (Bytes) Bandwidth(MB/s)

33554432 5253.7

Quick Mode

Device to Host Bandwidth for Pinned memory

.

Transfer Size (Bytes) Bandwidth(MB/s)

33554432 5674.2

Quick Mode

Device to Device Bandwidth

.

Transfer Size (Bytes) Bandwidth(MB/s)

33554432 98689.3

&&&& Test PASSED

Press ENTER to exit…

[/codebox]

[codebox]

[root@localhost release]# ./bandwidthTest --mode=shmoo

Running on…

  device 0:GeForce GTX 260

Shmoo Mode

Host to Device Bandwidth for Pageable memory

Transfer Size (Bytes) Bandwidth(MB/s)

 1024               610.4

 2048               723.4

 3072               1010.2

 4096               1775.6

 5120               2219.5

 6144               1953.1

 7168               2531.8

 8192               2520.2

 9216               2663.4

10240               2639.4

11264               2754.4

12288               2858.2

13312               2885.3

14336               2972.1

15360               2817.0

16384               3125.0

17408               3018.5

18432               3316.6

19456               2945.2

20480               2639.4

22528               3410.2

24576               3551.1

26624               3431.2

28672               3294.4

30720               3291.8

32768               3551.1

34816               3648.7

36864               3662.1

38912               3947.8

40960               4111.8

43008               3475.9

45056               3802.5

47104               3872.6

49152               4223.0

51200               4245.9

61440               4308.4

71680               578.3

81920               1133.9

92160               1231.0

102400 1250.4

204800 1867.2

307200 2192.9

409600 1724.6

512000 2531.3

614400 2620.5

716800 2475.9

819200 1662.9

921600 1880.8

1024000 2067.7

1126400 2678.8

2174976 3078.4

3223552 2706.9

4272128 2564.8

5320704 2283.2

6369280 2368.7

7417856 2590.5

8466432 2391.4

9515008 2693.1

10563584 2557.0

11612160 2713.9

12660736 2330.7

13709312 2742.8

14757888 2735.4

15806464 2707.0

16855040 2641.6

18952192 2589.9

21049344 2482.2

23146496 2534.1

25243648 2776.6

27340800 2535.4

29437952 2675.2

31535104 2650.0

33632256 2696.9

37826560 2634.8

42020864 2689.5

46215168 2594.7

50409472 2679.4

54603776 2635.7

58798080 2678.7

62992384 2701.4

67186688 2691.8

Shmoo Mode

Device to Host Bandwidth for Pageable memory

Transfer Size (Bytes) Bandwidth(MB/s)

 1024               21.5

 2048               207.8

 3072               284.4

 4096               365.1

 5120               432.1

 6144               496.6

 7168               560.3

 8192               615.2

 9216               655.9

10240               702.6

11264               746.0

12288               791.8

13312               835.2

14336               865.3

15360               904.2

16384               947.0

17408               959.6

18432               965.8

19456               1013.9

20480               1044.5

22528               1096.1

24576               1148.9

26624               1170.1

28672               1209.9

30720               1246.7

32768               1275.5

34816               1286.9

36864               1321.7

38912               395.6

40960               1351.6

43008               1367.2

45056               1359.8

47104               1399.4

49152               1429.1

51200               1423.6

61440               1487.2

71680               1748.3

81920               1800.1

92160               1854.2

102400 1896.2

204800 1844.3

307200 1957.0

409600 1294.7

512000 2076.0

614400 1961.0

716800 1964.3

819200 1970.9

921600 1952.3

1024000 1975.6

1126400 1984.1

2174976 2173.6

3223552 1764.2

4272128 2334.0

5320704 2404.7

6369280 2085.5

7417856 2351.3

8466432 2298.1

9515008 2436.6

10563584 2230.0

11612160 2439.1

12660736 2454.2

13709312 2282.7

14757888 2305.8

15806464 2313.6

16855040 2094.0

18952192 2306.1

21049344 2349.3

23146496 2463.6

25243648 2383.3

27340800 2440.4

29437952 2401.5

31535104 2405.1

33632256 2400.0

37826560 2396.8

42020864 2374.1

46215168 2427.3

50409472 2394.7

54603776 2235.0

58798080 2386.8

62992384 2397.7

67186688 2400.6

Shmoo Mode

Device to Device Bandwidth

Transfer Size (Bytes) Bandwidth(MB/s)

 1024               310.0

 2048               737.0

 3072               1085.1

 4096               1446.8

 5120               1775.6

 6144               2092.6

 7168               2398.6

 8192               2741.2

 9216               2881.7

10240               3201.8

11264               3522.0

12288               3842.2

13312               4162.4

14336               4410.3

15360               4725.3

16384               4882.8

17408               5030.8

18432               5247.2

19456               5538.7

20480               5744.5

22528               6227.4

24576               6602.1

26624               6770.8

28672               7291.7

30720               7709.7

32768               8223.7

34816               8098.3

36864               8471.4

38912               8835.6

40960               9191.2

43008               9217.0

45056               9655.9

47104               1220.7

49152               10190.2

51200               10172.5

61440               11160.7

71680               19814.3

81920               21701.4

92160               22828.7

102400 23251.5

204800 36507.0

307200 46136.8

409600 53879.3

512000 58829.1

614400 10751.1

716800 11606.0

819200 69137.2

921600 72041.5

1024000 74263.3

1126400 74598.5

2174976 86067.2

3223552 89366.8

4272128 93125.0

5320704 93706.7

6369280 96111.1

7417856 95597.5

8466432 93722.8

9515008 97050.5

10563584 96496.3

11612160 98525.1

12660736 97885.8

13709312 99348.2

14757888 98146.6

15806464 95618.3

16855040 98675.4

18952192 99747.3

21049344 100170.8

23146496 96288.9

25243648 98362.5

27340800 99330.4

29437952 99167.1

31535104 99468.2

33632256 100388.8

37826560 96921.6

42020864 99898.3

46215168 99862.3

50409472 100996.2

54603776 99132.3

58798080 99856.1

62992384 100905.7

67186688 97325.5

&&&& Test PASSED

Press ENTER to exit…

[/codebox]

[codebox]

[root@localhost release]# ./bandwidthTest --mode=shmoo --memory=pinned

Running on…

  device 0:GeForce GTX 260

Shmoo Mode

Host to Device Bandwidth for Pinned memory

Transfer Size (Bytes) Bandwidth(MB/s)

 1024               100.7

 2048               199.3

 3072               302.0

 4096               386.8

 5120               105.7

 6144               580.1

 7168               146.7

 8192               167.3

 9216               813.8

10240               912.7

11264               910.4

12288               1037.1

13312               266.1

14336               1168.5

15360               1241.4

16384               322.8

17408               195.5

18432               1373.3

19456               378.7

20480               398.6

22528               1627.6

24576               1698.4

26624               1826.7

28672               1939.3

30720               576.7

32768               613.9

34816               2142.1

36864               2055.9

38912               2333.9

40960               2411.3

43008               773.9

45056               810.7

47104               2552.4

49152               2633.4

51200               2682.9

61440               2944.4

71680               3135.8

81920               1299.9

92160               1422.2

102400 3538.3

204800 4283.2

307200 4507.2

409600 4678.1

512000 3523.0

614400 3753.6

716800 4942.8

819200 4995.2

921600 4946.0

1024000 5033.8

1126400 5081.5

2174976 4740.0

3223552 4903.8

4272128 5255.7

5320704 5272.5

6369280 5281.5

7417856 5285.6

8466432 5048.0

9515008 5294.8

10563584 5298.9

11612160 5287.5

12660736 5137.5

13709312 5308.0

14757888 5309.8

15806464 5238.1

16855040 5314.0

18952192 5245.0

21049344 5260.1

23146496 5315.2

25243648 5319.8

27340800 5275.6

29437952 5321.3

31535104 5318.0

33632256 5286.0

37826560 5289.9

42020864 5318.1

46215168 5290.8

50409472 5279.9

54603776 5323.6

58798080 5286.2

62992384 5250.6

67186688 5250.9

Shmoo Mode

Device to Host Bandwidth for Pinned memory

Transfer Size (Bytes) Bandwidth(MB/s)

 1024               114.9

 2048               219.5

 3072               332.9

 4096               434.0

 5120               536.6

 6144               636.9

 7168               712.1

 8192               805.4

 9216               896.8

10240               986.4

11264               1063.6

12288               1137.7

13312               1197.7

14336               1277.7

15360               1356.3

16384               1407.7

17408               1495.6

18432               1515.4

19456               1572.4

20480               1537.9

22528               1775.6

24576               1905.5

26624               161.5

28672               2103.4

30720               2170.1

32768               2281.0

34816               2371.7

36864               2408.0

38912               2524.4

40960               2604.2

43008               2680.8

45056               2719.5

47104               2773.0

49152               2840.9

51200               2906.4

61440               3167.2

71680               3367.5

81920               3535.1

92160               3692.9

102400 3844.7

204800 4500.3

307200 4818.6

409600 5008.0

512000 5129.0

614400 5222.3

716800 5278.7

819200 5332.8

921600 5369.0

1024000 5398.4

1126400 5419.9

2174976 5547.5

3223552 5584.4

4272128 5583.4

5320704 5545.0

6369280 4393.3

7417856 5573.8

8466432 5591.2

9515008 5595.5

10563584 4798.2

11612160 5614.6

12660736 5611.2

13709312 5618.7

14757888 5631.0

15806464 5211.3

16855040 5500.5

18952192 4663.4

21049344 5428.8

23146496 5551.9

25243648 5381.5

27340800 5494.7

29437952 5423.8

31535104 5517.1

33632256 5527.6

37826560 5467.9

42020864 4726.9

46215168 5619.2

50409472 5583.6

54603776 5464.2

58798080 5556.3

62992384 5565.2

67186688 5604.9

Shmoo Mode

Device to Device Bandwidth

Transfer Size (Bytes) Bandwidth(MB/s)

 1024               315.0

 2048               737.0

 3072               1085.1

 4096               1446.8

 5120               1775.6

 6144               2130.7

 7168               2398.6

 8192               2694.0

 9216               2881.7

10240               3201.8

11264               3580.7

12288               3906.2

13312               4162.4

14336               4410.3

15360               4725.3

16384               5040.3

17408               5030.8

18432               5247.2

19456               5622.6

20480               5830.2

22528               6227.4

24576               6602.1

26624               6862.3

28672               7491.4

30720               7812.5

32768               8223.7

34816               8300.8

36864               8471.4

38912               9051.1

40960               9300.6

43008               9428.9

45056               9765.6

47104               9982.6

49152               10302.2

51200               10279.6

61440               11268.0

71680               19531.2

81920               21701.4

92160               23129.1

102400 23531.6

204800 36169.0

307200 46503.0

409600 53510.3

512000 59185.6

614400 63344.6

716800 65730.2

819200 70067.3

921600 72338.0

1024000 74263.3

1126400 74340.4

2174976 85889.0

3223552 89107.8

4272128 93125.0

5320704 93880.1

6369280 96111.1

7417856 95597.5

8466432 93831.7

9515008 96946.8

10563584 96542.6

11612160 98525.1

12660736 97965.3

13709312 99385.9

14757888 98180.8

15806464 95618.3

16855040 98645.1

18952192 99802.4

21049344 100195.8

23146496 96204.9

25243648 85918.0

27340800 99292.5

29437952 99237.2

31535104 99501.1

33632256 100404.5

37826560 96934.6

42020864 99923.2

46215168 99862.3

50409472 100975.0

54603776 99151.2

58798080 99838.4

62992384 100871.8

67186688 97266.4

&&&& Test PASSED

Press ENTER to exit…

[/codebox]

The small block size results are a little hard to believe (5KB transfers at 2.2 GB/s?), and there are a lot of dips that I’d like to understand.
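One way to list the dips systematically is to compare each row with its neighbours. The sketch below is only an illustration: `find_dips`, the window of two rows on each side, and the 50% threshold are arbitrary choices, and the sample rows are copied from the pinned Device-to-Host shmoo output above.

```python
from statistics import median

def find_dips(samples, window=2, threshold=0.5):
    """Return (size, bandwidth) rows that fall below threshold * median
    of up to `window` neighbouring rows on each side.

    samples: list of (transfer_size_bytes, bandwidth_mb_s) in shmoo order.
    """
    dips = []
    for i, (size, bw) in enumerate(samples):
        lo = max(0, i - window)
        hi = min(len(samples), i + window + 1)
        # Bandwidths of the surrounding rows, excluding the row itself.
        neighbours = [b for j, (_, b) in enumerate(samples[lo:hi], start=lo) if j != i]
        if neighbours and bw < threshold * median(neighbours):
            dips.append((size, bw))
    return dips

# A few rows from the pinned Device-to-Host shmoo run.
pinned_d2h = [
    (20480, 1537.9),
    (22528, 1775.6),
    (24576, 1905.5),
    (26624, 161.5),
    (28672, 2103.4),
    (30720, 2170.1),
    (32768, 2281.0),
]
print(find_dips(pinned_d2h))  # flags only the 26624-byte row
```

Running the same check over several repeated shmoo runs would show whether a dip is tied to a particular transfer size or is just noise from whatever else the machine was doing at that moment.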

Wow, small transfers are slower with pinned memory! Host-to-device copies of 1024 bytes run at 610 MB/s with pageable memory vs 100 MB/s with pinned memory.
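It can help to turn those small-transfer bandwidths back into time per copy. The sketch below assumes that the "MB" printed by bandwidthTest means 2**20 bytes; if a given build reports decimal megabytes instead, the results shift by under 5%. The function name is made up for illustration, and the bandwidth values are the 1024-byte rows from the shmoo output above.

```python
# Per-transfer time implied by the 1024-byte rows of the shmoo output.
# Assumption: the tool's "MB" is 2**20 bytes (swap in 1e6 for decimal MB).

BYTES_PER_MB = 2**20

def microseconds_per_transfer(size_bytes, bandwidth_mb_s):
    """Time for one copy of size_bytes at the reported bandwidth, in microseconds."""
    return size_bytes / (bandwidth_mb_s * BYTES_PER_MB) * 1e6

for label, bw in [
    ("pageable H2D", 610.4),
    ("pinned H2D", 100.7),
    ("pinned D2H", 114.9),
]:
    print(f"{label}: {microseconds_per_transfer(1024, bw):.1f} us per 1 KiB copy")
```

Under that assumption the pinned copies cost roughly 10 microseconds each regardless of size, which would explain why the pinned rows scale almost linearly at the low end (100.7, 199.3, 302.0 MB/s for 1, 2 and 3 KiB). That points to a fixed per-call cost rather than a bandwidth limit, though the thread does not establish what causes it.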

Those pinned-memory numbers at low block sizes look suspicious. I don’t see that kind of difference in any other results, either mine or those at http://forums.nvidia.com/index.php?showtopic=68266.

It’s encouraging for me that memory transfer speeds at small buffer sizes seem to be improving well, mine are ~10MB/s, the 8800GTS sits in the middle of ours :) Thanks for posting.

Now I’m stumped: we have the same motherboard, and yet I get barely over half the bandwidth. My RAM is also 800MHz CL5, only I use 2 sticks with 2GB total (which should actually be a tiny bit faster AFAIK).

My CPU is clocked at a stock 2.5 GHz (Intel E5200) and I’m using WinXP32 with the 177.92 drivers. Could a slower CPU make such a difference? Would you have bandwidthTest reports from before you overclocked your CPU? (Or, if you were so kind, could you bring it back to stock clock for one boot and run bandwidthTest?)

Maybe you have your memory sticks in the same channel? When you have 4 slots for memory, you have to make sure you put the sticks in the right slots to get dual channel performance.

At stock 2.66GHz, I go down to 4.6 GB/s Device to Host. Interesting. Btw, I think we’ve established your 3.4 GB/s is from single-channel memory. Did you ever try swapping your ddr stick orientation?